TY - JOUR
T1 - An Unsupervised Approach to Inferring the Localness of People Using Incomplete Geotemporal Online Check-in Data
AU - Huang, Chao
AU - Wang, Dong
AU - Tao, Jun
N1 - Funding Information:
This material is based on work supported by the National Science Foundation under grants CBET-1637251, CNS-1566465, and IIS-1447795, and the Army Research Office under grant W911NF-16-1-0388. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Authors’ addresses: C. Huang, 100 University Village Apt A, Notre Dame, IN, 46556; D. Wang, 214 B Cushing, University of Notre Dame, Notre Dame, IN, 46556; J. Tao, 2007 Coachmans Trail, South Bend, IN, 46637.
Publisher Copyright:
© 2017 ACM.
PY - 2017/8
Y1 - 2017/8
N2 - Inferring the localness of people means distinguishing people who are local residents of a city from people who visit the city by analyzing online check-in points that are contributed by online users. This information is critical for urban planning, user profiling, and localized recommendation systems. Supervised learning approaches have been developed to infer the location of people in a city by assuming the availability of high-quality training datasets with complete geotemporal information. In this article, we develop an unsupervised model to accurately identify local people in a city by using the incomplete online check-in data that are publicly available. In particular, we develop an incomplete geotemporal expectation maximization (IGT-EM) scheme, which incorporates a set of hidden variables to represent the localness of people and a set of estimation parameters to represent the likelihood of venues to attract local and nonlocal people, respectively. Our solution can accurately distinguish local people from nonlocal ones without requiring any training data. We also implement a parallel IGT-EM algorithm by leveraging the computing power of a graphics processing unit (GPU) that consists of 2,496 cores. In the evaluation, we compare our new approach with the existing solutions through four real-world case studies using data from New York City, Chicago, Boston, and Washington, DC. The results show that our approach can identify the local people and significantly outperforms the compared baselines in estimation accuracy and execution time.
AB - Inferring the localness of people means distinguishing people who are local residents of a city from people who visit the city by analyzing online check-in points that are contributed by online users. This information is critical for urban planning, user profiling, and localized recommendation systems. Supervised learning approaches have been developed to infer the location of people in a city by assuming the availability of high-quality training datasets with complete geotemporal information. In this article, we develop an unsupervised model to accurately identify local people in a city by using the incomplete online check-in data that are publicly available. In particular, we develop an incomplete geotemporal expectation maximization (IGT-EM) scheme, which incorporates a set of hidden variables to represent the localness of people and a set of estimation parameters to represent the likelihood of venues to attract local and nonlocal people, respectively. Our solution can accurately distinguish local people from nonlocal ones without requiring any training data. We also implement a parallel IGT-EM algorithm by leveraging the computing power of a graphics processing unit (GPU) that consists of 2,496 cores. In the evaluation, we compare our new approach with the existing solutions through four real-world case studies using data from New York City, Chicago, Boston, and Washington, DC. The results show that our approach can identify the local people and significantly outperforms the compared baselines in estimation accuracy and execution time.
KW - Crowdsourcing
KW - GPU implementation
KW - Localness of people
KW - Maximum likelihood estimation
KW - Online social networks
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85028594056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028594056&partnerID=8YFLogxK
U2 - 10.1145/3022471
DO - 10.1145/3022471
M3 - Article
AN - SCOPUS:85028594056
SN - 2157-6904
VL - 8
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
IS - 6
M1 - 80
ER -