TY - JOUR
T1 - Linking Multiple User Identities of Multiple Services from Massive Mobility Traces
AU - Wang, Huandong
AU - Li, Yong
AU - Wang, Gang
AU - Jin, Depeng
N1 - This work was supported in part by the National Key Research and Development Program of China under grant 2018YFB1800804, the National Natural Science Foundation of China under U1936217, 61971267, 61972223, 61941117, 61861136003, Beijing Natural Science Foundation under L182038, Beijing National Research Center for Information Science and Technology under 20031887521, and the research fund of Tsinghua University - Tencent Joint Laboratory for Internet Innovation Technology. Authors’ addresses: H. Wang, Y. Li, and D. Jin, Beijing National Research Center for Information Science and Technology (BNRist), Department of Electronic Engineering, Tsinghua University, China; emails: {wanghuandong, liyong07, jindp}@tsinghua.edu.cn; G. Wang, University of Illinois at Urbana-Champaign (UIUC), USA; email: [email protected]. © 2021 Association for Computing Machinery.
PY - 2021/8/12
Y1 - 2021/8/12
N2 - Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services but face key challenges when matching across multiple services in practice, particularly when users have multiple IDs per service. In this article, we propose a novel system to link IDs across multiple services by exploring the spatial-temporal features of user activities; the core idea is that the same user's online IDs are more likely to repeatedly appear at the same location. Specifically, we first utilize a contact graph to capture the "co-location" of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets and use Bayesian inference to generate confidence scores for candidate ranking, which is proved to be optimal. We evaluate our system using two real-world ground-truth datasets from an Internet service provider (4 services, 815K IDs) and Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms the state-of-the-art algorithms in accuracy (AUC is higher by 0.1-0.2), and that it is highly robust to variations in data quality, matching order, and number of services.
AB - Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services but face key challenges when matching across multiple services in practice, particularly when users have multiple IDs per service. In this article, we propose a novel system to link IDs across multiple services by exploring the spatial-temporal features of user activities; the core idea is that the same user's online IDs are more likely to repeatedly appear at the same location. Specifically, we first utilize a contact graph to capture the "co-location" of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets and use Bayesian inference to generate confidence scores for candidate ranking, which is proved to be optimal. We evaluate our system using two real-world ground-truth datasets from an Internet service provider (4 services, 815K IDs) and Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms the state-of-the-art algorithms in accuracy (AUC is higher by 0.1-0.2), and that it is highly robust to variations in data quality, matching order, and number of services.
KW - Identity linkage
KW - online services
KW - set-wise ID matching
KW - spatio-temporal trajectory
UR - https://www.scopus.com/pages/publications/85122505209
U2 - 10.1145/3439817
DO - 10.1145/3439817
M3 - Article
AN - SCOPUS:85122505209
SN - 2157-6904
VL - 12
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
IS - 4
M1 - 39
ER -