TY - GEN
T1 - De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice
AU - Wang, Huandong
AU - Gao, Chen
AU - Li, Yong
AU - Wang, Gang
AU - Jin, Depeng
AU - Sun, Jingbo
N1 - The authors thank the anonymous reviewers for their helpful comments. This work was supported in part by NSF grant CNS-1717028.
PY - 2018
Y1 - 2018
AB - Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g., online social networks). To understand this risk, prior work either estimates the theoretical privacy bound or simulates de-anonymization attacks on small, synthetically created datasets. However, it is not clear how well the theoretical estimates hold in practice. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (7 in total). We find that their performance on the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms underestimate the impact of spatio-temporal mismatches between data from different sources, and that the high sparsity of user-generated data also contributes to the underperformance. Based on these insights, we propose 4 new algorithms that are specifically designed to tolerate spatial or temporal mismatches (or both) and to model user behavior. Extensive evaluations show that our algorithms achieve a performance gain of more than 17% over the best existing algorithms, confirming our insights.
UR - https://www.scopus.com/pages/publications/85095956575
U2 - 10.14722/ndss.2018.23211
DO - 10.14722/ndss.2018.23211
M3 - Conference contribution
T3 - 25th Annual Network and Distributed System Security Symposium, NDSS 2018
BT - 25th Annual Network and Distributed System Security Symposium, NDSS 2018
PB - The Internet Society
T2 - 25th Annual Network and Distributed System Security Symposium, NDSS 2018
Y2 - 18 February 2018 through 21 February 2018
ER -