TY - JOUR
T1 - Anonymization and De-Anonymization of Mobility Trajectories
T2 - Dissecting the Gaps between Theory and Practice
AU - Wang, Huandong
AU - Li, Yong
AU - Gao, Chen
AU - Wang, Gang
AU - Tao, Xiaoming
AU - Jin, Depeng
N1 - Funding Information:
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB1800804, the National Natural Science Foundation of China under Grants U1936217, 61971267, 61972223, 61941117, and 61861136003, the Beijing Natural Science Foundation under Grant L182038, the Beijing National Research Center for Information Science and Technology under Grant 20031887521, and the research fund of the Tsinghua University - Tencent Joint Laboratory for Internet Innovation Technology.
Publisher Copyright:
© 2002-2012 IEEE.
PY - 2021/3/1
Y1 - 2021/3/1
N2 - Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is a growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g., online social networks). To understand this risk, prior work either estimates the theoretical privacy bound or simulates de-anonymization attacks on synthetically created datasets. However, it is not clear how well the theoretical estimates hold in practice. In this article, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (nine in total). We find that their performance on the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms underestimate the impact of spatio-temporal mismatches between the data from different sources, and that the high sparsity of user-generated data also contributes to the under-performance. Based on these insights, we propose four new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and to model location contexts and time contexts. Extensive evaluations show that our algorithms achieve a performance gain of more than 17 percent over the best existing algorithms, confirming our insights. Further, we propose two new location-privacy-preserving mechanisms that utilize the spatio-temporal mismatches to better protect users' privacy against the de-anonymization attack. Evaluation results show that our proposed mechanisms can reduce the performance of de-anonymization attacks by over 8.0 percent, demonstrating the effectiveness of our insights.
KW - ISP
KW - Privacy
KW - anonymization and de-anonymization
KW - spatio-temporal trajectory
UR - http://www.scopus.com/inward/record.url?scp=85097750457&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097750457&partnerID=8YFLogxK
U2 - 10.1109/TMC.2019.2952774
DO - 10.1109/TMC.2019.2952774
M3 - Article
AN - SCOPUS:85097750457
VL - 20
SP - 796
EP - 815
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
SN - 1536-1233
IS - 3
M1 - 8896049
ER -