TY - GEN
T1 - Privacy risk in anonymized heterogeneous information networks
AU - Zhang, Aston
AU - Gunter, Carl A.
AU - Xie, Xing
AU - Han, Jiawei
AU - Chang, Kevin Chen Chuan
AU - Wang, Xiaofeng
N1 - Funding Information:
This work was supported by HHS 90TR0003-01 (SHARPS), NSF CNS 0964392 (NSF EBAM), 1017782, 1117106, 1223477, 1223495, IIS 1018723, the Adv. Digital Sci. Center UIUC, the Multimodal Info. Access and Synthesis Center UIUC, the U.S. Army Research Lab under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA) and the U.S. Army Research Office under Cooperative Agreement No. W911NF-13-1-0193. The views expressed are those of the authors only. We thank organizers of KDD Cup 2012 and Tencent Inc. for the datasets, and thank Yizhou Sun, Manish Gupta, Rui Li and Vincent Bindschaedler for insightful discussions.
PY - 2014
Y1 - 2014
N2 - Anonymized user datasets are often released for research or industry applications. As an example, t.qq.com released its anonymized users' profile, social interaction, and recommendation log data in KDD Cup 2012 to call for recommendation algorithms. Since the entities (users and so on) and edges (links among entities) are of multiple types, the released social network is a heterogeneous information network. Prior work has shown how privacy can be compromised in homogeneous information networks by the use of specific types of graph patterns. We show how the extra information derived from heterogeneity can be used to relax these assumptions. To characterize and demonstrate this added threat, we formally define privacy risk in an anonymized heterogeneous information network to identify the vulnerability in the possible way such data are released, and further present a new de-anonymization attack that exploits the vulnerability. Our attack successfully de-anonymized most individuals involved in the data: for an anonymized 1,000-user t.qq.com network of density 0.01, the attack precision is over 90% with a 2.3-million-user auxiliary network.
KW - Anonymization
KW - Attack
KW - Data mining
KW - Privacy
KW - Social networks
UR - http://www.scopus.com/inward/record.url?scp=85014324638&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014324638&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2014.53
DO - 10.5441/002/edbt.2014.53
M3 - Conference contribution
AN - SCOPUS:85014324638
T3 - Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings
SP - 595
EP - 606
BT - Advances in Database Technology - EDBT 2014
A2 - Leroy, Vincent
A2 - Christophides, Vassilis
A2 - Idreos, Stratos
A2 - Kementsietsidis, Anastasios
A2 - Garofalakis, Minos
A2 - Amer-Yahia, Sihem
PB - OpenProceedings.org, University of Konstanz, University Library
T2 - 17th International Conference on Extending Database Technology, EDBT 2014
Y2 - 24 March 2014 through 28 March 2014
ER -