Re-identification Attack to Privacy-Preserving Data Analysis with Noisy Sample-Mean

Du Su, Hieu Tri Huynh, Ziao Chen, Yi Lu, Wenmiao Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In mining sensitive databases, access to sensitive class attributes of individual records is often prohibited by enforcing field-level security, while only aggregate class-specific statistics are allowed to be released. We consider a common privacy-preserving data analytics scenario where only a noisy sample mean of the class of interest can be queried. Such practice is widely found in medical research and business analytics settings. This paper studies the hazard of re-identification of an entire class caused by revealing a noisy sample mean of the class. With a novel formulation of the re-identification attack as a generalized positive-unlabeled (PU) learning problem, we prove that the risk function of the re-identification problem is closely related to that of learning with complete data. We demonstrate that with a one-sided noisy sample mean, an effective re-identification attack can be devised with existing PU learning algorithms. We then propose a novel algorithm, growPU, that exploits the unique property of the sample mean and consistently outperforms existing PU learning algorithms on the re-identification task. GrowPU achieves re-identification accuracy of 93.6% on the MNIST dataset and 88.1% on an online behavioral dataset with a noiseless sample mean. With noise that guarantees 0.01-differential privacy, growPU achieves 91.9% on the MNIST dataset and 84.6% on the online behavioral dataset.
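
The abstract does not say which noise mechanism produces the differentially private sample mean. As a rough illustration of the query setting only, the sketch below assumes the standard Laplace mechanism applied to a scalar attribute clipped to a known range; the function name `noisy_sample_mean` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def noisy_sample_mean(values, lower, upper, epsilon, rng=None):
    """Return the mean of `values` plus Laplace noise (illustrative sketch).

    Assumptions (not from the paper): scalar records, a known range
    [lower, upper], a fixed class size n, and neighboring datasets that
    differ by replacing one record.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    if x.size == 0:
        raise ValueError("values must contain at least one record")
    # Replacing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / x.size
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP
    # under the assumptions stated above.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(x.mean() + noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    records = rng.uniform(0.0, 1.0, size=10_000)
    print(noisy_sample_mean(records, 0.0, 1.0, epsilon=0.01, rng=rng))
```

Under these assumptions the noise scale is (upper - lower) / (n · ε), so for a fixed ε the released mean of a large class stays close to the true mean.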

Original language: English (US)
Title of host publication: KDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1045-1053
Number of pages: 9
ISBN (Electronic): 9781450379984
State: Published - Aug 23 2020
Event: 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States
Duration: Aug 23 2020 to Aug 27 2020

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
Country: United States
City: Virtual, Online
Period: 8/23/20 to 8/27/20

Keywords

  • data privacy
  • positive-unlabeled learning
  • re-identification

ASJC Scopus subject areas

  • Software
  • Information Systems
