NRF: A naive re-identification framework

Shubhra Kanti Karmaker Santu, Vincent Bindschadler, Cheng Xiang Zhai, Carl A. Gunter

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

The promise of big data relies on the release and aggregation of data sets. When these data sets contain sensitive information about individuals, de-identification has been a scalable and convenient way to protect the privacy of these individuals. However, studies show that combining de-identified data sets with other data sets risks re-identification of some records. Some studies have shown how to measure this risk in specific contexts where certain types of public data sets (such as voter rolls) are assumed to be available to attackers. To the extent that it can be accomplished, such analyses enable the threat of compromise to be balanced against the benefits of sharing data. For example, a study that might save lives by enabling medical research may be justified given a sufficiently low probability of compromise from sharing de-identified data. In this paper, we introduce a general probabilistic re-identification framework that can be instantiated in specific contexts to estimate the probability of compromise based on explicit assumptions. We further propose a baseline set of such assumptions that enables a first-cut estimate of risk for practical case studies. We refer to the framework with these assumptions as the Naive Re-identification Framework (NRF). As a case study, we show how NRF can be applied to analyze and quantify the risk of re-identification arising from releasing de-identified medical data in the context of publicly available social media data. The results of this case study show that NRF can be used to obtain a meaningful quantification of re-identification risk, to compare the risk posed by different social media platforms, and to assess the risks of combinations of various demographic attributes and medical conditions that individuals may voluntarily disclose on social media.
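To make the idea of "estimating the probability of compromise based on explicit assumptions" concrete, the sketch below uses a generic linkage model. It is not NRF's actual formulation, which this record does not describe. The model assumes that an attacker links each released record to a public profile sharing the same attribute values, that the true individual appears among those profiles, and that the attacker picks uniformly among the matches. Under these assumptions, the probability of a correct link is 1/k, where k is the number of matching public profiles. All function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence

# Hypothetical helper names; a generic illustration, not the NRF paper's model.

def match_counts(public: Iterable[Sequence[Hashable]]) -> Counter:
    """Count how many public profiles share each attribute-value tuple."""
    return Counter(tuple(profile) for profile in public)


def reid_probability(record: Sequence[Hashable], counts: Counter) -> float:
    """Probability that one released record is correctly re-identified.

    Assumes the attacker chooses uniformly among the k public profiles that
    match the record's attributes, so P(correct) = 1/k. If no public profile
    matches, the attacker has nothing to link to and the probability is 0.
    """
    k = counts.get(tuple(record), 0)
    return 1.0 / k if k > 0 else 0.0


def expected_reidentifications(
    released: Iterable[Sequence[Hashable]],
    public: Iterable[Sequence[Hashable]],
) -> float:
    """Expected number of correctly re-identified released records."""
    counts = match_counts(public)
    return sum(reid_probability(r, counts) for r in released)


if __name__ == "__main__":
    # Toy attributes: (sex, age band, state) disclosed on a social media profile.
    public = [("F", "30-39", "IL"), ("F", "30-39", "IL"), ("M", "40-49", "IL")]
    released = [("F", "30-39", "IL"), ("M", "40-49", "IL")]
    print(expected_reidentifications(released, public))  # 0.5 + 1.0 = 1.5
```

Adding an attribute such as a disclosed medical condition to each tuple can only shrink or keep each k. In this model, that is why combinations of voluntarily disclosed attributes can raise the estimated risk.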

Original language: English (US)
Title of host publication: WPES 2018 - Proceedings of the 2018 Workshop on Privacy in the Electronic Society, co-located with CCS 2018
Publisher: Association for Computing Machinery
Pages: 121-132
Number of pages: 12
ISBN (Electronic): 9781450359894
State: Published - Oct 15 2018
Event: 17th ACM Workshop on Privacy in the Electronic Society, WPES 2018, held in conjunction with the 25th ACM Conference on Computer and Communications Security, CCS 2018 - Toronto, Canada
Duration: Oct 15 2018 → …

Publication series

Name: Proceedings of the ACM Conference on Computer and Communications Security
ISSN (Print): 1543-7221

Other

Country/Territory: Canada
City: Toronto
Period: 10/15/18 → …

Keywords

  • Data privacy
  • Formal privacy model
  • Patient privacy
  • Probabilistic framework
  • Re-identification risk

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
