TY - JOUR
T1 - Plausible deniability for privacy-preserving data synthesis
AU - Bindschaedler, Vincent
AU - Shokri, Reza
AU - Gunter, Carl A.
N1 - Funding Information:
This work was supported in part by NSF CNS grants 13-30491 and 14-08944. The views expressed are those of the authors only.
PY - 2016
Y1 - 2016
AB - Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical for releasing high-dimensional data, or cannot preserve high utility of the original data due to their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain number of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary. Also, it can be checked efficiently by privacy tests. We present mechanisms to generate synthetic datasets with statistical properties similar to the input data and with the same format. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of the original data with respect to various statistical analyses and machine learning measures.
UR - http://www.scopus.com/inward/record.url?scp=85020456590&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020456590&partnerID=8YFLogxK
U2 - 10.14778/3055540.3055542
DO - 10.14778/3055540.3055542
M3 - Conference article
AN - SCOPUS:85020456590
VL - 10
SP - 481
EP - 492
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
SN - 2150-8097
IS - 5
T2 - 43rd International Conference on Very Large Data Bases, VLDB 2017
Y2 - 28 August 2017 through 1 September 2017
ER -