DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

Boxin Wang, Fan Wu, Yunhui Long, Luka Rimanic, Ce Zhang, Bo Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Recent success of deep neural networks (DNNs) hinges on the availability of large-scale dataset; however, training on such dataset often poses privacy risks for sensitive training information. In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model DataLens, which is able to generate synthetic data in a differentially private (DP) way given sensitive input data. Thus, it is possible to train models for different down-stream tasks with the generated data while protecting the private information. In particular, we leverage the generative adversarial networks (GAN) and PATE framework to train multiple discriminators as "teacher"models, allowing them to vote with their gradient vectors to guarantee privacy. Comparing with the standard PATE privacy preserving framework which allows teachers to vote on one-dimensional predictions, voting on the high dimensional gradient vectors is challenging in terms of privacy preservation. As dimension reduction techniques are required, we need to navigate a delicate tradeoff space between (1) the improvement of privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we propose a novel dimension compression and aggregation approach TopAgg, which combines top-k dimension compression with a corresponding noise injection mechanism. We theoretically prove that the DataLens framework guarantees differential privacy for its generated data, and provide a novel analysis on its convergence to illustrate such a tradeoff on privacy and convergence rate, which requires non-trivial analysis as it requires a joint analysis on gradient compression, coordinate-wise gradient clipping, and DP mechanism. To demonstrate the practical usage of DataLens, we conduct extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and high dimensional CelebA and Place365 datasets. We show that DataLens significantly outperforms other baseline differentially private data generative models. Our code is publicly available at https://github.com/AI-secure/DataLens.

Original languageEnglish (US)
Title of host publicationCCS 2021 - Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
PublisherAssociation for Computing Machinery
Pages2146-2168
Number of pages23
ISBN (Electronic)9781450384544
DOIs
StatePublished - Nov 12 2021
Event27th ACM Annual Conference on Computer and Communication Security, CCS 2021 - Virtual, Online, Korea, Republic of
Duration: Nov 15 2021Nov 19 2021

Publication series

NameProceedings of the ACM Conference on Computer and Communications Security
ISSN (Print)1543-7221

Conference

Conference27th ACM Annual Conference on Computer and Communication Security, CCS 2021
Country/TerritoryKorea, Republic of
CityVirtual, Online
Period11/15/2111/19/21

Keywords

  • differential privacy
  • generative models
  • gradient compression

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation'. Together they form a unique fingerprint.

Cite this