Sparse parameter recovery from aggregated data

Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Data aggregation is becoming an increasingly common technique for sharing sensitive information, and for reducing data size when storage and/or communication costs are high. Aggregate quantities such as group-average are a form of semi-supervision as they do not directly provide information of individual values, but despite their wide-spread use, prior literature on learning individual-level models from aggregated data is extremely limited. This paper investigates the effect of data aggregation on parameter recovery for a sparse linear model, when known results are no longer applicable. In particular, we consider a scenario where the data are collected into groups e.g. aggregated patient records, and first-order empirical moments are available only at the group level. Despite this obfuscation of individual data values, we can show that the true parameter is recoverable with high probability using these aggregates when the collection of true group moments is an incoherent matrix, and the empirical moment estimates have been computed from a sufficiently large number of samples. To the best of our knowledge, ours are the first results on structured parameter recovery using only aggregated data. Experimental results on synthetic data are provided in support of these theoretical claims. We also show that parameter estimation from aggregated data approaches the accuracy of parameter estimation obtainable from non-aggregated or "individual" samples, when applied to two real world healthcare applications-predictive modeling of CMS Medicare reimbursement claims, and modeling of Texas State healthcare charges.

Original languageEnglish (US)
Title of host publication33rd International Conference on Machine Learning, ICML 2016
EditorsKilian Q. Weinberger, Maria Florina Balcan
PublisherInternational Machine Learning Society (IMLS)
Pages1726-1741
Number of pages16
ISBN (Electronic)9781510829008
StatePublished - 2016
Event33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
Duration: Jun 19 2016Jun 24 2016

Publication series

Name33rd International Conference on Machine Learning, ICML 2016
Volume3

Other

Other33rd International Conference on Machine Learning, ICML 2016
Country/TerritoryUnited States
CityNew York City
Period6/19/166/24/16

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Sparse parameter recovery from aggregated data'. Together they form a unique fingerprint.

Cite this