TY - GEN
T1 - Sparse parameter recovery from aggregated data
AU - Bhowmik, Avradeep
AU - Ghosh, Joydeep
AU - Koyejo, Oluwasanmi
N1 - Funding Information:
Authors acknowledge support from NSF under grant IIS-1421729.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2016
Y1 - 2016
N2 - Data aggregation is becoming an increasingly common technique for sharing sensitive information and for reducing data size when storage and/or communication costs are high. Aggregate quantities such as group averages are a form of semi-supervision, as they do not directly provide information about individual values. Despite their widespread use, prior literature on learning individual-level models from aggregated data is extremely limited. This paper investigates the effect of data aggregation on parameter recovery for a sparse linear model, when known results are no longer applicable. In particular, we consider a scenario where the data are collected into groups, e.g., aggregated patient records, and first-order empirical moments are available only at the group level. Despite this obfuscation of individual data values, we show that the true parameter is recoverable with high probability using these aggregates when the collection of true group moments is an incoherent matrix and the empirical moment estimates have been computed from a sufficiently large number of samples. To the best of our knowledge, ours are the first results on structured parameter recovery using only aggregated data. Experimental results on synthetic data are provided in support of these theoretical claims. We also show that parameter estimation from aggregated data approaches the accuracy of parameter estimation obtainable from non-aggregated or "individual" samples when applied to two real-world healthcare applications: predictive modeling of CMS Medicare reimbursement claims, and modeling of Texas State healthcare charges.
AB - Data aggregation is becoming an increasingly common technique for sharing sensitive information and for reducing data size when storage and/or communication costs are high. Aggregate quantities such as group averages are a form of semi-supervision, as they do not directly provide information about individual values. Despite their widespread use, prior literature on learning individual-level models from aggregated data is extremely limited. This paper investigates the effect of data aggregation on parameter recovery for a sparse linear model, when known results are no longer applicable. In particular, we consider a scenario where the data are collected into groups, e.g., aggregated patient records, and first-order empirical moments are available only at the group level. Despite this obfuscation of individual data values, we show that the true parameter is recoverable with high probability using these aggregates when the collection of true group moments is an incoherent matrix and the empirical moment estimates have been computed from a sufficiently large number of samples. To the best of our knowledge, ours are the first results on structured parameter recovery using only aggregated data. Experimental results on synthetic data are provided in support of these theoretical claims. We also show that parameter estimation from aggregated data approaches the accuracy of parameter estimation obtainable from non-aggregated or "individual" samples when applied to two real-world healthcare applications: predictive modeling of CMS Medicare reimbursement claims, and modeling of Texas State healthcare charges.
UR - http://www.scopus.com/inward/record.url?scp=84999018430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84999018430&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84999018430
T3 - 33rd International Conference on Machine Learning, ICML 2016
SP - 1726
EP - 1741
BT - 33rd International Conference on Machine Learning, ICML 2016
A2 - Weinberger, Kilian Q.
A2 - Balcan, Maria Florina
PB - International Machine Learning Society (IMLS)
T2 - 33rd International Conference on Machine Learning, ICML 2016
Y2 - 19 June 2016 through 24 June 2016
ER -