TY - JOUR
T1 - Robust Mean Estimation in High Dimensions
T2 - An Outlier-Fraction Agnostic and Efficient Algorithm
AU - Deshmukh, Aditya
AU - Liu, Jing
AU - Veeravalli, Venugopal V.
N1 - Funding Information:
This work was supported in part by the U.S. Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196 and in part by the U.S. National Science Foundation through the University of Illinois at Urbana-Champaign under Grant 2106727. An earlier version of this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Helsinki, Finland, from June 2022 to July 2022 [DOI: 10.1109/ISIT50566.2022.9834585].
Publisher Copyright:
© 2023 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the ℓ0-‘norm’ of an outlier indicator vector, under a second moment constraint on the datapoints. The ℓ0-‘norm’ is then relaxed to the ℓp-norm (0 < p ≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative ℓp-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p = 1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
AB - The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the ℓ0-‘norm’ of an outlier indicator vector, under a second moment constraint on the datapoints. The ℓ0-‘norm’ is then relaxed to the ℓp-norm (0 < p ≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative ℓp-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p = 1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
KW - Robust inference
KW - data contamination
KW - high-dimensional statistics
KW - linear time-complexity algorithms
UR - http://www.scopus.com/inward/record.url?scp=85149366027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149366027&partnerID=8YFLogxK
U2 - 10.1109/TIT.2023.3249197
DO - 10.1109/TIT.2023.3249197
M3 - Article
AN - SCOPUS:85149366027
SN - 0018-9448
VL - 69
SP - 4675
EP - 4690
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 7
ER -