TY - JOUR

T1 - Robust Mean Estimation in High Dimensions

T2 - An Outlier-Fraction Agnostic and Efficient Algorithm

AU - Deshmukh, Aditya

AU - Liu, Jing

AU - Veeravalli, Venugopal V.

N1 - Funding Information:
This work was supported in part by the U.S. Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196 and in part by the U.S. National Science Foundation through the University of Illinois at Urbana-Champaign under Grant 2106727. An earlier version of this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Helsinki, Finland, from June 2022 to July 2022 [DOI: 10.1109/ISIT50566.2022.9834585].
Publisher Copyright:
© 2023 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.

PY - 2023/7/1

Y1 - 2023/7/1

N2 - The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the ℓ0-‘norm’ of an outlier indicator vector, under a second moment constraint on the datapoints. The ℓ0-‘norm’ is then relaxed to the ℓp-norm (0 < p ≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative ℓp-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p = 1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.

AB - The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the ℓ0-‘norm’ of an outlier indicator vector, under a second moment constraint on the datapoints. The ℓ0-‘norm’ is then relaxed to the ℓp-norm (0 < p ≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative ℓp-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p = 1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.

KW - Robust inference

KW - data contamination

KW - high-dimensional statistics

KW - linear time-complexity algorithms

UR - http://www.scopus.com/inward/record.url?scp=85149366027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85149366027&partnerID=8YFLogxK

U2 - 10.1109/TIT.2023.3249197

DO - 10.1109/TIT.2023.3249197

M3 - Article

AN - SCOPUS:85149366027

SN - 0018-9448

VL - 69

SP - 4675

EP - 4690

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

IS - 7

ER -