Abstract
The problem of robust mean estimation in high dimensions is studied, in which a certain fraction (less than half) of the datapoints can be arbitrarily corrupted. Motivated by compressive sensing, the robust mean estimation problem is formulated as the minimization of the ℓ0-‘norm’ of an outlier indicator vector, under a second moment constraint on the datapoints. The ℓ0-‘norm’ is then relaxed to the ℓp-norm (0 < p ≤ 1) in the objective, and it is shown that the global minima for each of these objectives are order-optimal and have optimal breakdown point for the robust mean estimation problem. Furthermore, a computationally tractable iterative ℓp-minimization and hard thresholding algorithm is proposed that outputs an order-optimal robust estimate of the population mean. The proposed algorithm (with breakdown point ≈ 0.3) does not require prior knowledge of the fraction of outliers, in contrast with most existing algorithms, and for p = 1 it has near-linear time complexity. Both synthetic and real data experiments demonstrate that the proposed algorithm outperforms state-of-the-art robust mean estimation methods.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 4675-4690 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 69 |
| Issue number | 7 |
| DOIs | |
| State | Published - Jul 1 2023 |
Keywords
- Robust inference
- data contamination
- high-dimensional statistics
- linear time-complexity algorithms
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Fingerprint
Dive into the research topics of 'Robust Mean Estimation in High Dimensions: An Outlier-Fraction Agnostic and Efficient Algorithm'. Together they form a unique fingerprint.Cite this
- APA
- Standard
- Harvard
- Vancouver
- Author
- BIBTEX
- RIS