Abstract
Locality Preserving Indexing (LPI) has been quite successful in tackling document analysis problems, such as clustering or classification. The approach relies on the Locality Preserving Criterion, which preserves the locality of the data points. However, LPI takes every word in a data corpus into account, even though many words may not be useful for document clustering. To overcome this problem, we propose an approach called Locality Preserving Feature Learning (LPFL), which incorporates feature selection into LPI. Specifically, we aim to find a subset of features, and learn a linear transformation to optimize the Locality Preserving Criterion based on these features. The resulting optimization problem is a mixed integer programming problem, which we relax into a constrained Frobenius norm minimization problem, and solve using a variation of Alternating Direction Method (ADM). ADM, which iteratively updates the linear transformation matrix, the residue matrix and the Lagrangian multiplier, is theoretically guaranteed to converge at the rate O( 1/t ). Experiments on benchmark document datasets show that our proposed method outperforms LPI, as well as other state-of-the-art document analysis approaches.
Original language | English (US) |
---|---|
Pages (from-to) | 477-485 |
Number of pages | 9 |
Journal | Journal of Machine Learning Research |
Volume | 22 |
State | Published - 2012 |
Event | 15th International Conference on Artificial Intelligence and Statistics, AISTATS 2012 - La Palma, Spain Duration: Apr 21 2012 → Apr 23 2012 |
ASJC Scopus subject areas
- Software
- Artificial Intelligence
- Control and Systems Engineering
- Statistics and Probability