Dual-clustering maximum entropy with application to classification and word embedding

Xiaolong Wang, Jingjing Wang, Chengxiang Zhai

Research output: Contribution to conferencePaperpeer-review

Abstract

Maximum Entropy (ME), as a general-purpose machine learning model, has been successfully applied to various fields such as text mining and natural language processing. It has been used as a classification technique and recently also applied to learn word embedding. ME establishes a distribution of the exponential form over items (classes/words). When training such a model, learning efficiency is guaranteed by globally updating the entire set of model parameters associated with all items at each training instance. This creates a significant computational challenge when the number of items is large. To achieve learning efficiency with affordable computational cost, we propose an approach named Dual-Clustering Maximum Entropy (DCME). Exploiting the primal-dual form of ME, it conducts clustering in the dual space and approximates each dual distribution by the corresponding cluster center. This naturally enables a hybrid online-offline optimization algorithm whose time complexity per instance only scales as the product of the feature/word vector dimensionality and the cluster number. Experimental studies on text classification and word embedding learning demonstrate that DCME effectively strikes a balance between training speed and model quality, substantially outperforming state-of-the-art methods.

Original languageEnglish (US)
Pages3323-3329
Number of pages7
StatePublished - 2017
Event31st AAAI Conference on Artificial Intelligence, AAAI 2017 - San Francisco, United States
Duration: Feb 4 2017Feb 10 2017

Other

Other31st AAAI Conference on Artificial Intelligence, AAAI 2017
Country/TerritoryUnited States
CitySan Francisco
Period2/4/172/10/17

ASJC Scopus subject areas

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Dual-clustering maximum entropy with application to classification and word embedding'. Together they form a unique fingerprint.

Cite this