Competitive Learning Mechanisms for Scalable, Incremental and Balanced Clustering of Streaming Texts

Arindam Banerjee, Joydeep Ghosh

Research output: Contribution to conference › Paper › peer-review

Abstract

Automated clustering of text documents such as web pages is becoming increasingly important for organizing the vast amounts of information available over the internet. This problem is also very challenging, since text is typically represented by very high-dimensional (> 1000), normalized (unit-length) vectors. Moreover, documents are continually being created and their statistics change over time (for example, as news stories change), so one needs incremental learning algorithms that can adapt to non-stationary environments. We model high-dimensional, normalized data using a mixture of von Mises-Fisher distributions, and then modify this generative model in a principled way to yield frequency-sensitive competitive learning mechanisms that are applicable to streaming data and produce balanced clusters. Experimental results on clustering high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
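The abstract does not state the update rule, so the following is only a rough sketch of what a frequency-sensitive competitive learner on unit-length vectors might look like, not the paper's algorithm. The class name `FrequencySensitiveClusterer`, the `learning_rate` and `penalty` parameters, and the log-count penalty in the scoring rule are all assumptions made for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


class FrequencySensitiveClusterer:
    """Hypothetical online clusterer for unit-length vectors (illustrative only)."""

    def __init__(self, initial_centroids, learning_rate=0.05, penalty=1.0):
        self.centroids = [normalize(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)  # wins per cluster so far
        self.learning_rate = learning_rate
        self.penalty = penalty

    def _score(self, x, k):
        # Assumed rule: cosine similarity minus a penalty that grows with
        # the number of documents cluster k has already won.
        cosine = sum(a * b for a, b in zip(x, self.centroids[k]))
        return cosine - self.penalty * math.log(1.0 + self.counts[k])

    def partial_fit(self, document_vector):
        """Assign one streamed document and update only the winning centroid."""
        x = normalize(document_vector)
        winner = max(range(len(self.centroids)), key=lambda k: self._score(x, k))
        eta = self.learning_rate
        moved = [(1.0 - eta) * m + eta * a
                 for m, a in zip(self.centroids[winner], x)]
        self.centroids[winner] = normalize(moved)  # stay on the unit sphere
        self.counts[winner] += 1
        return winner
```

In this sketch, setting `penalty=0.0` reduces the rule to plain online winner-take-all assignment by cosine similarity, while larger values push documents toward clusters that have won less often, which is one simple way to encourage balanced cluster sizes.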

Original language: English (US)
Pages: 2697-2702
Number of pages: 6
State: Published - 2003
Externally published: Yes
Event: International Joint Conference on Neural Networks 2003 - Portland, OR, United States
Duration: Jul 20 2003 to Jul 24 2003

Other

Other: International Joint Conference on Neural Networks 2003
Country/Territory: United States
City: Portland, OR
Period: 7/20/03 to 7/24/03

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
