Aspect guided text categorization with unobserved labels

Dan Roth, Yuancheng Tu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This paper proposes a novel multiclass classification method and exhibits its advantage in the domain of text categorization with a large label space and, most importantly, when some of the labels were not observed in the training data. The key insight is the introduction of intermediate aspect variables that encode properties of the labels. Aspect variables serve as a joint representation for observed and unobserved labels. This way the classification problem can be viewed as a structure learning problem with natural constraints on assignments to the aspect variables. We solve the problem as a constrained optimization problem over multiple learners and show significant improvement in classifying short sentences into a large label space of categories, including previously unobserved categories.
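The idea of predicting through aspect variables can be illustrated with a minimal sketch. This is not the authors' implementation: the label table, the aspect names, the scores, and the function `predict_label` are all hypothetical, and the additive scoring rule is an assumption made for illustration. Each label is described by one value per aspect, separate per-aspect learners are assumed to have produced scores, and inference is restricted to aspect assignments that correspond to a known label. Because a label needs only an aspect description, not training examples, a label that was never observed in training can still be predicted.

```python
from typing import Dict, List, Tuple

# Hypothetical label-to-aspect table (not from the paper): each label is
# described by one value for each of two aspects, (domain, activity).
LABEL_ASPECTS: Dict[str, Tuple[str, str]] = {
    "laptop_repair": ("electronics", "service"),
    "laptop_sales": ("electronics", "retail"),
    "car_repair": ("automotive", "service"),
    "car_sales": ("automotive", "retail"),  # assumed unobserved in training
}


def predict_label(aspect_scores: List[Dict[str, float]]) -> str:
    """Return the label whose aspect assignment has the highest total score.

    aspect_scores[i] maps each value of aspect i to a score from that
    aspect's learner. Searching only over rows of LABEL_ASPECTS plays the
    role of the constraint: aspect assignments that match no label are
    never considered.
    """
    def total(label: str) -> float:
        values = LABEL_ASPECTS[label]
        return sum(scores.get(value, 0.0)
                   for scores, value in zip(aspect_scores, values))

    return max(LABEL_ASPECTS, key=total)


# Illustrative scores, as if produced by two independently trained learners.
scores = [
    {"electronics": 0.2, "automotive": 0.8},  # domain aspect
    {"service": 0.3, "retail": 0.7},          # activity aspect
]
prediction = predict_label(scores)
```

With these scores the highest-scoring valid assignment is (automotive, retail), so the sketch selects `car_sales` even though no training example carried that label. A fuller treatment would replace the exhaustive search with a constrained optimization over the aspect variables, as the abstract describes.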

Original language: English (US)
Title of host publication: ICDM 2009 - The 9th IEEE International Conference on Data Mining
Number of pages: 6
State: Published - 2009
Event: 9th IEEE International Conference on Data Mining, ICDM 2009 - Miami, FL, United States
Duration: Dec 6 2009 to Dec 9 2009


Other: 9th IEEE International Conference on Data Mining, ICDM 2009
Country: United States
City: Miami, FL

ASJC Scopus subject areas

  • Engineering (all)
