On dataless hierarchical text classification

Yangqiu Song, Dan Roth

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.

Original languageEnglish (US)
Title of host publicationProceedings of the National Conference on Artificial Intelligence
PublisherAI Access Foundation
Pages1579-1585
Number of pages7
ISBN (Electronic)9781577356783
StatePublished - Jan 1 2014
Event28th AAAI Conference on Artificial Intelligence, AAAI 2014, 26th Innovative Applications of Artificial Intelligence Conference, IAAI 2014 and the 5th Symposium on Educational Advances in Artificial Intelligence, EAAI 2014 - Quebec City, Canada
Duration: Jul 27 2014Jul 31 2014

Publication series

NameProceedings of the National Conference on Artificial Intelligence
Volume2

Other

Other28th AAAI Conference on Artificial Intelligence, AAAI 2014, 26th Innovative Applications of Artificial Intelligence Conference, IAAI 2014 and the 5th Symposium on Educational Advances in Artificial Intelligence, EAAI 2014
CountryCanada
CityQuebec City
Period7/27/147/31/14

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Fingerprint Dive into the research topics of 'On dataless hierarchical text classification'. Together they form a unique fingerprint.

Cite this