Text classification

Charu C. Aggarwal, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Classification has been widely studied in the database, data mining, and information retrieval communities. The problem is defined as follows: given a set of records D = {X1, …, XN} and a set of k discrete values indexed by {1, …, k}, each representing a category, the task is to assign one category (equivalently, the corresponding index value) to each record Xi. The problem is usually solved with a supervised learning approach, in which a set of training records (i.e., records with known category labels) is used to construct a classification model that relates the features of a record to one of the class labels. For a test instance whose class is unknown, the trained model is used to predict a class label. The problem may also be solved with unsupervised approaches that do not require labeled training data; in that case, keyword queries characterizing each class are often created manually, and bootstrapping may be used to heuristically obtain pseudo-training data. Our review focuses on supervised learning approaches.
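
As an illustration of the supervised setting described in the abstract, the following is a minimal sketch, not a method taken from the chapter. It assumes a multinomial naive Bayes model with add-one smoothing, chosen here only because it is compact; the function names `train` and `predict`, the whitespace tokenization, and the toy records are all hypothetical. Labeled records play the role of the training set, and `predict` assigns one of the category indices to an unlabeled record.

```python
import math
from collections import Counter, defaultdict

def train(records, labels):
    """Build a naive Bayes model from tokenized records with known labels."""
    class_counts = Counter(labels)        # number of training records per category
    word_counts = defaultdict(Counter)    # per-category word frequencies
    vocab = set()
    for tokens, label in zip(records, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the category index with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    n_records = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_records)   # log prior of the category
        for tok in tokens:
            if tok in vocab:                  # skip words never seen in training
                # add-one (Laplace) smoothed word likelihood
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set: category 1 = advertisement, category 2 = work mail.
train_records = [
    "cheap pills buy now".split(),
    "buy cheap watches".split(),
    "meeting agenda for monday".split(),
    "project meeting notes".split(),
]
train_labels = [1, 1, 2, 2]
model = train(train_records, train_labels)

# Predict a category for a test instance whose class is unknown.
predicted = predict(model, "buy cheap pills".split())
```

Any other supervised learner could stand in for the naive Bayes step; the point is only the split between model construction on labeled records and prediction on unlabeled ones.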

Original language: English (US)
Title of host publication: Data Classification
Subtitle of host publication: Algorithms and Applications
Publisher: CRC Press
Pages: 287-336
Number of pages: 50
ISBN (Electronic): 9781466586758
ISBN (Print): 9781466586741
State: Published - Jan 1 2014

ASJC Scopus subject areas

  • Economics, Econometrics and Finance (all)
  • Business, Management and Accounting (all)
  • Computer Science (all)
