Active learning: A survey

Charu C. Aggarwal, Xiangnan Kong, Quanquan Gu, Jiawei Han, Philip S. Yu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

One of the great challenges in a wide variety of learning problems is obtaining sufficient labeled data for modeling purposes. Labeled data is often expensive to obtain and frequently requires laborious human effort. In many domains, unlabeled data is copious, though labels can be attached to such data at a specific cost in the labeling process. Some examples of such data are as follows:

  • Document Collections: Large amounts of document data, usually unlabeled, may be available on the Web. In such cases, it is desirable to attach labels to documents in order to create a learning model. A common approach is to label the training documents manually, a process that is slow, painstaking, and laborious.
  • Privacy-Constrained Data Sets: In many scenarios, the labels on records may be sensitive information, which can be acquired only at a significant query cost (e.g., obtaining permission from the relevant entity).
  • Social Networks: In social networks, it may be desirable to identify nodes with specific properties. For example, an advertising company may wish to identify nodes in the social network that are interested in “cosmetics.” However, labeled nodes with interests in a specific area are rarely available in the network. Relevant nodes can be identified only through manual examination of social network posts or through user surveys. Both processes are time-consuming and costly.

Original language: English (US)
Title of host publication: Data Classification
Subtitle of host publication: Algorithms and Applications
Publisher: CRC Press
Pages: 571-605
Number of pages: 35
ISBN (Electronic): 9781466586758
ISBN (Print): 9781466586741
DOIs: https://doi.org/10.1201/b17320
State: Published - Jan 1 2014

ASJC Scopus subject areas

  • Economics, Econometrics and Finance (all)
  • Business, Management and Accounting (all)
  • Computer Science (all)

Cite this

Aggarwal, C. C., Kong, X., Gu, Q., Han, J., & Yu, P. S. (2014). Active learning: A survey. In Data Classification: Algorithms and Applications (pp. 571-605). CRC Press. https://doi.org/10.1201/b17320