Active learning: A survey

Charu C. Aggarwal, Xiangnan Kong, Quanquan Gu, Jiawei Han, Philip S. Yu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

One of the great challenges in a wide variety of learning problems is the ability to obtain sufficient labeled data for modeling purposes. Labeled data is often expensive to obtain, and frequently requires laborious human effort. In many domains, unlabeled data is copious, though labels can be attached to such data at a specific cost in the labeling process. Some examples of such data are as follows:

  • Document Collections: Large amounts of document data may be available on the Web, which are usually unlabeled. In such cases, it is desirable to attach labels to documents in order to create a learning model. A common approach is to manually label the documents in order to label the training data, a process that is slow, painstaking, and laborious.
  • Privacy-Constrained Data Sets: In many scenarios, the labels on records may be sensitive information, which may be acquired at a significant query cost (e.g., obtaining permission from the relevant entity).
  • Social Networks: In social networks, it may be desirable to identify nodes with specific properties. For example, an advertising company may desire to identify nodes in the social network that are interested in “cosmetics.” However, it is rare that labeled nodes will be available in the network that have interests in a specific area. Identification of relevant nodes may only occur through either manual examination of social network posts, or through user surveys. Both processes are time-consuming and costly.

Original language: English (US)
Title of host publication: Data Classification
Subtitle of host publication: Algorithms and Applications
Publisher: CRC Press
Pages: 571-605
Number of pages: 35
ISBN (Electronic): 9781466586758
ISBN (Print): 9781466586741
DOIs: https://doi.org/10.1201/b17320
State: Published - Jan 1 2014

Fingerprint

Labels
Cosmetics
Labeling
Marketing
Costs
Problem-Based Learning
Social networks
Active learning
Node
Industry

ASJC Scopus subject areas

  • Economics, Econometrics and Finance(all)
  • Business, Management and Accounting(all)
  • Computer Science(all)

Cite this

Aggarwal, C. C., Kong, X., Gu, Q., Han, J., & Yu, P. S. (2014). Active learning: A survey. In Data Classification: Algorithms and Applications (pp. 571-605). CRC Press. https://doi.org/10.1201/b17320

Active learning: A survey. / Aggarwal, Charu C.; Kong, Xiangnan; Gu, Quanquan; Han, Jiawei; Yu, Philip S.

Data Classification: Algorithms and Applications. CRC Press, 2014. p. 571-605.

Aggarwal, CC, Kong, X, Gu, Q, Han, J & Yu, PS 2014, Active learning: A survey. in Data Classification: Algorithms and Applications. CRC Press, pp. 571-605. https://doi.org/10.1201/b17320
Aggarwal CC, Kong X, Gu Q, Han J, Yu PS. Active learning: A survey. In Data Classification: Algorithms and Applications. CRC Press. 2014. p. 571-605. https://doi.org/10.1201/b17320
Aggarwal, Charu C. ; Kong, Xiangnan ; Gu, Quanquan ; Han, Jiawei ; Yu, Philip S. / Active learning: A survey. Data Classification: Algorithms and Applications. CRC Press, 2014. pp. 571-605
@inbook{7bb36d742bf34b9d854aa0778ec47903,
title = "Active learning: A survey",
abstract = "One of the great challenges in a wide variety of learning problems is the ability to obtain sufficient labeled data for modeling purposes. Labeled data is often expensive to obtain, and frequently requires laborious human effort. In many domains, unlabeled data is copious, though labels can be attached to such data at a specific cost in the labeling process. Some examples of such data are as follows: Document Collections: Large amounts of document data may be available on the Web, which are usually unlabeled. In such cases, it is desirable to attach labels to documents in order to create a learning model. A common approach is to manually label the documents in order to label the training data, a process that is slow, painstaking, and laborious. Privacy-Constrained Data Sets: In many scenarios, the labels on records may be sensitive information, which may be acquired at a significant query cost (e.g., obtaining permission from the relevant entity). Social Networks: In social networks, it may be desirable to identify nodes with specific properties. For example, an advertising company may desire to identify nodes in the social network that are interested in “cosmetics.” However, it is rare that labeled nodes will be available in the network that have interests in a specific area. Identification of relevant nodes may only occur through either manual examination of social network posts, or through user surveys. Both processes are time-consuming and costly.",
author = "Aggarwal, {Charu C.} and Xiangnan Kong and Quanquan Gu and Jiawei Han and Yu, {Philip S.}",
year = "2014",
month = "1",
day = "1",
doi = "10.1201/b17320",
language = "English (US)",
isbn = "9781466586741",
pages = "571--605",
booktitle = "Data Classification: Algorithms and Applications",
publisher = "CRC Press",

}

TY - CHAP

T1 - Active learning: A survey

AU - Aggarwal, Charu C.

AU - Kong, Xiangnan

AU - Gu, Quanquan

AU - Han, Jiawei

AU - Yu, Philip S.

PY - 2014/1/1

Y1 - 2014/1/1

N2 - One of the great challenges in a wide variety of learning problems is the ability to obtain sufficient labeled data for modeling purposes. Labeled data is often expensive to obtain, and frequently requires laborious human effort. In many domains, unlabeled data is copious, though labels can be attached to such data at a specific cost in the labeling process. Some examples of such data are as follows: Document Collections: Large amounts of document data may be available on the Web, which are usually unlabeled. In such cases, it is desirable to attach labels to documents in order to create a learning model. A common approach is to manually label the documents in order to label the training data, a process that is slow, painstaking, and laborious. Privacy-Constrained Data Sets: In many scenarios, the labels on records may be sensitive information, which may be acquired at a significant query cost (e.g., obtaining permission from the relevant entity). Social Networks: In social networks, it may be desirable to identify nodes with specific properties. For example, an advertising company may desire to identify nodes in the social network that are interested in “cosmetics.” However, it is rare that labeled nodes will be available in the network that have interests in a specific area. Identification of relevant nodes may only occur through either manual examination of social network posts, or through user surveys. Both processes are time-consuming and costly.

UR - http://www.scopus.com/inward/record.url?scp=85054219779&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054219779&partnerID=8YFLogxK

U2 - 10.1201/b17320

DO - 10.1201/b17320

M3 - Chapter

AN - SCOPUS:85054219779

SN - 9781466586741

SP - 571

EP - 605

BT - Data Classification

PB - CRC Press

ER -