TY - JOUR
T1 - Margin-based active learning for structured predictions
AU - Small, Kevin
AU - Roth, Dan
N1 - Funding Information:
The authors would like to thank Ming-Wei Chang, Alex Klementiev, Vasin Punyakanok, Nick Rizzolo, and the reviewers for their helpful comments regarding this work. This work has been partially funded by NSF grant ITR IIS-0428472, a research grant from Motorola Labs, DARPA funding under the Bootstrap Learning Program, and by MIAS, a DHS funded Center for Multimodal Information Access and Synthesis at UIUC.
PY - 2010/12
Y1 - 2010/12
AB - Margin-based active learning remains the most widely used active learning paradigm due to its simplicity and empirical successes. However, most works are limited to binary or multiclass prediction problems, thus restricting the applicability of these approaches to many complex prediction problems where active learning would be most useful. For example, machine learning techniques for natural language processing applications often require combining multiple interdependent prediction problems, generally referred to as learning in structured output spaces. In many such application domains, complexity is further managed by decomposing a complex prediction into a sequence of predictions where earlier predictions are used as input to later predictions, commonly referred to as a pipeline model. This work describes methods for extending existing margin-based active learning techniques to these two settings, thus increasing the scope of problems for which active learning can be applied. We empirically validate these proposed active learning techniques by reducing the annotated data requirements on multiple instances of synthetic data, a semantic role labeling task, and a named entity and relation extraction system.
KW - Active learning
KW - Pipeline models
KW - Structured output spaces
KW - Structured predictions
UR - http://www.scopus.com/inward/record.url?scp=79952312481&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952312481&partnerID=8YFLogxK
DO - 10.1007/s13042-010-0003-y
M3 - Article
AN - SCOPUS:79952312481
SN - 1868-8071
VL - 1
SP - 3
EP - 25
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 1-4
ER -