TY - GEN
T1 - Exploring the Efficiency of Batch Active Learning for Human-in-the-Loop Relation Extraction
AU - Lourentzou, Ismini
AU - Gruhl, Daniel
AU - Welch, Steve
N1 - Publisher Copyright: © 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License.
PY - 2018/4/23
Y1 - 2018/4/23
AB - Domain-specific relation extraction requires training data for supervised learning models and, thus, significant labeling effort. Distant supervision is often leveraged to create large annotated corpora; however, these methods must handle the inherent noise. Active learning approaches, on the other hand, can reduce the annotation cost by selecting the most beneficial examples to label in order to learn a good model. Examples can be selected sequentially, i.e., one example per iteration, or in batches, i.e., a set of examples per iteration. Optimizing the batch size is a practical problem faced in every real-world application of active learning; however, it is often treated as a parameter fixed in advance. In this work, we study the trade-off between model performance, the number of labels requested in a batch, and the time spent in each round for real-time, domain-specific relation extraction. Our results show that an appropriate batch size yields competitive performance, even compared to a fully sequential strategy, while reducing training time dramatically.
KW - active learning
KW - batch mode active learning
KW - deep learning
KW - neural networks
KW - relation extraction
UR - https://www.scopus.com/pages/publications/85071333887
U2 - 10.1145/3184558.3191546
DO - 10.1145/3184558.3191546
M3 - Conference contribution
AN - SCOPUS:85071333887
T3 - The Web Conference 2018 - Companion of the World Wide Web Conference, WWW 2018
SP - 1131
EP - 1138
BT - The Web Conference 2018 - Companion of the World Wide Web Conference, WWW 2018
PB - Association for Computing Machinery
T2 - 27th International World Wide Web Conference, WWW 2018
Y2 - 23 April 2018 through 27 April 2018
ER -