Adapting text instead of the model: An open domain approach

Gourab Kundu, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Natural language systems trained on labeled data from one domain do not perform well on other domains. Most adaptation algorithms proposed in the literature train a new model for the new domain using unlabeled data. However, it is time-consuming to retrain big models or pipeline systems. Moreover, the domain of a new target sentence may not be known, and one may not have a significant amount of unlabeled data for every new domain. To pursue the goal of Open Domain NLP (train once, test anywhere), we propose ADUT (ADaptation Using label-preserving Transformation), an approach that avoids the need for retraining and does not require knowledge of the new domain, or any data from it. Our approach applies simple label-preserving transformations to the target text so that the transformed text is more similar to the training domain; it then applies the existing model to the transformed sentences and combines the predictions to produce the desired prediction on the target text. We instantiate ADUT for the case of Semantic Role Labeling (SRL) and show that it compares favorably with approaches that retrain their model on the target domain. Specifically, this "on the fly" adaptation approach yields 13% error reduction for a single-parse system when adapting from newswire text to fiction.
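The abstract's transform-predict-combine loop can be sketched schematically. This is a hypothetical illustration, not the paper's implementation: the function names, the stand-in model, and the use of a simple majority vote as the combination step are all assumptions made for the sketch.

```python
# Hypothetical sketch of the ADUT idea: rewrite a target sentence with
# label-preserving transformations, run the unchanged source-domain model
# on every variant, and combine the per-variant predictions.
from collections import Counter

def adut_predict(sentence, model, transformations):
    """Predict a label for `sentence` without retraining `model`.

    `transformations` are functions that rewrite the sentence while
    preserving its gold label (e.g. replacing words that are rare in the
    training domain with common near-synonyms).
    """
    # Keep the original sentence alongside its transformed variants.
    variants = [sentence] + [t(sentence) for t in transformations]
    predictions = [model(v) for v in variants]
    # Combination step (assumed here): majority vote over the variants.
    return Counter(predictions).most_common(1)[0][0]

# Toy usage with a stand-in classifier and one normalizing rewrite:
toy_model = lambda s: "cat" in s               # stand-in for a trained model
normalize = lambda s: s.replace("feline", "cat")  # label-preserving rewrite
adut_predict("the feline slept", toy_model, [normalize, normalize])  # True
```

In this toy run the original sentence is misclassified, but both transformed variants land in the model's "training vocabulary" and outvote it, which is the effect the paper exploits at the level of whole SRL predictions.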

Original language: English (US)
Title of host publication: CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
Pages: 229-237
Number of pages: 9
State: Published - 2011
Externally published: Yes
Event: 15th Conference on Computational Natural Language Learning, CoNLL 2011 - Portland, OR, United States
Duration: Jun 23 2011 - Jun 24 2011

Publication series

Name: CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference

Other

Other: 15th Conference on Computational Natural Language Learning, CoNLL 2011
Country/Territory: United States
City: Portland, OR
Period: 6/23/11 - 6/24/11

ASJC Scopus subject areas

  • Artificial Intelligence
  • Linguistics and Language
  • Human-Computer Interaction
