Adapting text instead of the model: An open domain approach

Gourab Kundu, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Natural language systems trained on labeled data from one domain do not perform well on other domains. Most adaptation algorithms proposed in the literature train a new model for the new domain using unlabeled data. However, it is time-consuming to retrain big models or pipeline systems. Moreover, the domain of a new target sentence may not be known, and one may not have a significant amount of unlabeled data for every new domain. To pursue the goal of Open Domain NLP (train once, test anywhere), we propose ADUT (ADaptation Using label-preserving Transformation), an approach that avoids the need for retraining and does not require knowledge of the new domain or any data from it. Our approach applies simple label-preserving transformations to the target text so that the transformed text is more similar to the training domain; it then applies the existing model to the transformed sentences and combines the predictions to produce the desired prediction on the target text. We instantiate ADUT for the case of Semantic Role Labeling (SRL) and show that it compares favorably with approaches that retrain their model on the target domain. Specifically, this "on the fly" adaptation approach yields a 13% error reduction for a single-parse system when adapting from newswire text to fiction.
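
The abstract describes a three-step pipeline: rewrite the target sentence with label-preserving transformations, label every variant with the unchanged source-domain model, and combine the per-variant predictions. The following is a minimal Python sketch of that combine-by-voting idea only; the two toy transformations, the naive prefix alignment, and the model_predict callable are illustrative assumptions, not the transformations or the SRL system used in the paper.

from collections import Counter

# Two toy label-preserving transformations. Real ADUT transformations would,
# for example, replace words unseen in the training domain with frequent
# training-domain words or simplify sentence structure; these stand-ins are
# assumptions for illustration only.
def replace_unknown_words(tokens, train_vocab):
    # Map out-of-vocabulary tokens to a frequent placeholder word.
    return [t if t in train_vocab else "thing" for t in tokens]

def drop_discourse_marker(tokens):
    # Remove a sentence-initial discourse marker ("Well, he left" -> "he left").
    markers = {"well", "oh", "now", "so"}
    return tokens[1:] if tokens and tokens[0].lower() in markers else tokens

def adut_predict(tokens, model_predict, train_vocab):
    """Label `tokens` by voting over label-preserving variants.

    `model_predict` stands in for the existing, unretrained source-domain
    labeler: it maps a token list to one label per token.
    """
    variants = [
        tokens,                                      # original sentence
        replace_unknown_words(tokens, train_vocab),  # same length
        drop_discourse_marker(tokens),               # possibly shorter
    ]
    votes = [Counter() for _ in tokens]
    for variant in variants:
        labels = model_predict(variant)
        # Naive alignment: a variant that dropped a sentence-initial marker
        # is shifted right so its labels line up with the original tokens.
        offset = len(tokens) - len(variant)
        for i, label in enumerate(labels):
            votes[i + offset][label] += 1
    # Majority vote per token; the full-length variants guarantee that every
    # position received at least one vote.
    return [counts.most_common(1)[0][0] for counts in votes]

if __name__ == "__main__":
    def toy_model(tokens):
        # Stand-in for a trained SRL tagger: tags the first token as ARG0.
        return ["ARG0" if i == 0 else "O" for i in range(len(tokens))]

    # Prints ['ARG0', 'O', 'O']: the variant without "Well" shifts the ARG0
    # vote onto "he", but the majority over all variants keeps it on "Well".
    print(adut_predict(["Well", "he", "left"], toy_model, {"he", "left"}))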

Original language: English (US)
Title of host publication: CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
Pages: 229-237
Number of pages: 9
State: Published - Dec 1 2011
Event: 15th Conference on Computational Natural Language Learning, CoNLL 2011 - Portland, OR, United States
Duration: Jun 23 2011 - Jun 24 2011

Publication series

Name: CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference

Other

Other: 15th Conference on Computational Natural Language Learning, CoNLL 2011
Country: United States
City: Portland, OR
Period: 6/23/11 - 6/24/11

ASJC Scopus subject areas

  • Artificial Intelligence
  • Linguistics and Language
  • Human-Computer Interaction

Cite this

Kundu, G., & Roth, D. (2011). Adapting text instead of the model: An open domain approach. In CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference (pp. 229-237). (CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference).

@inproceedings{df2e012d938d472bb63a2bf75263a758,
  title     = "Adapting text instead of the model: An open domain approach",
  author    = "Gourab Kundu and Dan Roth",
  year      = "2011",
  month     = dec,
  language  = "English (US)",
  isbn      = "9781932432923",
  pages     = "229--237",
  booktitle = "CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference",
  series    = "CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference",
}

Scopus record: http://www.scopus.com/inward/record.url?scp=84862296344&partnerID=8YFLogxK