Generating confusion sets for context-sensitive error correction

Alla Rozovskaya, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we consider the problem of generating candidate corrections for the task of correcting errors in text. We focus on the task of correcting errors in preposition usage made by non-native English speakers, using discriminative classifiers. The standard approach to the problem assumes that the set of candidate corrections for a preposition consists of all preposition choices participating in the task. We determine likely preposition confusions using an annotated corpus of non-native text and use this knowledge to produce smaller sets of candidates. We propose several methods of restricting candidate sets. These methods exclude candidate prepositions that are not observed as valid corrections in the annotated corpus and take into account the likelihood of each preposition confusion in the non-native text. We find that restricting candidates to those that are observed in the non-native data improves both the precision and the recall compared to the approach that views all prepositions as possible candidates. Furthermore, the approach that takes into account the likelihood of each preposition confusion is shown to be the most effective.
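The idea of restricting candidates can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_confusion_sets`, the `min_prob` threshold, and the toy annotation pairs are all hypothetical, and the abstract does not specify how the likelihood of a confusion is actually used. The sketch assumes the annotated corpus can be reduced to (written, correct) preposition pairs, keeps only corrections observed for each written preposition, and optionally drops those whose relative frequency falls below a threshold.

```python
from collections import Counter, defaultdict


def build_confusion_sets(pairs, min_prob=0.0):
    """Map each written preposition to its observed corrections.

    pairs:    iterable of (written, correct) preposition pairs
              (hypothetical reduction of an annotated learner corpus).
    min_prob: assumed pruning threshold on the relative frequency of a
              correction given the written preposition.
    """
    counts = defaultdict(Counter)
    for written, correct in pairs:
        counts[written][correct] += 1

    confusion_sets = {}
    for written, corrections in counts.items():
        total = sum(corrections.values())
        # Keep a candidate if it is the writer's own choice or if it is
        # observed often enough as a correction for that choice.
        kept = {
            cand: n / total
            for cand, n in corrections.items()
            if cand == written or n / total >= min_prob
        }
        # The writer's own choice always remains a candidate.
        kept.setdefault(written, 0.0)
        confusion_sets[written] = kept
    return confusion_sets


# Invented toy annotations, for illustration only (not data from the paper).
toy_pairs = (
    [("in", "in")] * 7 + [("in", "on")] * 2 + [("in", "at")] * 1
    + [("on", "on")] * 9 + [("on", "in")] * 1
)

observed = build_confusion_sets(toy_pairs)               # observed corrections only
pruned = build_confusion_sets(toy_pairs, min_prob=0.15)  # also drop rare confusions
```

In this toy setting, `observed["on"]` never contains "at" because that confusion does not appear in the annotations, and `pruned["in"]` additionally drops "at" because it accounts for only one tenth of the corrections for "in". A classifier restricted in this way would score only the candidates in the set for the preposition the writer used.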

Original language: English (US)
Title of host publication: EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 961-970
Number of pages: 10
State: Published - Dec 1 2010
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010 - Cambridge, MA, United States
Duration: Oct 9 2010 to Oct 11 2010

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010
Country: United States
City: Cambridge, MA
Period: 10/9/10 to 10/11/10

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems


  • Cite this

    Rozovskaya, A., & Roth, D. (2010). Generating confusion sets for context-sensitive error correction. In EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 961-970)