An Ensemble Method for Spelling Correction in Consumer Health Questions

Halil Kilicoglu, Marcelo Fiszman, Kirk Roberts, Dina Demner-Fushman

Research output: Contribution to journal › Article › peer-review

Abstract

Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29, achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is most relevant in spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features.
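To make the idea of combining the three signals concrete, the following is a minimal sketch and not the authors' implementation. It ranks candidate corrections by a weighted sum of an edit-distance score, a frequency score, and a context score. The function names, the weights, the toy counts, and the co-occurrence table standing in for the contextual similarity component are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_score(word: str, candidate: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    return 1.0 - edit_distance(word, candidate) / max(len(word), len(candidate))


def frequency_score(candidate: str, counts: Counter) -> float:
    """Log-scaled corpus frequency, relative to the most frequent word."""
    return math.log1p(counts[candidate]) / math.log1p(max(counts.values()))


def context_score(candidate: str, context: list, cooccur: dict) -> float:
    """Fraction of context words seen with the candidate.

    Assumption: a toy co-occurrence table stands in for whatever
    contextual similarity measure a real system would use.
    """
    if not context:
        return 0.0
    seen = cooccur.get(candidate, set())
    return sum(1 for w in context if w in seen) / len(context)


def rank_candidates(word, context, counts, cooccur, weights=(0.6, 0.2, 0.2)):
    """Return (candidate, score) pairs sorted best first.

    The weights are hypothetical; they only reflect that the orthographic
    signal is weighted most heavily in this sketch.
    """
    w_orth, w_freq, w_ctx = weights
    scored = [
        (cand,
         w_orth * orthographic_score(word, cand)
         + w_freq * frequency_score(cand, counts)
         + w_ctx * context_score(cand, context, cooccur))
        for cand in counts
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
counts = Counter({"diabetes": 50, "diabetic": 20, "dialysis": 30})
cooccur = {"diabetes": {"insulin", "type"}, "dialysis": {"kidney"}}
ranking = rank_candidates("diabetis", ["type", "insulin"], counts, cooccur)
```

In this toy setup the two closest candidates are equally near the misspelling in edit distance, so the frequency and context scores decide between them, which is one way the signals can complement each other.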

Original language: English (US)
Pages (from-to): 727-736
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume: 2015
State: Published - Jan 1 2015
Externally published: Yes

ASJC Scopus subject areas

  • Medicine(all)
