TY - JOUR
T1 - An Ensemble Method for Spelling Correction in Consumer Health Questions
AU - Kilicoglu, Halil
AU - Fiszman, Marcelo
AU - Roberts, Kirk
AU - Demner-Fushman, Dina
PY - 2015
Y1 - 2015
N2 - Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29, achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is most relevant in spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features.
AB - Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29, achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is most relevant in spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features.
UR - https://www.scopus.com/pages/publications/84981241938
UR - https://www.scopus.com/pages/publications/84981241938#tab=citedBy
M3 - Article
C2 - 26958208
AN - SCOPUS:84981241938
SN - 1559-4076
VL - 2015
SP - 727
EP - 736
JO - AMIA Annual Symposium Proceedings
JF - AMIA Annual Symposium Proceedings
ER -