TY - GEN
T1 - A generalized hidden Markov model with discriminative training for query spelling correction
AU - Li, Yanen
AU - Duan, Huizhong
AU - Zhai, Chengxiang
PY - 2012
Y1 - 2012
N2 - Query spelling correction is a crucial component of modern search engines. Existing methods in the literature for search query spelling correction have two major drawbacks. First, they are unable to handle certain important types of spelling errors, such as concatenation and splitting. Second, they cannot efficiently evaluate all the candidate corrections due to the complex form of their scoring functions, and a heuristic filtering step must be applied to select a working set of top-K most promising candidates for final scoring, leading to non-optimal predictions. In this paper we address both limitations and propose a novel generalized Hidden Markov Model with discriminative training that can not only handle all the major types of spelling errors, including splitting and concatenation errors, in a single unified framework, but also efficiently evaluate all the candidate corrections to ensure the finding of a globally optimal correction. Experiments on two query spelling correction datasets demonstrate that the proposed generalized HMM is effective for correcting multiple types of spelling errors. The results also show that it significantly outperforms the current approach for generating top-K candidate corrections, making it a better first-stage filter to enable any other complex spelling correction algorithm to have access to a better working set of candidate corrections as well as to cover splitting and concatenation errors, which no existing method in academic literature can correct.
AB - Query spelling correction is a crucial component of modern search engines. Existing methods in the literature for search query spelling correction have two major drawbacks. First, they are unable to handle certain important types of spelling errors, such as concatenation and splitting. Second, they cannot efficiently evaluate all the candidate corrections due to the complex form of their scoring functions, and a heuristic filtering step must be applied to select a working set of top-K most promising candidates for final scoring, leading to non-optimal predictions. In this paper we address both limitations and propose a novel generalized Hidden Markov Model with discriminative training that can not only handle all the major types of spelling errors, including splitting and concatenation errors, in a single unified framework, but also efficiently evaluate all the candidate corrections to ensure the finding of a globally optimal correction. Experiments on two query spelling correction datasets demonstrate that the proposed generalized HMM is effective for correcting multiple types of spelling errors. The results also show that it significantly outperforms the current approach for generating top-K candidate corrections, making it a better first-stage filter to enable any other complex spelling correction algorithm to have access to a better working set of candidate corrections as well as to cover splitting and concatenation errors, which no existing method in academic literature can correct.
KW - discriminative training for HMMS
KW - generalized hidden Markov models
KW - query spelling correction
UR - http://www.scopus.com/inward/record.url?scp=84866620837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866620837&partnerID=8YFLogxK
U2 - 10.1145/2348283.2348365
DO - 10.1145/2348283.2348365
M3 - Conference contribution
AN - SCOPUS:84866620837
SN - 9781450316583
T3 - SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 611
EP - 620
BT - SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
T2 - 35th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012
Y2 - 12 August 2012 through 16 August 2012
ER -