TY - JOUR
T1 - An empirical study of gene synonym query expansion in biomedical information retrieval
AU - Lu, Yue
AU - Fang, Hui
AU - Zhai, Chengxiang
N1 - Funding Information:
Acknowledgments: This material is based in part upon work supported by the National Science Foundation under award number 0425852 and work supported by NIH/NLM grant 1 R01 LM009153-01.
PY - 2009/2
Y1 - 2009/2
N2 - Due to the heavy use of gene synonyms in biomedical text, people have tried many query expansion techniques using synonyms in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query; underweighting synonyms would not bring much benefit, while overweighting some unreliable synonyms can hurt performance significantly. So far, there has been no systematic evaluation of various synonym query expansion strategies for biomedical text. In this work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document retrieval. Our experimental results show that synonym expansion can significantly improve retrieval accuracy. However, different query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical for improving performance.
AB - Due to the heavy use of gene synonyms in biomedical text, people have tried many query expansion techniques using synonyms in order to improve performance in biomedical information retrieval. However, mixed results have been reported. The main challenge is that it is not trivial to assign appropriate weights to the added gene synonyms in the expanded query; underweighting synonyms would not bring much benefit, while overweighting some unreliable synonyms can hurt performance significantly. So far, there has been no systematic evaluation of various synonym query expansion strategies for biomedical text. In this work, we propose two different strategies to extend a standard language modeling approach for gene synonym query expansion and conduct a systematic evaluation of these methods on all the available TREC biomedical text collections for ad hoc document retrieval. Our experimental results show that synonym expansion can significantly improve retrieval accuracy. However, different query types require different synonym expansion methods, and appropriate weighting of gene names and synonym terms is critical for improving performance.
KW - Biomedical information retrieval
KW - Language modeling
KW - Synonym query expansion
UR - http://www.scopus.com/inward/record.url?scp=58149260392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58149260392&partnerID=8YFLogxK
U2 - 10.1007/s10791-008-9075-7
DO - 10.1007/s10791-008-9075-7
M3 - Article
AN - SCOPUS:58149260392
SN - 1386-4564
VL - 12
SP - 51
EP - 68
JO - Information Retrieval
JF - Information Retrieval
IS - 1
ER -