TY - GEN
T1 - Automatically generating gene summaries from biomedical literature
AU - Ling, Xu
AU - Jiang, Jing
AU - He, Xin
AU - Mei, Qiaozhu
AU - Zhai, Chengxiang
AU - Schatz, Bruce
PY - 2006
Y1 - 2006
AB - Biologists often need to find information about genes whose function is not described in the genome databases. Currently they must search disparate biomedical literature to locate relevant articles, and spend considerable effort reading the retrieved articles in order to locate the most relevant knowledge about the gene. We describe our software, the first that automatically generates gene summaries from biomedical literature. We present a two-stage summarization method, which involves first retrieving relevant articles and then extracting the most informative sentences from the retrieved articles to generate a structured gene summary. The generated summary explicitly covers multiple aspects of a gene, such as the sequence information, mutant phenotypes, and molecular interaction with other genes. We propose several heuristic approaches to improve the accuracy in both stages. The proposed methods are evaluated using 10 randomly chosen genes from FlyBase and a subset of Medline abstracts about Drosophila. The results show that the precision of the top selected sentences in the six aspects is typically about 50-70%, and the generated summaries are quite informative, indicating that our approaches are effective in automatically summarizing literature information about genes. The generated summaries are not only directly useful to biologists but also serve as useful entry points that enable them to quickly digest the retrieved literature articles.
UR - http://www.scopus.com/inward/record.url?scp=33749574999&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749574999&partnerID=8YFLogxK
M3 - Conference contribution
C2 - 17094226
AN - SCOPUS:33749574999
SN - 9812564632
SN - 9789812564634
T3 - Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006
SP - 40
EP - 51
BT - Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006
T2 - 11th Pacific Symposium on Biocomputing 2006, PSB 2006
Y2 - 3 January 2006 through 7 January 2006
ER -