TY - JOUR
T1 - DeepMeSH
T2 - Deep semantic representation for improving large-scale MeSH indexing
AU - Peng, Shengwen
AU - You, Ronghui
AU - Wang, Hongning
AU - Zhai, Chengxiang
AU - Mamitsuka, Hiroshi
AU - Zhu, Shanfeng
N1 - Publisher Copyright: © 2016 The Author 2016. Published by Oxford University Press.
PY - 2016/6/15
Y1 - 2016/6/15
AB - Motivation: Medical Subject Headings (MeSH) indexing, the task of assigning a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and the MeSH side. On the citation side, all existing methods, including the Medical Text Indexer (MTI) from the National Library of Medicine and the state-of-the-art method, MeSHLabeler, represent text as a bag of words, which cannot capture semantic and context-dependent information well. Methods: We propose DeepMeSH, which incorporates deep semantic information for large-scale MeSH indexing. It addresses the challenges on both the citation side and the MeSH side. The citation-side challenge is addressed by a new deep semantic representation, D2V-TFIDF, which concatenates sparse and dense semantic representations. The MeSH-side challenge is addressed by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation. Results: On BioASQ3 challenge data with 6000 citations, DeepMeSH achieved a Micro F-measure of 0.6323, roughly 2% higher than the 0.6218 of MeSHLabeler and 12% higher than the 0.5637 of MTI in relative terms.
UR - http://www.scopus.com/inward/record.url?scp=84976502747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976502747&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btw294
DO - 10.1093/bioinformatics/btw294
M3 - Article
C2 - 27307646
AN - SCOPUS:84976502747
SN - 1367-4803
VL - 32
SP - i70-i79
JO - Bioinformatics
JF - Bioinformatics
IS - 12
ER -