Predicting medical subject headings based on abstract similarity and citations to MEDLINE records

Adam K. Kehoe, Vetle I. Torvik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH®) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach frames the classification problem by decomposing it into sets of siblings in the MeSH hierarchy (e.g., training a classifier for predicting 'Heterocyclic Compounds, 2-Ring' vs. other 'Heterocyclic Compounds'). Preliminary experiments using a small but diverse set of MeSH terms show the highest performance when abstracts and citations are used together rather than each alone, coupled with a non-naive classifier: 90+% precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on 'Heterocyclic Compounds, 2-Ring', while our approach performs better on 'Alzheimer Disease' and 'Neuroimaging'. Our approach can be applied broadly to documents with abstracts that are similar to (or cite) MEDLINE abstracts, which would help with linking and searching across bibliographic databases beyond MEDLINE.
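The general idea of combining abstract similarity with citations in a nearest neighbor vote over sibling terms can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the weighting, or the classifier, so the bag-of-words cosine similarity, the `citation_weight` bonus, the similarity-weighted vote, the function names, and the toy records below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; keeps alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_sibling(abstract, cited_ids, labeled, k=3, citation_weight=0.5):
    """Pick one label from a set of sibling MeSH terms.

    labeled maps a record id to (abstract text, sibling label).
    citation_weight is a hypothetical bonus for directly cited records.
    """
    query = Counter(tokenize(abstract))
    scored = []
    for rec_id, (text, label) in labeled.items():
        score = cosine(query, Counter(tokenize(text)))
        if rec_id in cited_ids:
            score += citation_weight  # assumed way of mixing in citations
        scored.append((score, rec_id, label))
    scored.sort(reverse=True)

    # Similarity-weighted vote among the k nearest labeled records.
    votes = Counter()
    for score, _, label in scored[:k]:
        votes[label] += score
    return votes.most_common(1)[0][0] if votes else None


# Made-up labeled records for one sibling set (not real MEDLINE data).
labeled = {
    "r1": ("indole derivatives show bicyclic ring activity",
           "Heterocyclic Compounds, 2-Ring"),
    "r2": ("quinoline bicyclic scaffold synthesis",
           "Heterocyclic Compounds, 2-Ring"),
    "r3": ("pyridine single ring ligand binding",
           "Other Heterocyclic Compounds"),
    "r4": ("furan monocyclic solvent properties",
           "Other Heterocyclic Compounds"),
}

prediction = predict_sibling(
    "synthesis of a bicyclic indole scaffold", cited_ids={"r2"}, labeled=labeled
)
print(prediction)
```

In this sketch, an unlabeled document that both resembles and cites a labeled record gets pulled toward that record's heading more strongly than similarity alone would allow. A trained classifier over such neighbor features, as the abstract describes, would replace the simple vote.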

Original language: English (US)
Title of host publication: JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 167-170
Number of pages: 4
ISBN (Electronic): 9781450342292
State: Published - Sep 1 2016
Event: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016 - Newark, United States
Duration: Jun 19 2016 to Jun 23 2016

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
Volume: 2016-September
ISSN (Print): 1552-5996

Other

Other: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Country: United States
City: Newark
Period: 6/19/16 to 6/23/16

Keywords

  • Controlled vocabularies
  • Curation of bibliographic databases
  • Machine Learning
  • Medical subject headings

ASJC Scopus subject areas

  • Engineering(all)
