Predicting medical subject headings based on abstract similarity and citations to MEDLINE records

Adam K. Kehoe, Vetle Ingvald Torvik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH®) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach decomposes the classification problem into sets of siblings in the MeSH hierarchy (e.g., training a classifier to predict 'Heterocyclic Compounds, 2-Ring' vs. other 'Heterocyclic Compounds'). Preliminary experiments using a small but diverse set of MeSH terms show the highest performance when abstracts and citations are combined (rather than used alone) and coupled with a non-naive classifier: 90+% precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on 'Heterocyclic Compounds, 2-Ring', while our approach performs better on 'Alzheimer Disease' and 'Neuroimaging'. Our approach can be applied broadly to documents whose abstracts are similar to MEDLINE abstracts or that cite MEDLINE records, which would support linking and searching across bibliographic databases beyond MEDLINE.
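The abstract does not specify the similarity measure, the number of neighbors, how citation evidence is scored, or which classifier is used, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions. It assumes bag-of-words cosine similarity between abstracts, a similarity-weighted vote among the k nearest labeled records, the share of cited labeled records carrying each term, and a plain average standing in for the trained classifier. It scores one sibling set at a time, mirroring the sibling decomposition described above. The names `sibling_evidence`, `predict_sibling`, `TOY_CORPUS`, the PMIDs, and the abstracts are all hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical toy corpus: PMID -> (abstract text, assigned MeSH terms).
# The PMIDs and abstracts are invented for illustration only.
TOY_CORPUS = {
    "1": ("quinoline derivatives inhibit kinase activity", {"Heterocyclic Compounds, 2-Ring"}),
    "2": ("synthesis of indole and quinoline scaffolds", {"Heterocyclic Compounds, 2-Ring"}),
    "3": ("pyridine ligands in catalysis", {"Heterocyclic Compounds, 1-Ring"}),
    "4": ("pyrimidine analogs as antiviral agents", {"Heterocyclic Compounds, 1-Ring"}),
}


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens; a crude stand-in for real text preprocessing."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sibling_evidence(abstract, cited_pmids, labeled, siblings, k=3):
    """Score each sibling MeSH term from two evidence sources, each in [0, 1].

    neighbor score: similarity-weighted share of the k most similar labeled
        abstracts (with nonzero similarity) that carry the term.
    citation score: share of the document's cited records that are in the
        labeled corpus and carry the term.
    """
    query = bag_of_words(abstract)
    sims = [(cosine(query, bag_of_words(text)), pmid) for pmid, (text, _) in labeled.items()]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = [(s, pmid) for s, pmid in sims[:k] if s > 0]
    total_sim = sum(s for s, _ in neighbors)
    cited = [pmid for pmid in cited_pmids if pmid in labeled]  # ignore unlabeled citations

    evidence = {}
    for term in siblings:
        hit_sim = sum(s for s, pmid in neighbors if term in labeled[pmid][1])
        hit_cites = sum(1 for pmid in cited if term in labeled[pmid][1])
        evidence[term] = (
            hit_sim / total_sim if total_sim else 0.0,
            hit_cites / len(cited) if cited else 0.0,
        )
    return evidence


def predict_sibling(evidence):
    """Stand-in for a trained classifier: pick the term with the highest mean score."""
    return max(evidence, key=lambda term: sum(evidence[term]) / 2)


if __name__ == "__main__":
    siblings = ["Heterocyclic Compounds, 1-Ring", "Heterocyclic Compounds, 2-Ring"]
    evidence = sibling_evidence(
        "novel quinoline scaffolds with kinase activity",
        cited_pmids=["2", "99"],  # "99" is not in the labeled corpus
        labeled=TOY_CORPUS,
        siblings=siblings,
    )
    print(predict_sibling(evidence))  # Heterocyclic Compounds, 2-Ring
```

With this toy data, the two quinoline abstracts are the only neighbors with nonzero similarity, and the one labeled citation also points to a 2-Ring record, so both sources favor 'Heterocyclic Compounds, 2-Ring'. In a classifier-enhanced setup, these per-source scores would presumably be features for a trained model that learns how much to trust abstracts versus citations, rather than averaging them equally as this sketch does.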

Original language: English (US)
Title of host publication: JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 167-170
Number of pages: 4
Volume: 2016-September
ISBN (Electronic): 9781450342292
DOIs: https://doi.org/10.1145/2910896.2910920
State: Published - Sep 1 2016
Event: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016 - Newark, United States
Duration: Jun 19 2016 to Jun 23 2016

Other

Other: 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Country: United States
City: Newark
Period: 6/19/16 to 6/23/16

Keywords

  • Controlled vocabularies
  • Curation of bibliographic databases
  • Machine Learning
  • Medical subject headings

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Kehoe, A. K., & Torvik, V. I. (2016). Predicting medical subject headings based on abstract similarity and citations to MEDLINE records. In JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries (Vol. 2016-September, pp. 167-170). [7559580] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/2910896.2910920

@inproceedings{f1db83bc76454af387aa000f9076b2bb,
title = "Predicting medical subject headings based on abstract similarity and citations to MEDLINE records",
abstract = "We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH{\circledR}) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach decomposes the classification problem into sets of siblings in the MeSH hierarchy (e.g., training a classifier to predict 'Heterocyclic Compounds, 2-Ring' vs. other 'Heterocyclic Compounds'). Preliminary experiments using a small but diverse set of MeSH terms show the highest performance when abstracts and citations are combined (rather than used alone) and coupled with a non-naive classifier: 90+{\%} precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on 'Heterocyclic Compounds, 2-Ring', while our approach performs better on 'Alzheimer Disease' and 'Neuroimaging'. Our approach can be applied broadly to documents whose abstracts are similar to MEDLINE abstracts or that cite MEDLINE records, which would support linking and searching across bibliographic databases beyond MEDLINE.",
keywords = "Controlled vocabularies, Curation of bibliographic databases, Machine Learning, Medical subject headings",
author = "Kehoe, {Adam K.} and Torvik, {Vetle Ingvald}",
year = "2016",
month = "9",
day = "1",
doi = "10.1145/2910896.2910920",
language = "English (US)",
volume = "2016-September",
pages = "167--170",
booktitle = "JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY  - GEN
T1  - Predicting medical subject headings based on abstract similarity and citations to MEDLINE records
AU  - Kehoe, Adam K.
AU  - Torvik, Vetle Ingvald
PY  - 2016/9/1
Y1  - 2016/9/1
AB  - We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH®) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach decomposes the classification problem into sets of siblings in the MeSH hierarchy (e.g., training a classifier to predict 'Heterocyclic Compounds, 2-Ring' vs. other 'Heterocyclic Compounds'). Preliminary experiments using a small but diverse set of MeSH terms show the highest performance when abstracts and citations are combined (rather than used alone) and coupled with a non-naive classifier: 90+% precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on 'Heterocyclic Compounds, 2-Ring', while our approach performs better on 'Alzheimer Disease' and 'Neuroimaging'. Our approach can be applied broadly to documents whose abstracts are similar to MEDLINE abstracts or that cite MEDLINE records, which would support linking and searching across bibliographic databases beyond MEDLINE.
KW  - Controlled vocabularies
KW  - Curation of bibliographic databases
KW  - Machine Learning
KW  - Medical subject headings
UR  - http://www.scopus.com/inward/record.url?scp=84989965524&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84989965524&partnerID=8YFLogxK
U2  - 10.1145/2910896.2910920
DO  - 10.1145/2910896.2910920
M3  - Conference contribution
AN  - SCOPUS:84989965524
VL  - 2016-September
SP  - 167
EP  - 170
BT  - JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  - 