A probabilistic similarity metric for Medline records: a model for author name disambiguation.

Vetle Ingvald Torvik, Marc Weeber, Don R. Swanson, Neil R. Smalheiser

Research output: Contribution to journal (Article)

Abstract

We present a model for automatically generating training sets and estimating the probability that a pair of Medline records sharing a last and first name initial are authored by the same individual, based on shared title words, journal name, co-authors, medical subject headings, language, and affiliation, as well as distinctive features of the name itself (i.e., presence of middle initial, suffix, and prevalence in Medline).
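As a rough illustration of the pairwise comparison the abstract describes, the sketch below builds a feature-by-feature similarity profile for two records. It is a minimal sketch under stated assumptions, not the authors' model: the `Record` class, its field names, and the `similarity_profile` function are hypothetical, the tokenization is deliberately naive, and the step that turns such a profile into a probability is not shown because the abstract does not specify it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for a Medline record."""
    title: str
    journal: str
    coauthors: frozenset = field(default_factory=frozenset)
    mesh: frozenset = field(default_factory=frozenset)
    language: str = "eng"
    affiliation: str = ""


def title_words(title):
    # Lowercase each word and strip punctuation at its edges.
    # No stopword removal or stemming in this sketch.
    return {w.strip(".,:;()").lower() for w in title.split()} - {""}


def similarity_profile(a, b):
    """Compare two records on the features listed in the abstract."""
    return {
        "shared_title_words": len(title_words(a.title) & title_words(b.title)),
        "same_journal": a.journal == b.journal,
        "shared_coauthors": len(a.coauthors & b.coauthors),
        "shared_mesh": len(a.mesh & b.mesh),
        "same_language": a.language == b.language,
        # An empty affiliation is treated as missing, not as a match.
        "same_affiliation": bool(a.affiliation) and a.affiliation == b.affiliation,
    }
```

A profile like this would be computed only for pairs of records that already share a last name and first initial; name-level features such as middle initial, suffix, and name prevalence are left out of the sketch.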

Original language: English (US)
Number of pages: 1
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2003
Externally published: Yes

ASJC Scopus subject areas

  • Medicine (all)

Cite this

@article{043acff701c94f9480e5f96634e8e570,
title = "A probabilistic similarity metric for Medline records: a model for author name disambiguation.",
abstract = "We present a model for automatically generating training sets and estimating the probability that a pair of Medline records sharing a last and first name initial are authored by the same individual, based on shared title words, journal name, co-authors, medical subject headings, language, and affiliation, as well as distinctive features of the name itself (i.e., presence of middle initial, suffix, and prevalence in Medline).",
author = "Torvik, {Vetle Ingvald} and Weeber, Marc and Swanson, {Don R.} and Smalheiser, {Neil R.}",
year = "2003",
language = "English (US)",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",
}

TY  - JOUR
T1  - A probabilistic similarity metric for Medline records
T2  - a model for author name disambiguation.
AU  - Torvik, Vetle Ingvald
AU  - Weeber, Marc
AU  - Swanson, Don R.
AU  - Smalheiser, Neil R.
PY  - 2003
Y1  - 2003
N2  - We present a model for automatically generating training sets and estimating the probability that a pair of Medline records sharing a last and first name initial are authored by the same individual, based on shared title words, journal name, co-authors, medical subject headings, language, and affiliation, as well as distinctive features of the name itself (i.e., presence of middle initial, suffix, and prevalence in Medline).
AB  - We present a model for automatically generating training sets and estimating the probability that a pair of Medline records sharing a last and first name initial are authored by the same individual, based on shared title words, journal name, co-authors, medical subject headings, language, and affiliation, as well as distinctive features of the name itself (i.e., presence of middle initial, suffix, and prevalence in Medline).
UR  - http://www.scopus.com/inward/record.url?scp=16544383397&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=16544383397&partnerID=8YFLogxK
M3  - Article
C2  - 14728536
AN  - SCOPUS:16544383397
JO  - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
JF  - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
SN  - 1559-4076
ER  - 