Why name ambiguity resolution matters for scholarly big data research

Jinseok Kim, Jana Diesner, Heejun Kim, Amirhossein Aleyasen, Hwan Min Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper illustrates how data pre-processing choices about author name disambiguation can affect research findings about scholarly networks and hypotheses about underlying social mechanisms. We analyzed three big scholarly datasets, each disambiguated algorithmically and via two common initial-based disambiguation methods, namely first-initial and all-initials disambiguation. The comparison of the resulting bibliometric and network properties revealed that initial-based disambiguation bears the prevalent risks of incorrectly merging author identities, underestimating the number of unique authors, and inflating the average productivity and number of collaborators per author. Compared to algorithmic disambiguation, the gaps for initial-based methods range per dataset from -4.23% to -87.36% for the number of unique authors, from 3.75% to 691.20% for average productivity, and from 5.06% to 285.28% for degree centrality. This calls for special attention to data pre-processing choices in scholarly big data research.
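The two initial-based methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or data: the function names, the toy records, and the use of full name strings as a stand-in for algorithmic disambiguation are all assumptions made for illustration. The sketch builds an author key per method, then computes the number of unique authors, the average productivity (papers per author), and the average number of collaborators (degree) per author.

```python
from collections import defaultdict


def full_name_key(surname, given_names):
    # Assumption: full name strings stand in for algorithmic disambiguation.
    return f"{surname.lower()}, {given_names.lower()}"


def first_initial_key(surname, given_names):
    # Surname plus the initial of the first given name only.
    parts = given_names.split()
    initial = parts[0][0] if parts else ""
    return f"{surname.lower()}, {initial.lower()}"


def all_initials_key(surname, given_names):
    # Surname plus the initials of every given name.
    initials = "".join(part[0] for part in given_names.split())
    return f"{surname.lower()}, {initials.lower()}"


def summarize(records, key_fn):
    """Compute simple bibliometric and network statistics.

    records: list of (paper_id, [(surname, given_names), ...]) tuples.
    """
    papers = defaultdict(set)     # author key -> paper ids
    coauthors = defaultdict(set)  # author key -> collaborator keys
    for paper_id, authors in records:
        keys = {key_fn(surname, given) for surname, given in authors}
        for key in keys:
            papers[key].add(paper_id)
            coauthors[key].update(keys - {key})
    n = len(papers)
    return {
        "unique_authors": n,
        "avg_productivity": sum(len(p) for p in papers.values()) / n,
        "avg_degree": sum(len(c) for c in coauthors.values()) / n,
    }


def relative_gap(value, baseline):
    # Percentage difference of a method's outcome from the baseline.
    return 100 * (value - baseline) / baseline


# Hypothetical toy records: two distinct authors share surname and first initial.
records = [
    ("p1", [("Kim", "Jin Soo"), ("Lee", "Anna")]),
    ("p2", [("Kim", "Jae Hyun"), ("Choi", "Eun")]),
    ("p3", [("Kim", "Jin Soo"), ("Park", "Min")]),
]

baseline = summarize(records, full_name_key)
first_initial = summarize(records, first_initial_key)
all_initials = summarize(records, all_initials_key)

for name, stats in [("first-initial", first_initial), ("all-initials", all_initials)]:
    gaps = {metric: relative_gap(stats[metric], baseline[metric]) for metric in stats}
    print(name, gaps)
```

In this toy case, first-initial keys merge "Kim, Jin Soo" and "Kim, Jae Hyun" into one identity, so the author count drops while average productivity and degree rise, which is the direction of bias the abstract describes. All-initials keys keep the two apart here, but would still merge authors who share every initial.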

Original language: English (US)
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Editors: Wo Chang, Jun Huan, Nick Cercone, Saumyadipta Pyne, Vasant Honavar, Jimmy Lin, Xiaohua Tony Hu, Charu Aggarwal, Bamshad Mobasher, Jian Pei, Raghunath Nambiar
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781479956654
State: Published - Jan 7 2015
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: Oct 27 2014 to Oct 30 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country/Territory: United States
City: Washington
Period: 10/27/14 to 10/30/14

Keywords

  • Bibliometrics
  • Collaboration
  • Disambiguation
  • Network analysis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
