Abstract
We present a nearest neighbor approach to ethnicity classification. Given an author name, all of its instances (or the most similar ones) in PubMed are identified and coupled with their respective country of affiliation, and then probabilistically mapped to a set of 26 predefined ethnicities. The dominant ethnicity (or pair of ethnicities) is assigned as the class. The predictions are also used to upgrade Genni (Smith, Singh, and Torvik, 2013) to provide ethnicity-specific gender predictions for cases like Italian vs. English Andrea, Turkish vs. Korean Bora, Israeli vs. Nordic Eli, and Slavic vs. Japanese Renko. Ethnea and Genni 2.0 are available at http://abel.lis.illinois.edu
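The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the Ethnea implementation: the name instances, the country-to-ethnicity probabilities, the ethnicity labels, the function name `classify`, and the `pair_threshold` rule for assigning a pair of ethnicities are all hypothetical stand-ins chosen for illustration. The sketch also matches names exactly and leaves out the fallback to the most similar names.

```python
from collections import Counter

# Hypothetical toy data: (first name, country of affiliation) pairs,
# standing in for geo-coded author name instances from PubMed.
NAME_INSTANCES = [
    ("andrea", "Italy"), ("andrea", "Italy"), ("andrea", "Italy"),
    ("andrea", "United Kingdom"),
    ("bora", "Turkey"), ("bora", "South Korea"),
]

# Hypothetical probabilistic mapping from country to ethnicity.
# The real system maps to 26 predefined ethnicities; these values are invented.
COUNTRY_TO_ETHNICITY = {
    "Italy": {"ITALIAN": 1.0},
    "United Kingdom": {"ENGLISH": 1.0},
    "Turkey": {"TURKISH": 1.0},
    "South Korea": {"KOREAN": 1.0},
}


def classify(name, instances=NAME_INSTANCES, mapping=COUNTRY_TO_ETHNICITY,
             pair_threshold=0.4):
    """Return the dominant ethnicity (or pair) for a name, or None if unseen.

    pair_threshold is an assumed cutoff: the runner-up is reported alongside
    the winner when its share of the total weight reaches this value.
    """
    scores = Counter()
    for instance_name, country in instances:
        if instance_name == name.lower():  # exact match only in this sketch
            for ethnicity, prob in mapping.get(country, {}).items():
                scores[ethnicity] += prob

    total = sum(scores.values())
    if total == 0:
        return None

    # Rank by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    shares = [(ethnicity, weight / total) for ethnicity, weight in ranked]

    if len(shares) > 1 and shares[1][1] >= pair_threshold:
        return (shares[0][0], shares[1][0])
    return (shares[0][0],)


print(classify("Andrea"))  # single dominant ethnicity in the toy data
print(classify("Bora"))    # evenly split in the toy data, so a pair is returned
```

With the toy data, "Andrea" resolves to a single class because most of its instances come from one country, while "Bora" is split evenly and so receives a pair. A per-ethnicity gender lookup of the kind the abstract attributes to Genni 2.0 could then be keyed on the pair (name, ethnicity).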
Original language | English (US) |
---|---|
Number of pages | 1 |
State | Published - Mar 2016 |
Event | International Symposium on Science of Science - Library of Congress, Washington DC, United States; Duration: Mar 22 2016 → Mar 23 2016 |
Conference
Conference | International Symposium on Science of Science |
---|---|
Country/Territory | United States |
City | Washington DC |
Period | 3/22/16 → 3/23/16 |
Keywords
- bibliometrics
- ethnicity classification
- machine learning
Title
Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database
Datasets
- Genni + Ethnea for the Author-ity 2009 dataset. Torvik, V. I. (Creator), University of Illinois Urbana-Champaign, Apr 19 2018. DOI: 10.13012/B2IDB-9087546_V1 (Dataset)