Disambiguation and co-authorship networks of the U.S. patent inventor database (1975-2010)

Guan Cheng Li, Ronald Lai, Alexander D'Amour, David M. Doolin, Ye Sun, Vetle I. Torvik, Amy Z. Yu, Fleming Lee

Research output: Contribution to journal > Article

Abstract

Research into invention, innovation policy, and technology strategy can greatly benefit from an accurate understanding of inventor careers. The United States Patent and Trademark Office does not provide unique inventor identifiers, however, making large-scale studies challenging. Many scholars of innovation have implemented ad-hoc disambiguation methods based on string similarity thresholds and string comparison matching; such methods have been shown to be vulnerable to a number of problems that can adversely affect research results. The authors address this issue by contributing (1) an application of the Author-ity disambiguation approach (Torvik et al., 2005; Torvik and Smalheiser, 2009) to the US utility patent database, (2) a new iterative blocking scheme that expands the match space of this algorithm while maintaining scalability, (3) a public posting of the algorithm and code, and (4) a public posting of the results of the algorithm in the form of a database of inventors and their associated patents. The paper provides an overview of the disambiguation method, assesses its accuracy, and calculates network measures based on co-authorship and collaboration variables. It illustrates the potential for large-scale innovation studies across time and space with visualizations of inventor mobility across the United States. The complete input and results data from the original disambiguation are available at (http://dvn.iq.harvard.edu/dvn/dv/patent); revised data described here are at (http://funglab.berkeley.edu/pub/disamb-no-postpolishing.csv); original and revised code is available at (https://github.com/funginstitute/disambiguator); visualizations of inventor mobility are at (http://funglab.berkeley.edu/mobility/).

Original language: English (US)
Pages (from-to): 941-955
Number of pages: 15
Journal: Research Policy
Volume: 43
Issue number: 6
DOI: 10.1016/j.respol.2014.01.012
State: Published - Jul 2014

Fingerprint

Innovation
Visualization
Time and motion study
Trademarks
Patents and inventions
Scalability
Co-authorship
Inventor
Data base
Patents

Keywords

  • Careers
  • Disambiguation
  • Inventors
  • Networks
  • Patents

ASJC Scopus subject areas

  • Strategy and Management
  • Management Science and Operations Research
  • Management of Technology and Innovation

Cite this

Disambiguation and co-authorship networks of the U.S. patent inventor database (1975-2010). / Li, Guan Cheng; Lai, Ronald; D'Amour, Alexander; Doolin, David M.; Sun, Ye; Torvik, Vetle I.; Yu, Amy Z.; Lee, Fleming.

In: Research Policy, Vol. 43, No. 6, 07.2014, p. 941-955.

Li, Guan Cheng ; Lai, Ronald ; D'Amour, Alexander ; Doolin, David M. ; Sun, Ye ; Torvik, Vetle I. ; Yu, Amy Z. ; Lee, Fleming. / Disambiguation and co-authorship networks of the U.S. patent inventor database (1975-2010). In: Research Policy. 2014 ; Vol. 43, No. 6. pp. 941-955.
@article{1fcbb17fab764b0496e0286f1707feab,
title = "Disambiguation and co-authorship networks of the U.S. patent inventor database (1975-2010)",
abstract = "Research into invention, innovation policy, and technology strategy can greatly benefit from an accurate understanding of inventor careers. The United States Patent and Trademark Office does not provide unique inventor identifiers, however, making large-scale studies challenging. Many scholars of innovation have implemented ad-hoc disambiguation methods based on string similarity thresholds and string comparison matching; such methods have been shown to be vulnerable to a number of problems that can adversely affect research results. The authors address this issue by contributing (1) an application of the Author-ity disambiguation approach (Torvik et al., 2005; Torvik and Smalheiser, 2009) to the US utility patent database, (2) a new iterative blocking scheme that expands the match space of this algorithm while maintaining scalability, (3) a public posting of the algorithm and code, and (4) a public posting of the results of the algorithm in the form of a database of inventors and their associated patents. The paper provides an overview of the disambiguation method, assesses its accuracy, and calculates network measures based on co-authorship and collaboration variables. It illustrates the potential for large-scale innovation studies across time and space with visualizations of inventor mobility across the United States. The complete input and results data from the original disambiguation are available at (http://dvn.iq.harvard.edu/dvn/dv/patent); revised data described here are at (http://funglab.berkeley.edu/pub/disamb-no-postpolishing.csv); original and revised code is available at (https://github.com/funginstitute/disambiguator); visualizations of inventor mobility are at (http://funglab.berkeley.edu/mobility/).",
keywords = "Careers, Disambiguation, Inventors, Networks, Patents",
author = "Li, {Guan Cheng} and Ronald Lai and Alexander D'Amour and Doolin, {David M.} and Ye Sun and Torvik, {Vetle I.} and Yu, {Amy Z.} and Fleming Lee",
year = "2014",
month = "7",
doi = "10.1016/j.respol.2014.01.012",
language = "English (US)",
volume = "43",
pages = "941--955",
journal = "Research Policy",
issn = "0048-7333",
publisher = "Elsevier",
number = "6",

}

TY - JOUR

T1 - Disambiguation and co-authorship networks of the U.S. patent inventor database (1975-2010)

AU - Li, Guan Cheng

AU - Lai, Ronald

AU - D'Amour, Alexander

AU - Doolin, David M.

AU - Sun, Ye

AU - Torvik, Vetle I.

AU - Yu, Amy Z.

AU - Lee, Fleming

PY - 2014/7

Y1 - 2014/7

N2 - Research into invention, innovation policy, and technology strategy can greatly benefit from an accurate understanding of inventor careers. The United States Patent and Trademark Office does not provide unique inventor identifiers, however, making large-scale studies challenging. Many scholars of innovation have implemented ad-hoc disambiguation methods based on string similarity thresholds and string comparison matching; such methods have been shown to be vulnerable to a number of problems that can adversely affect research results. The authors address this issue by contributing (1) an application of the Author-ity disambiguation approach (Torvik et al., 2005; Torvik and Smalheiser, 2009) to the US utility patent database, (2) a new iterative blocking scheme that expands the match space of this algorithm while maintaining scalability, (3) a public posting of the algorithm and code, and (4) a public posting of the results of the algorithm in the form of a database of inventors and their associated patents. The paper provides an overview of the disambiguation method, assesses its accuracy, and calculates network measures based on co-authorship and collaboration variables. It illustrates the potential for large-scale innovation studies across time and space with visualizations of inventor mobility across the United States. The complete input and results data from the original disambiguation are available at (http://dvn.iq.harvard.edu/dvn/dv/patent); revised data described here are at (http://funglab.berkeley.edu/pub/disamb-no-postpolishing.csv); original and revised code is available at (https://github.com/funginstitute/disambiguator); visualizations of inventor mobility are at (http://funglab.berkeley.edu/mobility/).

KW - Careers

KW - Disambiguation

KW - Inventors

KW - Networks

KW - Patents

UR - http://www.scopus.com/inward/record.url?scp=84899961214&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899961214&partnerID=8YFLogxK

U2 - 10.1016/j.respol.2014.01.012

DO - 10.1016/j.respol.2014.01.012

M3 - Article

AN - SCOPUS:84899961214

VL - 43

SP - 941

EP - 955

JO - Research Policy

JF - Research Policy

SN - 0048-7333

IS - 6

ER -