An incremental graph-partitioning algorithm for entity resolution

Gregory Tauer, Ketan Date, Rakesh Nagi, Moises Sudit

Research output: Contribution to journal › Article › peer-review

Abstract

Entity resolution is an important data association task when fusing information from multiple sources. Often the information arrives continuously, and the entity resolution algorithm must efficiently update its solution upon receiving new information. In this work, we introduce an incremental entity resolution algorithm based on a graph partitioning formulation. The algorithm handles both incrementally arriving entity references and incrementally arriving information that changes the pairwise similarity scores between references. New information is handled in a way that allows the algorithm to reconsider past decisions when contradicting information arrives. Because the graph partitioning formulation is NP-hard, we develop a heuristic algorithm that produces good solutions and is compatible with a blocking technique that limits the number of required comparisons. The algorithm is tested on a variety of randomly generated and real datasets, and it is shown that allowing the algorithm to consider revised scores and revisit prior decisions substantially improves accuracy (approximately 30–40% higher F-score on a natural language dataset) compared to other greedy heuristics using the same set of coefficients. It is also shown that, on a test set with 100 references, the incremental algorithm is up to an order of magnitude faster than a batch approach that re-solves the entire problem.
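The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes references are graph nodes; a signed pairwise score is stored only for pairs that survive blocking (positive suggests the same entity, negative suggests different entities); and the partition greedily maximizes the total score inside clusters, a correlation-clustering-style objective assumed here rather than taken from the paper. The class `IncrementalPartitioner` and its methods are hypothetical. When a reference arrives or a score changes, the touched references are re-placed, and every move triggers a re-check of that reference's neighbours. This is one simple way to let earlier decisions be revisited.

```python
from collections import defaultdict


class IncrementalPartitioner:
    """Hypothetical sketch: keep a partition of references that locally maximises
    the sum of pairwise scores inside clusters (assumed objective)."""

    def __init__(self):
        self.scores = defaultdict(dict)  # scores[a][b] == scores[b][a]; only blocked pairs
        self.cluster_of = {}             # reference -> cluster id
        self.members = defaultdict(set)  # cluster id -> set of references
        self._next_id = 0

    def _gain(self, ref, cid):
        # Total score between ref and the other members of cluster cid.
        return sum(self.scores[ref].get(m, 0.0) for m in self.members[cid] if m != ref)

    def _best_cluster(self, ref):
        """Return the best cluster id for ref, or None for a fresh singleton."""
        old = self.cluster_of.get(ref)
        best, best_gain = None, 0.0  # a fresh singleton contributes 0
        if old is not None and len(self.members[old]) > 1:
            g = self._gain(ref, old)
            if g >= best_gain:  # ties favour staying put
                best, best_gain = old, g
        # Only clusters holding a scored neighbour can give a nonzero gain,
        # so sparse (blocked) scores directly limit the work done here.
        candidates = {self.cluster_of[n] for n in self.scores[ref] if n in self.cluster_of}
        for cid in candidates - {old}:
            g = self._gain(ref, cid)
            if g > best_gain:
                best, best_gain = cid, g
        return best

    def _move(self, ref, cid):
        old = self.cluster_of.get(ref)
        if old is not None:
            self.members[old].discard(ref)
            if not self.members[old]:
                del self.members[old]
        if cid is None:  # open a new singleton cluster
            cid, self._next_id = self._next_id, self._next_id + 1
        self.cluster_of[ref] = cid
        self.members[cid].add(ref)

    def _place(self, ref):
        """Put ref in its best cluster; return True if an existing assignment changed."""
        old = self.cluster_of.get(ref)
        best = self._best_cluster(ref)
        if old is not None and (best == old or (best is None and len(self.members[old]) == 1)):
            return False
        self._move(ref, best)
        return True

    def _repair(self, dirty):
        # Revisit past decisions: re-place touched references and, whenever one
        # moves, re-check its neighbours. Every accepted move strictly raises the
        # objective, so the loop terminates (at a local, not global, optimum).
        queue = list(dirty)
        while queue:
            ref = queue.pop()
            if ref in self.cluster_of and self._place(ref):
                queue.extend(self.scores[ref])

    def add_reference(self, ref, scores):
        """A new reference arrives with scores to existing references in its block."""
        for other, s in scores.items():
            self.scores[ref][other] = s
            self.scores[other][ref] = s
        self._place(ref)
        self._repair({ref, *scores})

    def update_score(self, a, b, s):
        """New evidence revises the score between two existing references."""
        self.scores[a][b] = s
        self.scores[b][a] = s
        self._repair({a, b})

    def clusters(self):
        return {frozenset(m) for m in self.members.values()}


if __name__ == "__main__":
    p = IncrementalPartitioner()
    p.add_reference("r1", {})
    p.add_reference("r2", {"r1": 0.8})
    p.add_reference("r3", {"r1": 0.6, "r2": 0.5})
    print(sorted(sorted(c) for c in p.clusters()))  # [['r1', 'r2', 'r3']]
    p.update_score("r1", "r3", -2.0)                 # contradicting evidence arrives
    print(sorted(sorted(c) for c in p.clusters()))  # [['r1', 'r2'], ['r3']]
```

In the usage example, `r3` first joins `r1` and `r2`; a later negative score between `r1` and `r3` makes `r3` leave again, so an earlier merge is undone rather than locked in. Because the repair step only applies improving single-reference moves, it reaches a local optimum and is not an exact solver for the NP-hard partitioning problem.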

Original language: English (US)
Pages (from-to): 171-183
Number of pages: 13
Journal: Information Fusion
Volume: 46
State: Published - Mar 2019

Keywords

  • Data association
  • Entity resolution
  • Graph partitioning
  • Incremental algorithm

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture
