### Abstract

Entity resolution is an important data association task when fusing information from multiple sources. Oftentimes the information arrives continuously and the entity resolution algorithm needs to efficiently update its solution upon receiving new information. In this work, we introduce an incremental entity resolution algorithm based on a graph partitioning formulation. The developed algorithm is able to handle both incrementally arriving entity references, as well as incrementally arriving information which changes the pairwise similarity scores between the references. New information is handled in a way that allows the algorithm to reconsider past decisions when contradicting information arrives. Because the graph partitioning formulation used is NP-Hard, a heuristic algorithm is developed to produce good solutions, which is also compatible with a blocking technique to limit the number of required comparisons. The algorithm is tested on a variety of datasets (randomly generated and real) and it is shown that allowing the algorithm to consider revised scores and revisit prior decisions offers a substantial improvement to accuracy (approximately 30–40% better F-Score on a natural language dataset), compared to other greedy heuristics on the same set of coefficients. It is also shown that, on a test set with 100 references, the incremental algorithm is up to an order of magnitude faster than a batch algorithm approach that re-solves the entire problem.

Language | English (US) |
---|---|

Pages | 171-183 |

Number of pages | 13 |

Journal | Information Fusion |

Volume | 46 |

DOIs | |

State | Published - Mar 1 2019 |

### Fingerprint

### Keywords

- Data association
- Entity resolution
- Graph partitioning
- Incremental algorithm

### ASJC Scopus subject areas

- Software
- Signal Processing
- Information Systems
- Hardware and Architecture

### Cite this

*Information Fusion*,

*46*, 171-183. https://doi.org/10.1016/j.inffus.2018.06.001

**An incremental graph-partitioning algorithm for entity resolution.** / Tauer, Gregory; Date, Ketan; Nagi, Rakesh; Sudit, Moises.

Research output: Contribution to journal › Article

*Information Fusion*, vol. 46, pp. 171-183. https://doi.org/10.1016/j.inffus.2018.06.001

}

TY - JOUR

T1 - An incremental graph-partitioning algorithm for entity resolution

AU - Tauer, Gregory

AU - Date, Ketan

AU - Nagi, Rakesh

AU - Sudit, Moises

PY - 2019/3/1

Y1 - 2019/3/1

N2 - Entity resolution is an important data association task when fusing information from multiple sources. Oftentimes the information arrives continuously and the entity resolution algorithm needs to efficiently update its solution upon receiving new information. In this work, we introduce an incremental entity resolution algorithm based on a graph partitioning formulation. The developed algorithm is able to handle both incrementally arriving entity references, as well as incrementally arriving information which changes the pairwise similarity scores between the references. New information is handled in a way that allows the algorithm to reconsider past decisions when contradicting information arrives. Because the graph partitioning formulation used is NP-Hard, a heuristic algorithm is developed to produce good solutions, which is also compatible with a blocking technique to limit the number of required comparisons. The algorithm is tested on a variety of datasets (randomly generated and real) and it is shown that allowing the algorithm to consider revised scores and revisit prior decisions offers a substantial improvement to accuracy (approximately 30–40% better F-Score on a natural language dataset), compared to other greedy heuristics on the same set of coefficients. It is also shown that, on a test set with 100 references, the incremental algorithm is up to an order of magnitude faster than a batch algorithm approach that re-solves the entire problem.

AB - Entity resolution is an important data association task when fusing information from multiple sources. Oftentimes the information arrives continuously and the entity resolution algorithm needs to efficiently update its solution upon receiving new information. In this work, we introduce an incremental entity resolution algorithm based on a graph partitioning formulation. The developed algorithm is able to handle both incrementally arriving entity references, as well as incrementally arriving information which changes the pairwise similarity scores between the references. New information is handled in a way that allows the algorithm to reconsider past decisions when contradicting information arrives. Because the graph partitioning formulation used is NP-Hard, a heuristic algorithm is developed to produce good solutions, which is also compatible with a blocking technique to limit the number of required comparisons. The algorithm is tested on a variety of datasets (randomly generated and real) and it is shown that allowing the algorithm to consider revised scores and revisit prior decisions offers a substantial improvement to accuracy (approximately 30–40% better F-Score on a natural language dataset), compared to other greedy heuristics on the same set of coefficients. It is also shown that, on a test set with 100 references, the incremental algorithm is up to an order of magnitude faster than a batch algorithm approach that re-solves the entire problem.

KW - Data association

KW - Entity resolution

KW - Graph partitioning

KW - Incremental algorithm

UR - http://www.scopus.com/inward/record.url?scp=85049037223&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049037223&partnerID=8YFLogxK

U2 - 10.1016/j.inffus.2018.06.001

DO - 10.1016/j.inffus.2018.06.001

M3 - Article

VL - 46

SP - 171

EP - 183

JO - Information Fusion

T2 - Information Fusion

JF - Information Fusion

SN - 1566-2535

ER -