xER: An Explainable Model for Entity Resolution using an Efficient Solution for the Clique Partitioning Problem

Samhita Vadrevu, Wen Mei Hwu, Rakesh Nagi, Jinjun Xiong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a global, self-explainable solution to a prominent NLP problem: Entity Resolution (ER). We formulate ER as a graph partitioning problem. Every mention of a real-world entity is represented by a node in the graph, and the pairwise similarity scores between the mentions are used to assign each node to exactly one clique, which represents a real-world entity in the ER domain. We use the Clique Partitioning Problem (CPP), an Integer Program (IP), to formulate ER as a graph partitioning problem, and then highlight the explainable nature of this method. Since CPP is NP-hard, we introduce an efficient solution procedure, the xER algorithm, which solves CPP by first finding maximal cliques in the graph and then performing generalized set packing using a novel formulation. We discuss the advantages of xER over traditional methods and report computational experiments and results from applying it to ER data sets.
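The two-stage idea in the abstract (enumerate maximal cliques, then pack them so that every mention lands in exactly one cluster) can be illustrated with a minimal Python sketch. This is not the authors' xER algorithm: the similarity threshold, the toy scores, and all function names are assumptions made for illustration, and a simple greedy heuristic stands in for the paper's generalized set packing formulation, which the abstract does not spell out.

```python
from itertools import combinations


def build_graph(n, scores, threshold):
    """Connect mentions i and j when their similarity meets the threshold."""
    adj = {i: set() for i in range(n)}
    for (i, j), s in scores.items():
        if s >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_weight(nodes, scores):
    """Sum of pairwise similarity scores inside a candidate cluster."""
    return sum(scores.get((i, j), 0.0) for i, j in combinations(sorted(nodes), 2))


def greedy_packing(cliques, scores):
    """Greedy stand-in for set packing (hypothetical, not the paper's IP).

    Repeatedly pick the heaviest clique restricted to unassigned mentions.
    A subset of a clique is still a clique, so every cluster stays a clique.
    """
    assigned, clusters = set(), []
    while True:
        candidates = [set(c) - assigned for c in cliques]
        candidates = [c for c in candidates if c]
        if not candidates:
            return clusters
        best = max(candidates, key=lambda c: (clique_weight(c, scores), len(c)))
        clusters.append(best)
        assigned |= best


# Toy data (assumed): five mentions, scores keyed by (i, j) with i < j.
scores = {
    (0, 1): 0.90, (0, 2): 0.80, (1, 2): 0.85,
    (2, 3): 0.60, (3, 4): 0.95, (0, 3): 0.10,
}
adj = build_graph(5, scores, threshold=0.5)
clusters = greedy_packing(maximal_cliques(adj), scores)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4]]
```

Because every resulting cluster is a clique in the thresholded graph, each pair of mentions placed in the same entity has a similarity score at or above the threshold, which is one way to read the "explainable" property the abstract refers to. The greedy step gives no optimality guarantee; an exact approach would solve the packing as an integer program.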

Original language: English (US)
Title of host publication: TrustNLP 2021 - 1st Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop
Editors: Yada Pruksachatkun, Anil Ramakrishna, Kai-Wei Chang, Satyapriya Krishna, Jwala Dhamala, Tanaya Guha, Xiang Ren
Publisher: Association for Computational Linguistics (ACL)
Pages: 34-44
Number of pages: 11
ISBN (Electronic): 9781954085336
State: Published - 2021
Event: 1st Workshop on Trustworthy Natural Language Processing, TrustNLP 2021 - Virtual, Online
Duration: Jun 10 2021 → …

Publication series

Name: TrustNLP 2021 - 1st Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 1st Workshop on Trustworthy Natural Language Processing, TrustNLP 2021
City: Virtual, Online
Period: 6/10/21 → …

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
