Improved similarity assessment and spectral clustering for unsupervised linking of data extracted from bridge inspection reports

Kaijian Liu, Nora El-Gohary

Research output: Contribution to journal › Article › peer-review

Abstract

Textual bridge inspection reports are important data sources for supporting data-driven bridge deterioration prediction and maintenance decision making. Information extraction methods are available to extract data/information from these reports to support data-driven analytics. However, directly using the extracted data/information in data analytics is still challenging because, even within the same report, there exist multiple data records that describe the same entity, which increases the dimensionality of the data and adversely affects the performance of the analytics. The first step to address this problem is to link the multiple records that describe the same entity and same type of instances (e.g., all cracks on a specific bridge deck), so that they can be subsequently fused into a single unified representation for dimensionality reduction without information loss. To address this need, this paper proposes a spectral clustering-based method for unsupervised data linking. The method includes: (1) a concept similarity assessment method, which allows for assessing concept similarity even when corpus or semantic information is not available for the application at hand; (2) a record similarity assessment method, which captures and uses similarity assessment dependencies to reduce the number of falsely-linked records; and (3) an improved spectral clustering method, which uses iterative bi-partitioning to better link records in an unsupervised way and to address the transitive closure problem. The proposed data linking method was evaluated in linking records extracted from ten bridge inspection reports. It achieved an average precision, recall, and F-1 measure of 96.2%, 88.3%, and 92.1%, respectively.
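The iterative bi-partitioning idea in component (3) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the splitting or stopping rules, so the sketch assumes a standard split on the sign of the second eigenvector of the normalized graph Laplacian, and a hypothetical stopping rule (`min_sim`) that keeps a cluster only when every pair of records in it is sufficiently similar. The function names `bipartition` and `link_records` and the toy similarity matrix are likewise illustrative assumptions.

```python
import numpy as np

def bipartition(S):
    """Split records in two by the sign of the second eigenvector of the
    symmetric normalized Laplacian I - D^{-1/2} S D^{-1/2} (assumed split rule)."""
    deg = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues are returned in ascending order
    return vecs[:, 1] >= 0

def link_records(S, min_sim=0.5, idx=None):
    """Recursively bi-partition until every pair of records in a cluster has
    similarity >= min_sim (hypothetical stopping rule, not from the paper)."""
    idx = np.arange(len(S)) if idx is None else idx
    sub = S[np.ix_(idx, idx)]
    if len(idx) == 1 or sub.min() >= min_sim:
        return [sorted(idx.tolist())]
    mask = bipartition(sub)
    if mask.all() or not mask.any():
        # Degenerate split: keep the records together rather than loop forever.
        return [sorted(idx.tolist())]
    return link_records(S, min_sim, idx[mask]) + link_records(S, min_sim, idx[~mask])

# Toy record-similarity matrix (made-up values): records 0-2 resemble each
# other, records 3-4 resemble each other, and cross-group similarity is low.
S = np.full((5, 5), 0.05)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)

clusters = sorted(link_records(S))
print(clusters)
```

Requiring all pairs in a cluster to meet the threshold is one simple way to avoid chains of records that are linked only transitively; on the toy matrix the sketch should group records 0 to 2 together and records 3 and 4 together.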

Original language: English (US)
Article number: 101496
Journal: Advanced Engineering Informatics
Volume: 51
State: Published - Jan 2022
Externally published: Yes

Keywords

  • Bridges
  • Data linking/linkage
  • Deterioration prediction
  • Maintenance decision making
  • Similarity assessment
  • Spectral clustering
  • Unsupervised machine learning

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
