Abstract
Textual bridge inspection reports are important data sources for supporting data-driven bridge deterioration prediction and maintenance decision making. Information extraction methods can extract data and information from these reports to support data-driven analytics. However, using the extracted data directly in analytics remains challenging: even within a single report, multiple data records describe the same entity, which increases the dimensionality of the data and degrades the performance of the analytics. The first step in addressing this problem is to link the records that describe the same entity and the same type of instance (e.g., all cracks on a specific bridge deck), so that they can subsequently be fused into a single unified representation, reducing dimensionality without information loss. To address this need, this paper proposes a spectral clustering-based method for unsupervised data linking. The method includes: (1) a concept similarity assessment method, which can assess concept similarity even when no corpus or semantic information is available for the application at hand; (2) a record similarity assessment method, which captures and uses dependencies among similarity assessments to reduce the number of falsely linked records; and (3) an improved spectral clustering method, which uses iterative bi-partitioning to link records more accurately in an unsupervised way and to address the transitive closure problem. The proposed method was evaluated on records extracted from ten bridge inspection reports, achieving an average precision, recall, and F1 measure of 96.2%, 88.3%, and 92.1%, respectively.
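The abstract does not give the algorithmic details of step (3), so the following is only a minimal Python sketch of the general idea, under stated assumptions. It assumes a precomputed, symmetric, non-negative record-similarity matrix `W` (the kind of output steps (1) and (2) would produce) and a hypothetical linking threshold `tau`. Each bi-partition is the standard normalized-cut split of Shi and Malik (sign of the second eigenvector of the normalized Laplacian). The stopping rule is also an assumption: a cluster is accepted only when every pair of its records is at least `tau`-similar, so a chain A~B, B~C with A not similar to C is split rather than merged by transitive closure. The helper names (`link_records`, `fiedler_split`, `connected_components`, `pairwise_prf`) and the pairwise evaluation scheme are illustrative, not the paper's implementation.

```python
import itertools
from collections import deque

import numpy as np


def connected_components(W: np.ndarray, idx: list[int]) -> list[list[int]]:
    """Group the records in `idx` into connected components of the graph W > 0 (BFS)."""
    remaining = set(idx)
    components = []
    while remaining:
        start = remaining.pop()
        comp, queue = [start], deque([start])
        while queue:
            i = queue.popleft()
            for j in list(remaining):
                if W[i, j] > 0:
                    remaining.remove(j)
                    comp.append(j)
                    queue.append(j)
        components.append(sorted(comp))
    return components


def fiedler_split(W_sub: np.ndarray) -> np.ndarray:
    """Normalized-cut bi-partition of a connected graph; returns a boolean mask for one side."""
    d = W_sub.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * W_sub * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    y = d_inv_sqrt * vecs[:, 1]  # second eigenvector mapped back: y = D^{-1/2} v
    mask = y > 0
    if mask.all() or not mask.any():  # numerical safeguard: fall back to a median split
        mask = np.zeros(len(y), dtype=bool)
        mask[np.argsort(y)[len(y) // 2:]] = True
    return mask


def link_records(W, tau: float) -> list[list[int]]:
    """Iteratively bi-partition records until every cluster is 'fully linked'.

    Hypothetical stopping rule (an assumption, not taken from the paper): a cluster is
    accepted only if every pair of its records has similarity >= tau, so a chain
    A~B, B~C with A !~ C is split instead of being merged by transitive closure.
    """
    W = np.array(W, dtype=float)  # copy, so the caller's matrix is not modified
    np.fill_diagonal(W, 0.0)
    pending, done = [list(range(len(W)))], []
    while pending:
        idx = pending.pop()
        sub = W[np.ix_(idx, idx)]
        off_diag = sub[~np.eye(len(idx), dtype=bool)]
        if len(idx) == 1 or off_diag.min() >= tau:
            done.append(idx)
            continue
        comps = connected_components(W, idx)
        if len(comps) > 1:  # already disconnected: no eigen-decomposition needed
            pending.extend(comps)
            continue
        mask = fiedler_split(sub)
        pending.append([i for i, m in zip(idx, mask) if m])
        pending.append([i for i, m in zip(idx, mask) if not m])
    return sorted(sorted(c) for c in done)


def pairwise_prf(pred, gold):
    """Pair-level precision, recall, and F1 (an assumed evaluation scheme)."""
    def pairs(clusters):
        return {frozenset(p) for c in clusters for p in itertools.combinations(c, 2)}

    p, g = pairs(pred), pairs(gold)
    tp = len(p & g)
    precision = tp / len(p) if p else 1.0
    recall = tp / len(g) if g else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy scores: records 0-2 describe one crack, 3-5 another; (2, 3) is a spurious 0.75 link.
W_demo = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W_demo[a, b] = W_demo[b, a] = 0.9
W_demo[2, 3] = W_demo[3, 2] = 0.75
gold = [[0, 1, 2], [3, 4, 5]]

print(link_records(W_demo, tau=0.7))  # [[0, 1, 2], [3, 4, 5]]
closure = connected_components(np.where(W_demo >= 0.7, W_demo, 0.0), list(range(6)))
print(closure)                        # [[0, 1, 2, 3, 4, 5]]: the chain merges everything
print(pairwise_prf(closure, gold))    # precision 0.4, recall 1.0
```

On these toy scores, thresholding at 0.7 and taking the transitive closure merges all six records (pairwise precision 0.4), while the bi-partitioning sketch separates the two cracks. The dense `np.linalg.eigh` call keeps the sketch dependency-light; for large collections of reports, a sparse eigensolver would be the natural substitute.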
Field | Value |
---|---|
Original language | English (US) |
Article number | 101496 |
Journal | Advanced Engineering Informatics |
Volume | 51 |
State | Published - Jan 2022 |
Externally published | Yes |
Keywords
- Bridges
- Data linking/linkage
- Deterioration prediction
- Maintenance decision making
- Similarity assessment
- Spectral clustering
- Unsupervised machine learning
ASJC Scopus subject areas
- Information Systems
- Artificial Intelligence