Unsupervised named entity normalization for supporting information fusion for big bridge data analytics

Kaijian Liu, Nora El-Gohary

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

The large amount of multi-type and multi-source bridge data opens unprecedented opportunities for big data analytics aimed at better bridge deterioration prediction. Before the analytics, information fusion is needed to transform the heterogeneous data from different sources into a unified representation. One of the most important fusion tasks is resolving the ambiguities in the named entities extracted from bridge inspection reports. These ambiguities stem from the use of different and ambiguous surface forms to refer to the same target named entity. There is, thus, a need for named entity normalization (NEN) methods that can map these surface forms into their canonical form, an identifier concept. However, existing NEN methods are limited in this regard: they mostly require pre-established knowledge (e.g., dictionaries or Wikipedia) and/or training data, and they mostly ignore the impact of the normalization on data analytics. To address this need, this paper proposes an unsupervised NEN method. It includes two main components: (1) candidate identifier concept generation based on the multi-grams of each named entity set, and (2) candidate identifier concept ranking based on a proposed ranking function. The function uses the TF-IDF (term frequency–inverse document frequency) weight and is further improved by considering the impacts of gram lengths and positions on the ranking. It aims to balance the abstractness and detailedness of the identifier concepts, so that the resulting data are neither too dense nor too sparse for the analytics. A set of experiments was conducted to evaluate the performance of the proposed method, which achieved an accuracy of 84.5%.
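The abstract names the two components of the method but does not state the exact ranking function. The following is a minimal, hypothetical sketch of the general idea, not the authors' implementation. It assumes that each named entity set is a list of surface forms; that candidates are all 1- to 3-grams of those forms; that TF is computed within a set and a smoothed IDF across sets; and that the gram-length and gram-position adjustments are illustrative stand-ins (gram length raised to an exponent `alpha`, plus a bonus `beta` for grams that end a surface form, a head-noun heuristic). The names `gram_stats`, `rank_candidates`, `alpha`, and `beta` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def gram_stats(entity_set, max_n=3):
    """Count every 1..max_n-gram in one named entity set, and how often each
    gram ends a surface form (used by the hypothetical position factor)."""
    counts, tail_hits = Counter(), Counter()
    for form in entity_set:
        tokens = form.lower().split()
        for n in range(1, min(max_n, len(tokens)) + 1):
            for i in range(len(tokens) - n + 1):
                gram = " ".join(tokens[i:i + n])
                counts[gram] += 1
                if i + n == len(tokens):  # gram is the tail of this surface form
                    tail_hits[gram] += 1
    return counts, tail_hits


def rank_candidates(entity_sets, max_n=3, alpha=0.5, beta=0.5):
    """Rank candidate identifier concepts for each named entity set.

    score = tf * idf * length_factor * position_factor, where the last two
    factors are ASSUMED stand-ins for the paper's gram-length and
    gram-position adjustments.
    """
    stats = [gram_stats(s, max_n) for s in entity_sets]
    num_sets = len(entity_sets)
    df = Counter()  # number of entity sets in which each gram occurs
    for counts, _ in stats:
        df.update(counts.keys())

    rankings = []
    for counts, tail_hits in stats:
        total = sum(counts.values())
        scored = []
        for gram, c in counts.items():
            tf = c / total
            idf = math.log((1 + num_sets) / (1 + df[gram])) + 1  # smoothed IDF
            length_factor = len(gram.split()) ** alpha
            position_factor = 1 + beta * tail_hits[gram] / c
            scored.append((gram, tf * idf * length_factor * position_factor))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda item: (-item[1], item[0]))
        rankings.append(scored)
    return rankings


if __name__ == "__main__":
    toy_sets = [
        ["deck concrete", "concrete deck", "deck"],
        ["steel girder", "girder", "steel beam girder"],
    ]
    for ranked in rank_candidates(toy_sets):
        print(ranked[0][0])
```

With these toy sets and the default parameters, "deck" and "girder" rank first. Raising `alpha` increases the scores of multi-word grams relative to single words, so the ranking shifts toward longer, more detailed candidates; this loosely mirrors the abstract's stated goal of balancing abstractness and detailedness.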

Original language: English (US)
Title of host publication: Advanced Computing Strategies for Engineering - 25th EG-ICE International Workshop 2018, Proceedings
Editors: Ian F. Smith, Bernd Domer
Publisher: Springer-Verlag Berlin Heidelberg
Pages: 130-149
Number of pages: 20
ISBN (Print): 9783319916378
State: Published, 2018
Event: 25th Workshop of the European Group for Intelligent Computing in Engineering, EG-ICE 2018, Lausanne, Switzerland
Duration: Jun 10, 2018 to Jun 13, 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10864 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 25th Workshop of the European Group for Intelligent Computing in Engineering, EG-ICE 2018
Country: Switzerland
City: Lausanne
Period: 6/10/18 to 6/13/18

Keywords

  • Big data analytics
  • Bridge deterioration prediction
  • Named entity normalization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

