A Machine-Learning Approach for Semantically-Enriched Building-Code Sentence Generation for Automatic Semantic Analysis

Ruichuan Zhang, Nora El-Gohary

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Existing automated code checking (ACC) systems require the extraction of requirements from regulatory textual documents into computer-processable rule representations. The information extraction (IE) processes in those ACC systems rely on human interpretation, manual annotation, or predefined automated information extraction rules. Despite their high performance, rule-based information extraction approaches, by nature, lack sufficient scalability: the rules typically need some level of adaptation if the characteristics of the text change. Machine learning-based methods, instead of relying on hand-crafted rules, automatically capture the underlying patterns of the existing training text and can generalize to a variety of texts. A more scalable, machine learning-based approach is thus needed to achieve more robust performance across different types of codes and documents when automatically generating semantically-enriched building-code sentences for ACC. To address this need, this paper proposes a machine learning-based approach for generating semantically-enriched building-code sentences, which are annotated syntactically and semantically, to support IE. For improved robustness and scalability, the proposed approach uses transfer learning strategies to train deep neural network models on both general-domain and domain-specific data. The proposed approach consists of four steps: (1) data preparation and preprocessing; (2) development of a base deep neural network model for generating semantically-enriched building-code sentences; (3) model training using transfer learning strategies; and (4) model evaluation. The proposed approach was evaluated on a corpus of sentences from the 2009 International Building Code (IBC) and the Champaign 2015 IBC Amendments. The preliminary results show that the proposed approach achieved an optimal precision of 88%, recall of 86%, and F1-measure of 87%, indicating good performance.
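
The reported F1-measure can be checked against the reported precision and recall. The short Python sketch below is not taken from the paper; it assumes the conventional definition of F1 as the harmonic mean of precision and recall, and the function name f1_score is purely illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (assumed to be measured on the same run).
reported_precision = 0.88
reported_recall = 0.86

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.8699", i.e. about 87%
```

Under that assumption, the computed value rounds to 87%, which agrees with the figure stated in the abstract.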

Original language: English (US)
Title of host publication: Construction Research Congress 2020
Subtitle of host publication: Computer Applications - Selected Papers from the Construction Research Congress 2020
Editors: Pingbo Tang, David Grau, Mounir El Asmar
Publisher: American Society of Civil Engineers
Pages: 1261-1270
Number of pages: 10
ISBN (Electronic): 9780784482865
State: Published - 2020
Event: Construction Research Congress 2020: Computer Applications - Tempe, United States
Duration: Mar 8 2020 → Mar 10 2020

Publication series

Name: Construction Research Congress 2020: Computer Applications - Selected Papers from the Construction Research Congress 2020

Conference

Conference: Construction Research Congress 2020: Computer Applications
Country/Territory: United States
City: Tempe
Period: 3/8/20 → 3/10/20

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Building and Construction
