Semantic-based text classification of environmental regulatory documents for supporting automated environmental compliance checking in construction

Peng Zhou, Nora El-Gohary

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automated environmental compliance checking requires automated extraction of rules from environmental regulatory textual documents, such as energy conservation codes and U.S. Environmental Protection Agency (EPA) regulations. Automated rule extraction requires complex text processing and analysis for information extraction and subsequent formalization of the extracted information into computer-processable rules. In our automated compliance checking (ACC) approach, we first classify the text into predefined categories to filter out irrelevant text, thereby improving the efficiency of subsequent semantic information extraction and compliance reasoning. The categories used are predefined in a semantic text classification (TC) topic hierarchy. In this paper, we present our machine-learning-based TC algorithm for classifying clauses in environmental regulatory documents based on the TC topic hierarchy. In developing our TC algorithm, different text preprocessing techniques, machine learning algorithms, and performance improvement strategies were tested and evaluated. Our final TC algorithm was tested on 10 regulatory documents, including the 2012 International Energy Conservation Code, and evaluated in terms of precision and recall. The algorithm achieved approximately 96% recall and 85% precision on the testing data.
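The classify-then-evaluate workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not state which machine learning algorithm or preprocessing steps were finally selected, so the multinomial Naive Bayes classifier below is an assumption chosen only for illustration. The function names, the two category labels, and the sample clauses are all hypothetical and are not taken from the paper or its topic hierarchy.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the clause and keep alphabetic tokens only (assumed preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(clauses, labels):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(clauses, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(y_true, y_pred, positive):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical training clauses and labels, invented for this sketch.
train_clauses = [
    "Insulation shall meet the thermal resistance requirements.",
    "Lighting power density shall not exceed the allowance.",
    "The applicant shall pay the permit fee.",
    "Appeals shall be filed with the board.",
]
train_labels = ["energy", "energy", "other", "other"]
model = train_nb(train_clauses, train_labels)
```

A call such as `predict_nb(model, "Thermal insulation requirements for lighting")` assigns the clause to one of the two hypothetical categories, and `precision_recall` can then score predictions against manually assigned labels. In a filtering step like the one the abstract describes, high recall means few relevant clauses are discarded before information extraction, while precision measures how much irrelevant text still passes through.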

Original language: English (US)
Title of host publication: Construction Research Congress 2014
Subtitle of host publication: Construction in a Global Network - Proceedings of the 2014 Construction Research Congress
Publisher: American Society of Civil Engineers (ASCE)
Pages: 897-906
Number of pages: 10
ISBN (Print): 9780784413517
State: Published - Jan 1 2014
Event: 2014 Construction Research Congress: Construction in a Global Network, CRC 2014 - Atlanta, GA, United States
Duration: May 19 2014 to May 21 2014

Publication series

Name: Construction Research Congress 2014: Construction in a Global Network - Proceedings of the 2014 Construction Research Congress

Other

Other: 2014 Construction Research Congress: Construction in a Global Network, CRC 2014
Country/Territory: United States
City: Atlanta, GA
Period: 5/19/14 to 5/21/14

ASJC Scopus subject areas

  • Building and Construction
