Ontology-based, multi-label text classification for enhanced information retrieval for supporting automated environmental compliance checking

Peng Zhou, Nora El-Gohary

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

To fully automate the environmental regulatory compliance checking process, we need to automatically extract the rules from the applicable environmental regulatory textual documents, such as energy conservation codes. In our automated compliance checking (ACC) approach, prior to rule extraction, we first classify the text into predefined categories so that only relevant clauses are retrieved and irrelevant ones are filtered out, thereby improving the efficiency and accuracy of rule extraction. Machine learning (ML) techniques have been commonly used for text classification (TC), and ML-based TC has generally performed well. However, ACC requires exceptionally high performance (100% recall and >85% precision) to avoid consequent compliance reasoning errors, so further performance improvement is needed. Therefore, in this paper, we present an ontology-based TC algorithm that further improves classification performance by utilizing the semantic features of the text. We used a domain ontology to conceptualize the environmental knowledge. In contrast to the ML-based approach, our ontology-based approach represents a document (or clause) in terms of semantic concepts and relations, rather than just terms (words). The semantic concepts and relations in the ontology (e.g., "is-a" relations) help in recognizing the semantic features of the text. Our ontology-based TC algorithm was tested on twelve environmental regulatory documents (such as the 2012 International Energy Conservation Code), evaluated in terms of precision and recall, and compared with our previously utilized ML-based approach. Our results show that our ontology-based approach achieves 96.62% recall and 96.34% precision, thereby outperforming the ML-based approach.
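The idea of representing a clause by ontology concepts and "is-a" relations, rather than by raw words, can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy ontology, the term lexicon, the category labels, and the function names below are all hypothetical and chosen only for illustration. The sketch maps terms to concepts, walks "is-a" links upward, assigns every category whose concept is reached (so a clause may receive several labels), and computes micro-averaged precision and recall.

```python
# Hypothetical toy ontology: child concept -> parent concept ("is-a" relation).
IS_A = {
    "fenestration": "building_envelope",
    "insulation": "building_envelope",
    "luminaire": "lighting_system",
    "boiler": "hvac_system",
}

# Hypothetical lexicon mapping surface terms to ontology concepts.
TERM_TO_CONCEPT = {
    "window": "fenestration",
    "skylight": "fenestration",
    "insulation": "insulation",
    "lamp": "luminaire",
    "boiler": "boiler",
}

# Hypothetical category labels, each tied to one top-level concept.
CATEGORY_CONCEPT = {
    "envelope": "building_envelope",
    "lighting": "lighting_system",
    "mechanical": "hvac_system",
}


def ancestors(concept):
    """Return the concept together with everything it is-a, transitively."""
    found = set()
    while concept is not None and concept not in found:
        found.add(concept)
        concept = IS_A.get(concept)
    return found


def classify(clause):
    """Assign every category whose concept is evoked by a term in the clause."""
    tokens = clause.lower().replace(",", " ").replace(".", " ").split()
    concepts = set()
    for token in tokens:
        if token in TERM_TO_CONCEPT:
            concepts |= ancestors(TERM_TO_CONCEPT[token])
    return {label for label, c in CATEGORY_CONCEPT.items() if c in concepts}


def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over (clause, label) pairs."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall
```

In this sketch, a clause mentioning "skylight" is labeled "envelope" even though neither word appears in the other's definition, because the lexicon maps "skylight" to a concept that is-a building envelope. A real system would need proper tokenization, morphological normalization, and a far richer ontology than this exact-match lookup.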

Original language: English (US)
Title of host publication: Computing in Civil and Building Engineering - Proceedings of the 2014 International Conference on Computing in Civil and Building Engineering
Editors: R. Raymond Issa, Ian Flood
Publisher: American Society of Civil Engineers (ASCE)
Pages: 2238-2245
Number of pages: 8
ISBN (Electronic): 9780784413616
State: Published - 2014
Event: 2014 International Conference on Computing in Civil and Building Engineering - Orlando, United States
Duration: Jun 23 2014 - Jun 25 2014

Publication series

Name: Computing in Civil and Building Engineering - Proceedings of the 2014 International Conference on Computing in Civil and Building Engineering

Other

Other: 2014 International Conference on Computing in Civil and Building Engineering
Country/Territory: United States
City: Orlando
Period: 6/23/14 - 6/25/14

ASJC Scopus subject areas

  • Computer Science Applications
  • Civil and Structural Engineering
  • Building and Construction
