Ontology-based information extraction from environmental regulations for supporting environmental compliance checking

Peng Zhou, Nora El-Gohary

Research output: Contribution to conference › Paper

Abstract

Automated environmental regulatory compliance checking requires automated extraction of regulatory requirements/rules from environmental regulatory textual documents, such as energy conservation codes and Environmental Protection Agency (EPA) regulations. Natural language processing (NLP) aims to enable computers to analyze and process natural language text in a human-like manner. Information extraction (IE) is an application of NLP that aims to automatically extract specific information from text to support a specific computational task. In the proposed automated compliance checking (ACC) approach, after classifying the text for filtering out irrelevant regulatory provisions, pattern-matching-based IE techniques are used for extracting regulatory information, from the classified text, into certain predefined semantic patterns. In their previous work, the authors have proposed a semantic, rule-based methodology and algorithm for extracting information from building codes. This paper builds on the authors' previous work in three main ways. First, the proposed IE algorithm is used in combination with text classification (TC) algorithms to enhance the efficiency (by avoiding unnecessary computational processing of irrelevant text) and performance (by avoiding potential noise and errors resulting from processing irrelevant text) of IE. Second, the IE algorithm is adapted to environmental regulatory text, which is different from building codes in terms of its syntactic and semantic features. Third, to enhance performance, a deeper (more detailed) ontology is used and a conceptual dependency structure is built to capture dependency information to reduce text ambiguities. The proposed IE algorithm was tested in extracting regulatory requirements from the 2012 International Energy Conservation Code, and the testing results showed 99.85% recall and 99.55% precision.
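
As a rough illustration of the two-stage idea described in the abstract (classify text first, then apply pattern matching to the relevant text only), the following minimal sketch may help. It is not the authors' algorithm: the keyword filter stands in for a trained text classifier, the single regular expression stands in for ontology-based semantic patterns, and all names (RELEVANT_TERMS, PATTERN, Requirement, extract_requirements, precision_recall) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: a keyword list stands in for a trained text classifier.
RELEVANT_TERMS = ("shall", "must", "not less than", "not more than")

# Assumption: one hand-written pattern of the form
# <subject> shall [be] <comparison> <quantity> [<unit>]
PATTERN = re.compile(
    r"(?P<subject>[A-Za-z][A-Za-z \-]*?)\s+shall\s+(?:be\s+)?"
    r"(?P<comparison>not less than|not more than|at least|at most)\s+"
    r"(?P<quantity>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%]+)?",
    re.IGNORECASE,
)


@dataclass
class Requirement:
    subject: str
    comparison: str
    quantity: float
    unit: Optional[str]


def is_relevant(sentence: str) -> bool:
    """Stage 1: filter out sentences that carry no requirement cue."""
    lowered = sentence.lower()
    return any(term in lowered for term in RELEVANT_TERMS)


def extract_requirements(sentences: List[str]) -> List[Requirement]:
    """Stage 2: run pattern matching only on sentences kept by stage 1."""
    results = []
    for sentence in sentences:
        if not is_relevant(sentence):
            continue  # irrelevant text is never passed to the extractor
        match = PATTERN.search(sentence)
        if match:
            results.append(
                Requirement(
                    subject=match.group("subject").strip(),
                    comparison=match.group("comparison").lower(),
                    quantity=float(match.group("quantity")),
                    unit=match.group("unit"),
                )
            )
    return results


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    return tp / (tp + fp), tp / (tp + fn)
```

In this sketch, a sentence such as "The insulation R-value shall be not less than 13." would be kept by the filter and mapped to a Requirement, whereas a purely descriptive sentence would be dropped before any pattern matching is attempted. The precision_recall helper only shows how the two reported measures are defined in terms of true positives, false positives, and false negatives.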

Original language: English (US)
Pages: 190-198
Number of pages: 9
State: Published - Jan 1 2015
Event: 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015 - Austin, United States
Duration: Jun 21 2015 - Jun 23 2015

Other

Other: 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015
Country: United States
City: Austin
Period: 6/21/15 - 6/23/15

Fingerprint

Environmental regulations
Ontology
Semantics
Energy conservation
Processing
Text processing
Pattern matching
Environmental Protection Agency
Syntactics
Compliance
Testing

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Computer Science Applications

Cite this

Zhou, P., & El-Gohary, N. (2015). Ontology-based information extraction from environmental regulations for supporting environmental compliance checking. 190-198. Paper presented at 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015, Austin, United States.

Ontology-based information extraction from environmental regulations for supporting environmental compliance checking. / Zhou, Peng; El-Gohary, Nora.

2015. 190-198 Paper presented at 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015, Austin, United States.

Zhou, P & El-Gohary, N 2015, 'Ontology-based information extraction from environmental regulations for supporting environmental compliance checking' Paper presented at 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015, Austin, United States, 6/21/15 - 6/23/15, pp. 190-198.
Zhou P, El-Gohary N. Ontology-based information extraction from environmental regulations for supporting environmental compliance checking. 2015. Paper presented at 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015, Austin, United States.
Zhou, Peng ; El-Gohary, Nora. / Ontology-based information extraction from environmental regulations for supporting environmental compliance checking. Paper presented at 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015, Austin, United States. 9 p.
@conference{96c1dd1358514a068b8a93fc970d256a,
title = "Ontology-based information extraction from environmental regulations for supporting environmental compliance checking",
abstract = "Automated environmental regulatory compliance checking requires automated extraction of regulatory requirements/rules from environmental regulatory textual documents, such as energy conservation codes and environmental protection agency (EPA) regulations. Natural language processing (NLP) aims to enable computers to analyze and process natural text in a human-like manner. Information extraction (IE) is an application of NLP that aims to automatically extract specific information from text to support a specific computational task. In the proposed automated compliance checking (ACC) approach, after classifying the text for filtering out irrelevant regulatory provisions, pattern-matching-based IE techniques are used for extracting regulatory information, from the classified text, into certain predefined semantic patterns. In their previous work, the authors have proposed a semantic, rule-based methodology and algorithm for extracting information from building codes. This paper builds on the authors' previous work in three main ways. First, the proposed IE algorithm is used in combination with text classification (TC) algorithms to enhance the efficiency (by avoiding unnecessary computational processing of irrelevant text) and performance (by avoiding potential noise and errors resulting from processing irrelevant text) of IE. Second, the IE algorithm is adapted to environmental regulatory text, which is different from building codes in terms of its syntactic and semantic features. Third, to enhance performance, a deeper (more detailed) ontology is used and a conceptual dependency structure is built to capture dependency information to reduce text ambiguities. The proposed IE algorithm was tested in extracting regulatory requirements from the 2012 International Energy Conservation Code, and the testing results showed 99.85{\%} recall and 99.55{\%} precision.",
author = "Peng Zhou and Nora El-Gohary",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
pages = "190--198",
note = "2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015 ; Conference date: 21-06-2015 Through 23-06-2015",

}

TY - CONF

T1 - Ontology-based information extraction from environmental regulations for supporting environmental compliance checking

AU - Zhou, Peng

AU - El-Gohary, Nora

PY - 2015/1/1

Y1 - 2015/1/1

AB - Automated environmental regulatory compliance checking requires automated extraction of regulatory requirements/rules from environmental regulatory textual documents, such as energy conservation codes and environmental protection agency (EPA) regulations. Natural language processing (NLP) aims to enable computers to analyze and process natural text in a human-like manner. Information extraction (IE) is an application of NLP that aims to automatically extract specific information from text to support a specific computational task. In the proposed automated compliance checking (ACC) approach, after classifying the text for filtering out irrelevant regulatory provisions, pattern-matching-based IE techniques are used for extracting regulatory information, from the classified text, into certain predefined semantic patterns. In their previous work, the authors have proposed a semantic, rule-based methodology and algorithm for extracting information from building codes. This paper builds on the authors' previous work in three main ways. First, the proposed IE algorithm is used in combination with text classification (TC) algorithms to enhance the efficiency (by avoiding unnecessary computational processing of irrelevant text) and performance (by avoiding potential noise and errors resulting from processing irrelevant text) of IE. Second, the IE algorithm is adapted to environmental regulatory text, which is different from building codes in terms of its syntactic and semantic features. Third, to enhance performance, a deeper (more detailed) ontology is used and a conceptual dependency structure is built to capture dependency information to reduce text ambiguities. The proposed IE algorithm was tested in extracting regulatory requirements from the 2012 International Energy Conservation Code, and the testing results showed 99.85% recall and 99.55% precision.

UR - http://www.scopus.com/inward/record.url?scp=84936872225&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84936872225&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84936872225

SP - 190

EP - 198

ER -