TY - GEN
T1 - Ontology-based information extraction from environmental regulations for supporting environmental compliance checking
AU - Zhou, Peng
AU - El-Gohary, Nora
N1 - Publisher Copyright:
© 2015 ASCE.
PY - 2015
Y1 - 2015
AB - Automated environmental regulatory compliance checking requires automated extraction of regulatory requirements/rules from environmental regulatory textual documents, such as energy conservation codes and Environmental Protection Agency (EPA) regulations. Natural language processing (NLP) aims to enable computers to analyze and process natural-language text in a human-like manner. Information extraction (IE) is an application of NLP that aims to automatically extract specific information from text to support a specific computational task. In the proposed automated compliance checking (ACC) approach, the text is first classified to filter out irrelevant regulatory provisions; pattern-matching-based IE techniques are then used to extract regulatory information from the classified text into predefined semantic patterns. In their previous work, the authors proposed a semantic, rule-based methodology and algorithm for extracting information from building codes. This paper builds on that work in three main ways. First, the proposed IE algorithm is combined with text classification (TC) algorithms to improve both the efficiency of IE (by avoiding unnecessary computational processing of irrelevant text) and its performance (by avoiding noise and errors that may result from processing irrelevant text). Second, the IE algorithm is adapted to environmental regulatory text, which differs from building codes in its syntactic and semantic features. Third, to enhance performance, a deeper (more detailed) ontology is used, and a conceptual dependency structure is built to capture dependency information and thereby reduce text ambiguities. The proposed IE algorithm was tested on extracting regulatory requirements from the 2012 International Energy Conservation Code, achieving 99.85% recall and 99.55% precision.
UR - http://www.scopus.com/inward/record.url?scp=84936872225&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84936872225&partnerID=8YFLogxK
U2 - 10.1061/9780784479247.024
DO - 10.1061/9780784479247.024
M3 - Conference contribution
AN - SCOPUS:84936872225
T3 - Congress on Computing in Civil Engineering, Proceedings
SP - 190
EP - 198
BT - Computing in Civil Engineering 2015 - Proceedings of the 2015 International Workshop on Computing in Civil Engineering
A2 - O'Brien, William J.
A2 - Ponticelli, Simone
PB - American Society of Civil Engineers
T2 - 2015 ASCE International Workshop on Computing in Civil Engineering, IWCCE 2015
Y2 - 21 June 2015 through 23 June 2015
ER -