TY - GEN
T1 - Ontology-based, multi-label text classification for enhanced information retrieval for supporting automated environmental compliance checking
AU - Zhou, Peng
AU - El-Gohary, Nora
PY - 2014
Y1 - 2014
AB - To fully automate the environmental regulatory compliance checking process, we need to automatically extract the rules from applicable environmental regulatory textual documents, such as energy conservation codes. In our automated compliance checking (ACC) approach, prior to rule extraction, we first classify the text into pre-defined categories to retrieve only relevant clauses and filter out irrelevant ones, thereby improving the efficiency and accuracy of rule extraction. Machine learning (ML) techniques have been commonly used for text classification (TC). ML-based TC has generally performed well. However, given the need for exceptionally high performance (100% recall and >85% precision) for ACC (to avoid consequent compliance reasoning errors), we need further performance improvement. Therefore, in this paper, we present an ontology-based TC algorithm to further improve the classification performance by utilizing the semantic features of the text. We used a domain ontology for conceptualizing the environmental knowledge. In contrast to the ML-based approach, in our ontology-based approach a document (or clause) is represented in terms of semantic concepts and relations, rather than just terms (words). The semantic concepts and relations in the ontology (e.g., "is-a" relations) help in recognizing the semantic features of the text. Our ontology-based TC algorithm was tested on twelve environmental regulatory documents, such as the 2012 International Energy Conservation Code, evaluated in terms of precision and recall, and compared with our previously utilized ML-based approach. Our results show that our ontology-based approach achieves 96.62% recall and 96.34% precision, thereby outperforming the ML-based approach.
UR - http://www.scopus.com/inward/record.url?scp=84934272603&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84934272603&partnerID=8YFLogxK
U2 - 10.1061/9780784413616.278
DO - 10.1061/9780784413616.278
M3 - Conference contribution
AN - SCOPUS:84934272603
T3 - Computing in Civil and Building Engineering - Proceedings of the 2014 International Conference on Computing in Civil and Building Engineering
SP - 2238
EP - 2245
BT - Computing in Civil and Building Engineering - Proceedings of the 2014 International Conference on Computing in Civil and Building Engineering
A2 - Issa, R. Raymond
A2 - Flood, Ian
PB - American Society of Civil Engineers (ASCE)
T2 - 2014 International Conference on Computing in Civil and Building Engineering
Y2 - 23 June 2014 through 25 June 2014
ER -