TY - JOUR
T1 - Ontology-based multilabel text classification of construction regulatory documents
AU - Zhou, Peng
AU - El-Gohary, Nora
N1 - Funding Information:
The authors would like to thank the National Science Foundation (NSF). This material is based upon work supported by NSF under Grant No. 1201170. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.
Publisher Copyright:
© 2015 American Society of Civil Engineers.
PY - 2016/7/1
Y1 - 2016/7/1
N2 - In order to fully automate the environmental regulatory compliance checking process, rules should be automatically extracted from applicable environmental regulatory textual documents, such as energy conservation codes. In the authors' automated compliance checking (ACC) approach, prior to rule extraction, the text is first classified into predefined categories to only retrieve relevant clauses and filter out irrelevant ones, thereby improving the efficiency and accuracy of rule extraction. Machine learning (ML) techniques have been commonly used for text classification (TC). Nonontology-based, ML-based TC has, generally, performed well. However, given the need for an exceptionally high performance in TC to support high performance in ACC, further TC performance improvement is needed. To address this need, an ontology-based TC algorithm is proposed to further improve the classification performance by utilizing the semantic features of the text. A domain ontology for conceptualizing the environmental knowledge was used. The proposed ontology-based TC algorithm was tested on 25 environmental regulatory documents, evaluated using four evaluation metrics, and compared with the authors' previously utilized ML-based approach. Based on the testing data, the results show that the ontology-based approach consistently outperformed the ML-based approach, under all evaluation metrics.
AB - In order to fully automate the environmental regulatory compliance checking process, rules should be automatically extracted from applicable environmental regulatory textual documents, such as energy conservation codes. In the authors' automated compliance checking (ACC) approach, prior to rule extraction, the text is first classified into predefined categories to only retrieve relevant clauses and filter out irrelevant ones, thereby improving the efficiency and accuracy of rule extraction. Machine learning (ML) techniques have been commonly used for text classification (TC). Nonontology-based, ML-based TC has, generally, performed well. However, given the need for an exceptionally high performance in TC to support high performance in ACC, further TC performance improvement is needed. To address this need, an ontology-based TC algorithm is proposed to further improve the classification performance by utilizing the semantic features of the text. A domain ontology for conceptualizing the environmental knowledge was used. The proposed ontology-based TC algorithm was tested on 25 environmental regulatory documents, evaluated using four evaluation metrics, and compared with the authors' previously utilized ML-based approach. Based on the testing data, the results show that the ontology-based approach consistently outperformed the ML-based approach, under all evaluation metrics.
UR - http://www.scopus.com/inward/record.url?scp=84975225132&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84975225132&partnerID=8YFLogxK
U2 - 10.1061/(ASCE)CP.1943-5487.0000530
DO - 10.1061/(ASCE)CP.1943-5487.0000530
M3 - Article
AN - SCOPUS:84975225132
SN - 0887-3801
VL - 30
JO - Journal of Computing in Civil Engineering
JF - Journal of Computing in Civil Engineering
IS - 4
M1 - 04015058
ER -