TY - GEN
T1 - Unsupervised Machine Learning for Augmented Data Analytics of Building Codes
AU - Zhang, Ruichuan
AU - El-Gohary, Nora
N1 - Publisher Copyright: © 2019 American Society of Civil Engineers.
PY - 2019
Y1 - 2019
N2 - Existing automated code checking methods/tools are unable to automatically analyze and represent all types of requirements (e.g., requirements that are too complex or that require human judgement). Recent efforts in the area of augmented data analytics have proposed the use of templates to facilitate the analysis of text. However, most of these efforts have constructed such templates manually, which is labor-intensive. More importantly, it is difficult for manually developed templates to capture the linguistic variations in building codes. More research is, thus, needed to automate the generation of templates to support the tagging and extraction of information from building codes. To address this need, this paper proposes an unsupervised machine learning-based method to extract sentence templates that describe syntactic and semantic features and patterns from building codes. The proposed method is composed of four main steps: (1) data preprocessing; (2) identifying the different groups of sentence fragments using clustering; (3) identifying the fixed parts and the slots in the templates based on the syntactic and semantic patterns of the sentence fragment groups; and (4) evaluating the extracted templates. The proposed method was implemented and tested on a corpus of text from the International Building Code. An accuracy of 0.76 was achieved.
UR - http://www.scopus.com/inward/record.url?scp=85092244245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092244245&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85092244245
T3 - Computing in Civil Engineering 2019: Data, Sensing, and Analytics - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2019
SP - 74
EP - 81
BT - Computing in Civil Engineering 2019
A2 - Cho, Yong K.
A2 - Leite, Fernanda
A2 - Behzadan, Amir
A2 - Wang, Chao
PB - American Society of Civil Engineers
T2 - ASCE International Conference on Computing in Civil Engineering 2019: Data, Sensing, and Analytics, i3CE 2019
Y2 - 17 June 2019 through 19 June 2019
ER -