TY - JOUR
T1 - Data-theoretic approach for socio-technical risk analysis
T2 - Text mining licensee event reports of U.S. nuclear power plants
AU - Pence, Justin
AU - Farshadmanesh, Pegah
AU - Kim, Jinmo
AU - Blake, Cathy
AU - Mohaghegh, Zahra
N1 - This material is based on work supported by the United States National Science Foundation (NSF) under Grant No. 1535167. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The authors would like to thank graduate student, Jooho Lee, for her support in data pre-processing. The authors would also like to thank all members of the Socio-Technical Risk Analysis (SoTeRiA) Laboratory (http://soteria.npre.illinois.edu/) for their feedback, and especially appreciate the support from research scientist Dr. Seyed Reihani, graduate research assistants Ha Hoang Bui and Jaemin Yang, and undergraduate researchers Nalin Gadihoke and Nimay Desai.
PY - 2020/4
Y1 - 2020/4
N2 - This paper is a product of a line of research that uses the Socio-Technical Risk Analysis (SoTeRiA) theoretical framework and Integrated PRA (I-PRA) methodological framework to theorize and quantify underlying organizational mechanisms contributing to socio-technical system risk scenarios. I-PRA has an input module that executes the Data-Theoretic (DT) approach, where “data analytics” can be guided by “theory.” The DT input module of I-PRA has two sub-modules: (1) DT-BASE, for developing detailed grounded theory-based causal relationships in SoTeRiA, equipped with a software-supported BASEline quantification utilizing information extracted from academic articles, industry procedures, and regulatory standards, and (2) DT-SITE, using data analytics to refine and measure the causal factors of SoTeRiA based on industry event databases and using Bayesian analysis to update the baseline quantification. This paper focuses on the advancement of DT-SITE, contributing to the integration of text mining with the measurement of organizational factors for PRA, and demonstrating the following methodological elements and steps in DT-SITE: (Element 2.1) Text mining: (Step i) collect and pre-process unstructured text data, (Step ii) identify theory-based seed terms based on DT-BASE causal model, (Step iii) generate features, and (Step iv) build and evaluate classifiers (e.g., by using Support Vector Machine [SVM]); and (Element 2.2) Estimating probabilities and their associated uncertainties. The DT-SITE methodology is applied in a case study targeting the “training system” in Nuclear Power Plants (NPPs) and using Licensee Event Reports (LERs) from the U.S. nuclear power industry, where LER-specific data extraction and pre-processing tools are developed.
AB - This paper is a product of a line of research that uses the Socio-Technical Risk Analysis (SoTeRiA) theoretical framework and Integrated PRA (I-PRA) methodological framework to theorize and quantify underlying organizational mechanisms contributing to socio-technical system risk scenarios. I-PRA has an input module that executes the Data-Theoretic (DT) approach, where “data analytics” can be guided by “theory.” The DT input module of I-PRA has two sub-modules: (1) DT-BASE, for developing detailed grounded theory-based causal relationships in SoTeRiA, equipped with a software-supported BASEline quantification utilizing information extracted from academic articles, industry procedures, and regulatory standards, and (2) DT-SITE, using data analytics to refine and measure the causal factors of SoTeRiA based on industry event databases and using Bayesian analysis to update the baseline quantification. This paper focuses on the advancement of DT-SITE, contributing to the integration of text mining with the measurement of organizational factors for PRA, and demonstrating the following methodological elements and steps in DT-SITE: (Element 2.1) Text mining: (Step i) collect and pre-process unstructured text data, (Step ii) identify theory-based seed terms based on DT-BASE causal model, (Step iii) generate features, and (Step iv) build and evaluate classifiers (e.g., by using Support Vector Machine [SVM]); and (Element 2.2) Estimating probabilities and their associated uncertainties. The DT-SITE methodology is applied in a case study targeting the “training system” in Nuclear Power Plants (NPPs) and using Licensee Event Reports (LERs) from the U.S. nuclear power industry, where LER-specific data extraction and pre-processing tools are developed.
KW - Machine learning
KW - Organizational factors
KW - Probabilistic risk assessment
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85077758063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077758063&partnerID=8YFLogxK
U2 - 10.1016/j.ssci.2019.104574
DO - 10.1016/j.ssci.2019.104574
M3 - Article
AN - SCOPUS:85077758063
SN - 0925-7535
VL - 124
JO - Safety Science
JF - Safety Science
M1 - 104574
ER -