Data-theoretic approach for socio-technical risk analysis: Text mining licensee event reports of U.S. nuclear power plants

Justin Pence, Pegah Farshadmanesh, Jinmo Kim, Cathy Blake, Zahra Mohaghegh

Research output: Contribution to journal › Article › peer-review


This paper is a product of a line of research that uses the Socio-Technical Risk Analysis (SoTeRiA) theoretical framework and Integrated PRA (I-PRA) methodological framework to theorize and quantify underlying organizational mechanisms contributing to socio-technical system risk scenarios. I-PRA has an input module that executes the Data-Theoretic (DT) approach, where “data analytics” can be guided by “theory.” The DT input module of I-PRA has two sub-modules: (1) DT-BASE, for developing detailed grounded theory-based causal relationships in SoTeRiA, equipped with a software-supported BASEline quantification utilizing information extracted from academic articles, industry procedures, and regulatory standards, and (2) DT-SITE, using data analytics to refine and measure the causal factors of SoTeRiA based on industry event databases and using Bayesian analysis to update the baseline quantification. This paper focuses on the advancement of DT-SITE, contributing to the integration of text mining with the measurement of organizational factors for PRA, and demonstrating the following methodological elements and steps in DT-SITE: (Element 2.1) Text mining: (Step i) collect and pre-process unstructured text data, (Step ii) identify theory-based seed terms based on DT-BASE causal model, (Step iii) generate features, and (Step iv) build and evaluate classifiers (e.g., by using Support Vector Machine [SVM]); and (Element 2.2) Estimating probabilities and their associated uncertainties. The DT-SITE methodology is applied in a case study targeting the “training system” in Nuclear Power Plants (NPPs) and using Licensee Event Reports (LERs) from the U.S. nuclear power industry, where LER-specific data extraction and pre-processing tools are developed.
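The text-mining and probability-estimation elements described above can be illustrated with a minimal sketch. It is not the authors' implementation: the seed terms, the example report sentences, the keyword rule standing in for the trained SVM classifier, and the uniform Beta(1, 1) prior with a Beta-Binomial update are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical seed terms for the "training system" factor (illustrative only,
# not taken from the DT-BASE causal model).
SEED_TERMS = ("training", "qualification", "instructor")


def seed_term_features(text, seed_terms=SEED_TERMS):
    """Return one count per seed term for a lower-cased, tokenized report."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in seed_terms]


def mentions_factor(text, seed_terms=SEED_TERMS):
    """Toy stand-in for a trained classifier: flag a report if any seed term occurs."""
    return sum(seed_term_features(text, seed_terms)) > 0


def beta_update(alpha, beta, positives, total):
    """Conjugate Beta-Binomial update; returns the posterior (alpha, beta)."""
    return alpha + positives, beta + (total - positives)


def beta_mean_var(alpha, beta):
    """Mean and variance of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    return alpha / n, alpha * beta / (n * n * (n + 1))


# Made-up report sentences, not real LER text.
reports = [
    "Operator error traced to inadequate training on the revised procedure.",
    "Valve failed to close because of a degraded seal.",
    "Technician qualification had lapsed before the surveillance test.",
    "Reactor trip caused by a spurious sensor signal.",
]

flags = [mentions_factor(r) for r in reports]

# Assumed baseline: a uniform Beta(1, 1) prior, updated with the flagged counts.
post_a, post_b = beta_update(1.0, 1.0, sum(flags), len(flags))
mean, var = beta_mean_var(post_a, post_b)
print(flags, (post_a, post_b), round(mean, 3), round(var, 4))
```

In this sketch the posterior mean plays the role of the updated probability and the variance its associated uncertainty. In practice, the keyword rule would be replaced by a classifier trained and evaluated on labeled reports, and the prior would come from the baseline quantification rather than a uniform assumption.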

Original language: English (US)
Article number: 104574
Journal: Safety Science
State: Published - Apr 2020


Keywords

  • Machine learning
  • Organizational factors
  • Probabilistic risk assessment
  • Text mining

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Safety Research
  • Public Health, Environmental and Occupational Health

