TY - GEN
T1 - Identifying Failing Point Machines from Sensor-Free Train System Logs
AU - Yang, Ying
AU - Lou, Xin
AU - Chen, Binbin
AU - Winslett, Marianne
AU - Kalbarczyk, Zbigniew
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/10
Y1 - 2020/12/10
N2 - A great many train systems worldwide are legacy systems, without modern sensors whose data can be mined to detect and predict failures. In this paper, we show how to support failure identification in a legacy system with no sensors, using alarm logs and natural-language event logs as the only data sources. With too few failures in a mass of log data to train a traditional machine learning model, we propose a new approach called SA-HMM (Survival Analysis-Hidden Markov Model). After enriching the event logs with Word2vec, SA-HMM uses HMMs and survival analysis to identify failure trends in individual assets and failure tendencies in types of assets, respectively, then combines the two parts in a weighted sum that indicates the priority of each asset for preventative maintenance. Our evaluation of SA-HMM with a large amount of urban train data shows that SA-HMM greatly outperforms naive, HMM, and one-class SVM methods in terms of precision and recall in identifying failing assets, while also offering a tunable balance between those two aspects of performance.
AB - A great many train systems worldwide are legacy systems, without modern sensors whose data can be mined to detect and predict failures. In this paper, we show how to support failure identification in a legacy system with no sensors, using alarm logs and natural-language event logs as the only data sources. With too few failures in a mass of log data to train a traditional machine learning model, we propose a new approach called SA-HMM (Survival Analysis-Hidden Markov Model). After enriching the event logs with Word2vec, SA-HMM uses HMMs and survival analysis to identify failure trends in individual assets and failure tendencies in types of assets, respectively, then combines the two parts in a weighted sum that indicates the priority of each asset for preventative maintenance. Our evaluation of SA-HMM with a large amount of urban train data shows that SA-HMM greatly outperforms naive, HMM, and one-class SVM methods in terms of precision and recall in identifying failing assets, while also offering a tunable balance between those two aspects of performance.
KW - Cyber-physical system
KW - failure identification
KW - hidden Markov model
KW - survival analysis
KW - train system
UR - http://www.scopus.com/inward/record.url?scp=85103828816&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103828816&partnerID=8YFLogxK
U2 - 10.1109/BigData50022.2020.9377811
DO - 10.1109/BigData50022.2020.9377811
M3 - Conference contribution
AN - SCOPUS:85103828816
T3 - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
SP - 1424
EP - 1429
BT - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
A2 - Wu, Xintao
A2 - Jermaine, Chris
A2 - Xiong, Li
A2 - Hu, Xiaohua Tony
A2 - Kotevska, Olivera
A2 - Lu, Siyuan
A2 - Xu, Weijia
A2 - Aluru, Srinivas
A2 - Zhai, Chengxiang
A2 - Al-Masri, Eyhab
A2 - Chen, Zhiyuan
A2 - Saltz, Jeff
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Big Data, Big Data 2020
Y2 - 10 December 2020 through 13 December 2020
ER -