TY - GEN
T1 - The Methodological Pitfall of Dataset-Driven Research on Deep Learning
T2 - 2022 IEEE Military Communications Conference, MILCOM 2022
AU - Wang, Tianshi
AU - Kara, Denizhan
AU - Li, Jinyang
AU - Liu, Shengzhong
AU - Abdelzaher, Tarek
AU - Jalaian, Brian
N1 - Research reported in this paper was sponsored in part by the U.S. DEVCOM Army Research Laboratory under Cooperative Agreement W911NF-17-20196, NSF CNS 20-38817, and the Boeing Company.
N1 - Research reported in this paper was sponsored in part by the Army Research Laboratory under Cooperative Agreement W911NF-17-20196, NSF CNS 20-38817, the IBM-Illinois Discovery Acceleration Institute, and the Boeing Company.
PY - 2022
Y1 - 2022
N2 - In this paper, we highlight a dangerous pitfall in the state-of-the-art evaluation methodology of deep learning algorithms. It results in deceptively good evaluation outcomes on test datasets, whereas the underlying algorithms remain prone to catastrophic failure in practice. We illustrate the pitfall in the context of an Internet-of-Things (IoT) application example and show that it occurs despite the use of cross-validation that breaks down the data into separate training, validation, and testing sets. The pitfall is illustrated by designing two target detection and classification algorithms. One is based on a recently proposed neural network architecture for embedded AI, and the other is based on a traditional machine learning approach with domain-inspired feature engineering. The neural network approach outperforms the traditional one on test data. Yet, it fails in deployment. The mechanics behind the failure are explained and linked to the way the algorithms are trained. Suggestions are presented to avoid the pitfall. The paper is a 'call to arms' to improve the evaluation methodology of machine learning algorithms for mission-critical systems.
AB - In this paper, we highlight a dangerous pitfall in the state-of-the-art evaluation methodology of deep learning algorithms. It results in deceptively good evaluation outcomes on test datasets, whereas the underlying algorithms remain prone to catastrophic failure in practice. We illustrate the pitfall in the context of an Internet-of-Things (IoT) application example and show that it occurs despite the use of cross-validation that breaks down the data into separate training, validation, and testing sets. The pitfall is illustrated by designing two target detection and classification algorithms. One is based on a recently proposed neural network architecture for embedded AI, and the other is based on a traditional machine learning approach with domain-inspired feature engineering. The neural network approach outperforms the traditional one on test data. Yet, it fails in deployment. The mechanics behind the failure are explained and linked to the way the algorithms are trained. Suggestions are presented to avoid the pitfall. The paper is a 'call to arms' to improve the evaluation methodology of machine learning algorithms for mission-critical systems.
UR - http://www.scopus.com/inward/record.url?scp=85147324687&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147324687&partnerID=8YFLogxK
U2 - 10.1109/MILCOM55135.2022.10017612
DO - 10.1109/MILCOM55135.2022.10017612
M3 - Conference contribution
AN - SCOPUS:85147324687
T3 - Proceedings - IEEE Military Communications Conference MILCOM
SP - 1082
EP - 1087
BT - MILCOM 2022 - 2022 IEEE Military Communications Conference
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 November 2022 through 2 December 2022
ER -