TY - JOUR
T1 - Troubleshooting interactive complexity bugs in wireless sensor networks using data mining techniques
AU - Khan, Mohammad Maifi Hasan
AU - Khac Le, Hieu
AU - Ahmadi, Hossein
AU - Abdelzaher, Tarek F.
AU - Han, Jiawei
PY - 2014/1
Y1 - 2014/1
N2 - This article presents a tool for uncovering bugs due to interactive complexity in networked sensing applications. Such bugs are not localized to one component that is faulty, but rather result from complex and unexpected interactions between multiple often individually nonfaulty components. Moreover, the manifestations of these bugs are often not repeatable, making them particularly hard to find, as the particular sequence of events that invokes the bug may not be easy to reconstruct. Because of the distributed nature of failure scenarios, our tool looks for sequences of events that may be responsible for faulty behavior, as opposed to localized bugs such as a bad pointer in a module. We identified several challenges in applying discriminative sequence mining for root cause analysis when the system fails to perform as expected and presented our solutions to those challenges. We also present two alternative schemes, namely, two-stage mining and the progressive discriminative sequence mining to address the scalability challenge. An extensible framework is developed where a front-end collects runtime data logs of the system being debugged and an offline back-end uses frequent discriminative pattern mining to uncover likely causes of failure. We provided several case studies where we applied our tool successfully to troubleshoot the cause of the problem. We uncovered a kernel-level race condition bug in the LiteOS operating system and a protocol design bug in the directed diffusion protocol. We also presented a case study of debugging a multichannel MAC protocol that was found to exhibit corner cases of poor performance (worse than single-channel MAC). The tool helped to uncover event sequences that lead to a highly degraded mode of operation. Fixing the problem significantly improved the performance of the protocol. We also evaluated the extensions presented in this article. Finally, we provided a detailed analysis of tool overhead in terms ofmemory requirements and impact on the running application.
AB - This article presents a tool for uncovering bugs due to interactive complexity in networked sensing applications. Such bugs are not localized to one component that is faulty, but rather result from complex and unexpected interactions between multiple often individually nonfaulty components. Moreover, the manifestations of these bugs are often not repeatable, making them particularly hard to find, as the particular sequence of events that invokes the bug may not be easy to reconstruct. Because of the distributed nature of failure scenarios, our tool looks for sequences of events that may be responsible for faulty behavior, as opposed to localized bugs such as a bad pointer in a module. We identified several challenges in applying discriminative sequence mining for root cause analysis when the system fails to perform as expected and presented our solutions to those challenges. We also present two alternative schemes, namely, two-stage mining and the progressive discriminative sequence mining to address the scalability challenge. An extensible framework is developed where a front-end collects runtime data logs of the system being debugged and an offline back-end uses frequent discriminative pattern mining to uncover likely causes of failure. We provided several case studies where we applied our tool successfully to troubleshoot the cause of the problem. We uncovered a kernel-level race condition bug in the LiteOS operating system and a protocol design bug in the directed diffusion protocol. We also presented a case study of debugging a multichannel MAC protocol that was found to exhibit corner cases of poor performance (worse than single-channel MAC). The tool helped to uncover event sequences that lead to a highly degraded mode of operation. Fixing the problem significantly improved the performance of the protocol. We also evaluated the extensions presented in this article. Finally, we provided a detailed analysis of tool overhead in terms ofmemory requirements and impact on the running application.
KW - Distributed protocol debugging
KW - Wireless sensor networks
UR - http://www.scopus.com/inward/record.url?scp=84893359386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893359386&partnerID=8YFLogxK
U2 - 10.1145/2530290
DO - 10.1145/2530290
M3 - Article
AN - SCOPUS:84893359386
SN - 1550-4859
VL - 10
JO - ACM Transactions on Sensor Networks
JF - ACM Transactions on Sensor Networks
IS - 2
M1 - 2530290
ER -