TY - JOUR
T1 - Chameleon
T2 - A software infrastructure for adaptive fault tolerance
AU - Kalbarczyk, Zbigniew T.
AU - Iyer, Ravishankar K.
AU - Bagchi, Saurabh
AU - Whisnant, Keith
N1 - Funding Information:
This work was supported in part by Jet Propulsion Laboratory (JPL)-NASA under contract JPL961345 and in part by a grant from Tandem Computers (now a division of Compaq). We thank B. Horst (formerly of Tandem Computers) and our colleagues on the REE team from JPL (especially R. Lee, J. Beahan, R. Some, D. Rennels, and F. Mathur) for many insightful discussions. We would like to thank J. Wang and M. Kalyanakrishnan for their contributions in developing the application and porting the environment to Windows NT.
PY - 1999
Y1 - 1999
N2 - This paper presents Chameleon, an adaptive infrastructure that allows different levels of availability requirements to be supported simultaneously in a networked environment. Chameleon provides dependability through the use of special ARMORs - Adaptive, Reconfigurable, and Mobile Objects for Reliability - that control all operations in the Chameleon environment. Three broad classes of ARMORs are defined. 1) Managers oversee other ARMORs and recover from failures in their subordinates. 2) Daemons provide communication gateways to the ARMORs at the host node. They also make a host's resources available to the Chameleon environment. 3) Common ARMORs implement specific techniques for providing application-required dependability. Employing ARMORs, Chameleon makes available different fault-tolerant configurations and maintains run-time adaptation to changes in the availability requirements of an application. The flexible ARMOR architecture allows the composition of ARMORs to be reconfigured at run-time, i.e., the ARMORs may dynamically adapt to changing application requirements. In this paper, we describe the ARMOR architecture, including the ARMOR class hierarchy, basic building blocks, ARMOR composition, and the use of ARMOR factories. We show how ARMORs can be reconfigured and reengineered and demonstrate how the architecture serves our objective of providing an adaptive software infrastructure. To our knowledge, Chameleon is one of the few real implementations that enable multiple fault tolerance strategies to exist in the same environment and support fault-tolerant execution of substantially off-the-shelf applications via a software infrastructure only. Chameleon provides fault tolerance from the application's point of view as well as from the software infrastructure's point of view.
To demonstrate the Chameleon capabilities, we have implemented a prototype infrastructure that provides a set of ARMORs to initialize the environment and to support the dual and TMR application execution modes. Through this testbed environment, we measure the execution overhead and recovery times from failures in the user application, the Chameleon ARMORs, the hardware, and the operating system.
AB - This paper presents Chameleon, an adaptive infrastructure that allows different levels of availability requirements to be supported simultaneously in a networked environment. Chameleon provides dependability through the use of special ARMORs - Adaptive, Reconfigurable, and Mobile Objects for Reliability - that control all operations in the Chameleon environment. Three broad classes of ARMORs are defined. 1) Managers oversee other ARMORs and recover from failures in their subordinates. 2) Daemons provide communication gateways to the ARMORs at the host node. They also make a host's resources available to the Chameleon environment. 3) Common ARMORs implement specific techniques for providing application-required dependability. Employing ARMORs, Chameleon makes available different fault-tolerant configurations and maintains run-time adaptation to changes in the availability requirements of an application. The flexible ARMOR architecture allows the composition of ARMORs to be reconfigured at run-time, i.e., the ARMORs may dynamically adapt to changing application requirements. In this paper, we describe the ARMOR architecture, including the ARMOR class hierarchy, basic building blocks, ARMOR composition, and the use of ARMOR factories. We show how ARMORs can be reconfigured and reengineered and demonstrate how the architecture serves our objective of providing an adaptive software infrastructure. To our knowledge, Chameleon is one of the few real implementations that enable multiple fault tolerance strategies to exist in the same environment and support fault-tolerant execution of substantially off-the-shelf applications via a software infrastructure only. Chameleon provides fault tolerance from the application's point of view as well as from the software infrastructure's point of view.
To demonstrate the Chameleon capabilities, we have implemented a prototype infrastructure that provides a set of ARMORs to initialize the environment and to support the dual and TMR application execution modes. Through this testbed environment, we measure the execution overhead and recovery times from failures in the user application, the Chameleon ARMORs, the hardware, and the operating system.
UR - http://www.scopus.com/inward/record.url?scp=0032686475&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0032686475&partnerID=8YFLogxK
U2 - 10.1109/71.774907
DO - 10.1109/71.774907
M3 - Article
AN - SCOPUS:0032686475
SN - 1045-9219
VL - 10
SP - 560
EP - 579
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 6
ER -