TY - JOUR
T1 - Active replication of multithreaded applications
AU - Basile, Claudio
AU - Kalbarczyk, Zbigniew
AU - Iyer, Ravishankar K.
N1 - Funding Information:
This work is supported in part by a Vodafone fellowship, US National Science Foundation (NSF) grants CCR 99-02026 and CCR 00-86096 ITR, MURI grant N00014-01-1-0576, and the Gigascale Systems Research Center (GSRC/MARCO), and the Motorola Corporation as part of the Motorola Center. The authors thank Fran Baker for insightful editing of this manuscript.
PY - 2006/5
Y1 - 2006/5
N2 - Software-based active replication is expensive in terms of performance overhead. Multithreading can help improve performance; however, thread scheduling is a source of nondeterminism in replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting mutex lock/unlock operations performed by threads on accessing the shared data and contributes with two algorithmic solutions: 1) a Loose Synchronization Algorithm (LSA), which captures the natural concurrency in a leader replica and projects it on follower replicas through interreplica communication, and 2) a Preemptive Deterministic Scheduler (PDS) algorithm, which removes the need for interreplica communication through the notion of round and by suspending threads when it is unable (yet) to schedule them deterministically. Failure behavior and performance of LSA and PDS implementations are evaluated in a triplicated system and compared with existing solutions. A performance evaluation indicates that LSA and PDS outperform existing solutions, with PDS offering lower throughput than LSA. A fault-injection campaign shows that PDS is more robust to errors due to the absence of interreplica communication. Hence, LSA and PDS represent a trade-off between performance and dependability. Finally, LSA and PDS are demonstrated in replicating the Apache Web server, a substantial real-world application.
AB - Software-based active replication is expensive in terms of performance overhead. Multithreading can help improve performance; however, thread scheduling is a source of nondeterminism in replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting mutex lock/unlock operations performed by threads on accessing the shared data and contributes with two algorithmic solutions: 1) a Loose Synchronization Algorithm (LSA), which captures the natural concurrency in a leader replica and projects it on follower replicas through interreplica communication, and 2) a Preemptive Deterministic Scheduler (PDS) algorithm, which removes the need for interreplica communication through the notion of round and by suspending threads when it is unable (yet) to schedule them deterministically. Failure behavior and performance of LSA and PDS implementations are evaluated in a triplicated system and compared with existing solutions. A performance evaluation indicates that LSA and PDS outperform existing solutions, with PDS offering lower throughput than LSA. A fault-injection campaign shows that PDS is more robust to errors due to the absence of interreplica communication. Hence, LSA and PDS represent a trade-off between performance and dependability. Finally, LSA and PDS are demonstrated in replicating the Apache Web server, a substantial real-world application.
KW - Fault injection
KW - Fault tolerance
KW - Multithreading
KW - Nondeterminism
KW - Replication
UR - http://www.scopus.com/inward/record.url?scp=33645820246&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33645820246&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2006.56
DO - 10.1109/TPDS.2006.56
M3 - Article
AN - SCOPUS:33645820246
SN - 1045-9219
VL - 17
SP - 448
EP - 465
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 5
ER -