TY - GEN
T1 - A dynamic replica selection algorithm for tolerating timing faults
AU - Krishnamurthy, Sudha
AU - Sanders, William H.
AU - Cukier, Michel
PY - 2001
Y1 - 2001
N2 - Server replication is commonly used to improve the fault tolerance and response time of distributed services. An important problem when executing time-critical applications in a replicated environment is that of preventing timing failures by dynamically selecting the replicas that can satisfy a client's timing requirement, even when the quality of service is degraded due to replica failures and excess load on the server. In this paper, we describe the approach we have used to solve this problem in AQUA, a CORBA-based middleware that transparently replicates objects across a local area network. The approach we use estimates a replica's response time distribution based on performance measurements regularly broadcast by the replica. An online model uses these measurements to predict the probability with which a replica can prevent a timing failure for a client. A selection algorithm then uses this prediction to choose a subset of replicas that can together meet the client's timing constraints with at least the probability requested by the client. We conclude with experimental results based on our implementation.
AB - Server replication is commonly used to improve the fault tolerance and response time of distributed services. An important problem when executing time-critical applications in a replicated environment is that of preventing timing failures by dynamically selecting the replicas that can satisfy a client's timing requirement, even when the quality of service is degraded due to replica failures and excess load on the server. In this paper, we describe the approach we have used to solve this problem in AQUA, a CORBA-based middleware that transparently replicates objects across a local area network. The approach we use estimates a replica's response time distribution based on performance measurements regularly broadcast by the replica. An online model uses these measurements to predict the probability with which a replica can prevent a timing failure for a client. A selection algorithm then uses this prediction to choose a subset of replicas that can together meet the client's timing constraints with at least the probability requested by the client. We conclude with experimental results based on our implementation.
UR - http://www.scopus.com/inward/record.url?scp=0035789431&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035789431&partnerID=8YFLogxK
U2 - 10.1109/DSN.2001.941397
DO - 10.1109/DSN.2001.941397
M3 - Conference contribution
AN - SCOPUS:0035789431
SN - 0769511015
SN - 9780769511016
T3 - Proceedings of the International Conference on Dependable Systems and Networks
SP - 107
EP - 116
BT - Proceedings of the International Conference on Dependable Systems and Networks
A2 - Young, D.C.
T2 - Proceedings of the International Conference on Dependable Systems and Networks
Y2 - 1 July 2001 through 4 July 2001
ER -