TY - GEN
T1 - ReViveI/O
T2 - Twelfth International Symposium on High-Performance Computer Architecture, 2006
AU - Nakano, Jun
AU - Montesinos, Pablo
AU - Gharachorloo, Kourosh
AU - Torrellas, Josep
PY - 2006
Y1 - 2006
N2 - The increasing demand for reliable computers has led to proposals for hardware-assisted rollback of memory state. Such an approach promises major reductions in Mean Time To Repair (MTTR). The benefits are especially compelling for database servers, where existing recovery software typically leads to downtimes of tens of minutes. Unfortunately, adoption of such proposals is hindered by the lack of efficient mechanisms for I/O recovery. This paper presents and evaluates ReViveI/O, a scheme for I/O undo and redo that is compatible with mechanisms for hardware-assisted rollback of memory state. We have fully implemented a Linux-based prototype that shows that low-overhead, low-MTTR recovery of I/O is feasible. For 20-120 ms between checkpoints, a throughput-oriented workload such as TPC-C has negligible overhead. Moreover, for 50 ms or less between checkpoints, the response time of a latency-bound workload such as WebStone remains tolerable. In all cases, the recovery time of ReViveI/O is practically negligible. The result is a cost-effective, highly available server.
AB - The increasing demand for reliable computers has led to proposals for hardware-assisted rollback of memory state. Such an approach promises major reductions in Mean Time To Repair (MTTR). The benefits are especially compelling for database servers, where existing recovery software typically leads to downtimes of tens of minutes. Unfortunately, adoption of such proposals is hindered by the lack of efficient mechanisms for I/O recovery. This paper presents and evaluates ReViveI/O, a scheme for I/O undo and redo that is compatible with mechanisms for hardware-assisted rollback of memory state. We have fully implemented a Linux-based prototype that shows that low-overhead, low-MTTR recovery of I/O is feasible. For 20-120 ms between checkpoints, a throughput-oriented workload such as TPC-C has negligible overhead. Moreover, for 50 ms or less between checkpoints, the response time of a latency-bound workload such as WebStone remains tolerable. In all cases, the recovery time of ReViveI/O is practically negligible. The result is a cost-effective, highly available server.
UR - http://www.scopus.com/inward/record.url?scp=33748873046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33748873046&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2006.1598129
DO - 10.1109/HPCA.2006.1598129
M3 - Conference contribution
AN - SCOPUS:33748873046
SN - 0780393686
SN - 9780780393684
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 203
EP - 214
BT - Proceedings - Twelfth International Symposium on High-Performance Computer Architecture, 2006
Y2 - 11 February 2006 through 15 February 2006
ER -