TY - GEN
T1 - Single-VDD and single-VT super-drowsy techniques for low-leakage high-performance instruction caches
AU - Kim, Nam Sung
AU - Flautner, Krisztián
AU - Blaauw, David
AU - Mudge, Trevor
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2004
Y1 - 2004
AB - In this paper, we present a circuit technique that supports a super-drowsy mode with a single-VDD. In addition, we perform a detailed working set analysis for various cache line update policies for placing lines in a drowsy state. The analysis presents a policy for an instruction cache and shows it is as good as or better than more complex schemes proposed in the past. Furthermore, as an alternative to using high-threshold devices to reduce the bitline leakage through access transistors in drowsy caches, we propose a gated bitline precharge technique. A single threshold process is now sufficient. The gated precharge employs a simple but effective predictor that almost completely hides any performance loss incurred by the transitions between sub-banks. A 64-entry predictor with 3 bits per entry reduces the run-time increase by 78%, which is as effective as previous proposals that used content addressable predictors with 40 bits per entry. Overall, the combination of the proposed techniques reduces the leakage power by 72% with negligible (0.4%) run-time increase.
KW - Leakage current
KW - Low power
UR - http://www.scopus.com/inward/record.url?scp=16244419042&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=16244419042&partnerID=8YFLogxK
U2 - 10.1145/1013235.1013254
DO - 10.1145/1013235.1013254
M3 - Conference contribution
AN - SCOPUS:16244419042
SN - 1581139292
SN - 9781581139297
T3 - Proceedings of the 2004 International Symposium on Low Power Electronics and Design, ISLPED'04
SP - 54
EP - 57
BT - Proceedings of the 2004 International Symposium on Low Power Electronics and Design, ISLPED'04
PB - Association for Computing Machinery
T2 - Proceedings of the 2004 International Symposium on Low Power Electronics and Design, ISLPED'04
Y2 - 9 August 2004 through 11 August 2004
ER -