TY - GEN
T1 - Scalable, accurate multicore simulation in the 1000-core era
AU - Lis, Mieszko
AU - Ren, Pengju
AU - Cho, Myong Hyon
AU - Shim, Keun Sup
AU - Fletcher, Christopher W.
AU - Khan, Omer
AU - Devadas, Srinivas
PY - 2011
Y1 - 2011
N2 - We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued wormhole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed 5x, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11x. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.
AB - We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued wormhole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed 5x, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11x. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.
UR - http://www.scopus.com/inward/record.url?scp=79957509670&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957509670&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2011.5762734
DO - 10.1109/ISPASS.2011.5762734
M3 - Conference contribution
AN - SCOPUS:79957509670
SN - 9781612843681
T3 - ISPASS 2011 - IEEE International Symposium on Performance Analysis of Systems and Software
SP - 175
EP - 185
BT - ISPASS 2011 - IEEE International Symposium on Performance Analysis of Systems and Software
T2 - IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2011
Y2 - 10 April 2011 through 12 April 2011
ER -