Carpool: A bufferless on-chip network supporting adaptive multicast and hotspot alleviation

Xiyue Xiang, Wentao Shi, Saugata Ghose, Lu Peng, Onur Mutlu, Nian Feng Tzeng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Modern chip multiprocessors (CMPs) employ on-chip networks to enable communication between the individual cores. Operations such as coherence and synchronization generate a significant amount of the on-chip network traffic, and often create network requests that have one-to-many (i.e., a core multicasting a message to several cores) or many-to-one (i.e., several cores sending the same message to a common hotspot destination core) flows. As the number of cores in a CMP increases, one-to-many and many-to-one flows result in greater congestion on the network. To alleviate this congestion, prior work provides hardware support for efficient one-to-many and many-to-one flows in buffered on-chip networks. Unfortunately, this hardware support cannot be used in bufferless on-chip networks, which are shown to have lower hardware complexity and higher energy efficiency than buffered networks, and thus are likely a good fit for large-scale CMPs. We propose Carpool, the first bufferless on-chip network optimized for one-to-many (i.e., multicast) and many-to-one (i.e., hotspot) traffic. Carpool is based on three key ideas: it (1) adaptively forks multicast flit replicas; (2) merges hotspot flits; and (3) employs a novel parallel port allocation mechanism within its routers, which reduces the router critical path latency by 5.7% over a bufferless network router without multicast support. We evaluate Carpool using synthetic traffic workloads that emulate the range of rates at which multithreaded applications inject multicast and hotspot requests due to coherence and synchronization. Our evaluation shows that for an 8×8 mesh network, Carpool reduces the average packet latency by 43.1% and power consumption by 8.3% over a bufferless network without multicast or hotspot support. We also find that Carpool reduces the average packet latency by 26.4% and power consumption by 50.5% over a buffered network with multicast support, while consuming 63.5% less area for each router.
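To make the first key idea concrete, here is a minimal sketch of forking a multicast flit at a mesh router. It is an illustration only, not Carpool's actual router logic. The dimension-order (XY) routing function, the port names, and the names `productive_port` and `fork_multicast` are all assumptions made for this example. The adaptive decision of when to fork, which the abstract mentions, is not modeled.

```python
from collections import defaultdict

def productive_port(cur, dst):
    """Pick an output port under assumed XY routing; 'L' means eject locally."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"

def fork_multicast(cur, dests):
    """Group a multicast flit's destinations by output port.

    Each key in the result corresponds to one replica of the flit, carrying
    only the destinations that are reached through that port.
    """
    replicas = defaultdict(list)
    for dst in dests:
        replicas[productive_port(cur, dst)].append(dst)
    return dict(replicas)

# Hypothetical router at (1, 1) holding a flit with four destinations.
replicas = fork_multicast((1, 1), [(3, 1), (0, 2), (1, 0), (1, 1)])
```

In this sketch, destinations that share an output port travel together in a single replica, so a flit is copied only where its paths diverge.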

Original language: English (US)
Title of host publication: ICS 2017
Subtitle of host publication: International Conference on Supercomputing
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450350204
State: Published - Jun 14 2017
Externally published: Yes
Event: 31st ACM International Conference on Supercomputing, ICS 2017 - Chicago, United States
Duration: Jun 13 2017 to Jun 16 2017

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: Part F128411


Other: 31st ACM International Conference on Supercomputing, ICS 2017
Country/Territory: United States


  • Bufferless networks
  • Coherence
  • Deflection routing
  • Hotspot traffic
  • Multicast
  • On-chip networks
  • Router design
  • Synchronization

ASJC Scopus subject areas

  • Computer Science (all)

