CASPAR: Breaking serialization in lock-free multicore synchronization

Tanmay Gangwani, Adam Morrison, Josep Torrellas

Research output: Contribution to journal › Article › peer-review


In multicores, performance-critical synchronization is increasingly performed in a lock-free manner using atomic instructions such as CAS or LL/SC. However, when many processors synchronize on the same variable, performance can still degrade significantly. Contending writes get serialized, creating a non-scalable condition. Past proposals that build hardware queues of synchronizing processors do not fundamentally solve this problem; at best, they help to efficiently serialize the contending writes. This paper proposes a novel architecture that breaks the serialization of hardware queues and enables the queued processors to perform lock-free synchronization in parallel. The architecture, called CASPAR, is able to (1) execute the CASes in the queued-up processors in parallel through eager forwarding of expected values, and (2) validate the CASes in parallel and dequeue groups of processors at a time. The result is highly scalable synchronization. We evaluate CASPAR with simulations of a 64-core chip. Compared to existing proposals with hardware queues, CASPAR improves the throughput of kernels by 32% on average, and reduces the execution time of the sections considered in lock-free versions of applications by 47% on average. This makes these sections 2.5x faster than in the original applications.
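As a rough software-level illustration of the contention the abstract describes (not of CASPAR's hardware mechanism), the sketch below has several threads increment one shared variable with a CAS retry loop. The names `RunStats` and `contended_increment` are hypothetical and introduced only for this example. Only one CAS on the variable can succeed at a time, so the others fail and retry, which is the serialization of contending writes in miniature.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch only.
struct RunStats {
    std::uint64_t final_value;  // shared counter value after all threads finish
    std::uint64_t failed_cas;   // CAS attempts that did not succeed
};

// Every thread increments the same counter `iters` times with a CAS retry loop.
RunStats contended_increment(unsigned num_threads, std::uint64_t iters) {
    std::atomic<std::uint64_t> counter{0};
    std::atomic<std::uint64_t> failures{0};

    auto worker = [&] {
        std::uint64_t local_failures = 0;
        for (std::uint64_t i = 0; i < iters; ++i) {
            std::uint64_t expected = counter.load(std::memory_order_relaxed);
            // On failure, compare_exchange_weak writes the current value into
            // `expected`, so the loop retries with fresh data. Failures come
            // from losing a race to another thread or, on some architectures,
            // from a spurious failure of the weak form.
            while (!counter.compare_exchange_weak(expected, expected + 1,
                                                  std::memory_order_acq_rel,
                                                  std::memory_order_relaxed)) {
                ++local_failures;
            }
        }
        failures.fetch_add(local_failures, std::memory_order_relaxed);
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker);
    }
    for (auto& th : threads) {
        th.join();
    }

    return RunStats{counter.load(), failures.load()};
}
```

Calling `contended_increment(4, 10000)` always yields a `final_value` of 40000, because each successful CAS adds exactly one. The `failed_cas` count varies from run to run and tends to grow with the number of threads, which gives a feel for how much work is wasted when writes to a single variable contend.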

Original language: English (US)
Pages (from-to): 789-804
Number of pages: 16
Journal: ACM SIGPLAN Notices
Issue number: 4
State: Published - Apr 2016


  • Lock-free synchronization
  • Serialization

ASJC Scopus subject areas

  • General Computer Science

