While deterministic replay of parallel programs is a powerful technique, current proposals have shortcomings. Specifically, software-based replay systems have high overheads on multiprocessors, while hardware-based proposals focus only on basic hardware-level mechanisms, ignoring the overall replay system. To be practical, hardware-based replay systems need to support an environment with multiple parallel jobs running concurrently: some being recorded, others being replayed, and still others running without recording or replay. They also need to manage limited-size log buffers. This paper addresses these shortcomings by introducing, for the first time, a set of abstractions and a software-hardware interface for practical hardware-assisted replay of multiprocessor systems. The approach, called Capo, introduces the novel abstraction of the Replay Sphere to separate the responsibilities of the hardware and software components of the replay system. In this paper, we also design and build CapoOne, a prototype of a deterministic multiprocessor replay system that implements Capo using Linux and simulated DeLorean hardware. Our evaluation of 4-processor executions shows that CapoOne largely records with the efficiency of hardware-based schemes and the flexibility of software-based schemes.