We demonstrate a video-rate stereo matching system implemented on a hybrid CPU+FPGA platform (Convey HC-1). Emerging applications such as 3D gesture recognition and automotive navigation demand fast, high-quality stereo vision. We describe a custom hardware-accelerated Markov Random Field (MRF) inference system for this task. Starting from a core architecture for streaming sequential tree-reweighted message passing (TRW-S) inference, we present the end-to-end system engineering needed to move from this single-frame message update to full stereo video. We partition the stereo matching procedure across the CPU and the FPGAs, and apply both function-level pipelining and frame-level parallelism to achieve the required speed. Experimental results show that our system achieves 12 frames per second on challenging video stereo matching tasks. This appears to be the first implementation of TRW-S inference at video rates, and our system is also significantly faster than several recent GPU implementations of similar stereo inference methods based on belief propagation (BP).