With technology scaling leading to reliability problems and a proliferation of hardware accelerators, there is a need for cost-effective techniques to detect errors in complex datapaths. Modulo (residue) arithmetic is useful for creating a shadow datapath to check the computation of an arithmetic datapath and involves three key steps: reduction of the inputs to modulo shadow inputs, computation with those shadow values, and checking the outputs for consistency with the shadow outputs. The focus of this paper is new gate-level architectures and algorithms to reduce the cost of modulo shadow datapaths. We introduce low-cost architectures for all four key functional units in a shadow datapath: (1) a modulo reduction algorithm that generates architectures consisting entirely of full-adder standard cells; (2) minimum-area modulo adder and subtractor architectures; (3) an array-based modulo multiplier design; and (4) a modulo equality comparator that handles the residue encoding produced by the above. We compare our functional units to the previous state-of-the-art approach, observing a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer; that our reducer costs, which tend to dominate shadow datapath costs, do not increase with larger modulo bases; and that for modulo-15 and above, all of our modulo functional units have better area and delay then their previous counterparts. We also demonstrate the practicality of our approach by designing a custom shadow datapath for error detection of a multiply accumulate functional unit, which has an area overhead of only 12% for a 32-bit main datapath and 2-bit modulo-3 shadow datapath.