Proof-of-Work~(PoW) based blockchains typically allocate only a tiny fraction (e.g., less than 1% for Ethereum) of the average interarrival time~$\mathbbI $ between blocks for validating smart contracts present in transactions. In such systems, block validation and PoW mining are typically performed sequentially, the former by CPUs and the latter by ASICs. A trivial increase in validation time~$(t)$ introduces the popularly known Verifier's Dilemma, and as we demonstrate, causes more forking and hurts fairness. Large t also reduces the tolerance for safety against a Byzantine adversary. Solutions that offload validation to a set of non-chain nodes (a.k.a. off-chain approaches) suffer from trust and performance issues that are non-trivial to resolve. In this paper, we present Tuxedo, the first on-chain protocol to theoretically scale t/\mathbbI \approx 1$ in PoW blockchains. The key innovation in Tuxedo is to perform CPU-based block processing in \em parallel to ASIC mining. We achieve this by allowing miners to delay validation of transactions in a block by up to ? blocks, where ? is a system parameter. We perform security analysis of Tuxedo considering all possible adversarial strategies in a synchronous network with maximum end-to-end delay ?and demonstrate that Tuxedo achieves security equivalent to known results for longest chain PoW Nakamoto consensus. Our prototype implementation of Tuxedo atop Ethereum demonstrates that it can scale t without suffering the harmful effects of naïve scaling up of t/\mathbbI $ in existing blockchains.