We present a Many-core Approach to Re-configurable Computing (MARC), enabling efficient high-performance computing for applications expressed using parallel programming models such as OpenCL. The MARC system exploits abundant special FPGA resources such as distributed block memories and DSP blocks to implement complete single-chip high efficiency many-core microarchitectures. The key benefits of MARC are that it (i) allows programmers to easily express parallelism through the API defined in a high-level programming language, (ii) supports coarse-grain multithreading and dataflow-style fine-grain threading while permitting bit-level resource control, and (iii) greatly reduces the effort required to re-purpose the hardware system for different algorithms or different applications. A MARC prototype machine with 48 processing nodes was implemented using a Virtex-5 (XCV5LX155T-2) FPGA for a well known Bayesian network inference problem. We compare the runtime of the MARC machine against a manually optimized implementation. With fully synthesized application-specific processing cores, our MARC machine comes within a factor of 3 of the performance of a fully optimized FPGA solution but with a considerable reduction in development effort and a significant increase in retargetability.