A desirable feature of a development tool for SoC design is that, given the important applications in the domain to be targeted by the SoC, a powerful hardware-software partitioning engine is available to determine which function(s) shall be mapped to hardware. However, to provide high-quality partitioning, this engine must be able to consider a rich design space of possible alternate hardware and software implementations for each program region candidate for hardware acceleration, in turn making the task of finding the optimal mapping very difficult given the number of design points to consider and the need for accurate modeling of latency, power and area. In this work we propose a novel framework to enable hardware acceleration of performance-critical parts of an application, by addressing the problem of hardware/software partitioning under power and area constraints to minimize the overall program latency. Our flow is based on the LLVM compiler, and focuses on building a scalable compile-Time partitioning algorithm while considering large sets of alternative hardware and software implementations for a particular region. To this end we develop a hybrid approach based on mixing semi-random selection of hardware design points and an Integer Linear Programming formulation of the mapping decision, along with iterative refinements of the solution. Experimental results demonstrate the capability of our approach to consider complex designs and yet output near-optimal partitioning decision. Our package is named RIP (Randomized ILP-based Partitioning), and is open source to benefit the research community.