An increasing demand for high-performance systems has been observed in the domain of both general purpose and real-time systems, pushing the industry towards a pervasive transition to multi-core platforms. Unfortunately, well-known and efficient scheduling results for single-core systems do not scale well to the multi-core domain. This justifies the adoption of more computationally intensive algorithms, but the complexity and computational overhead of these algorithms impact their applicability to real OSes. We propose an architecture to migrate the burden of multi-core scheduling to a dedicated hardware component. We show that it is possible to mitigate the overhead of complex algorithms, while achieving power efficiency and optimizing processors utilization. We develop the idea of 'active monitoring' to continuously track the evolution of scheduling parameters as tasks execute on processors. This allows reducing the gap between implementable scheduling techniques and the ideal fluid scheduling model, under the constraints of realistic hardware.