Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce the power and/or energy consumption of various applications. While most processors provide chip-level DVFS, where the frequency and voltage of the cores in a chip can only be changed all together; core-level DVFS, where each core can be controlled independently, requires core-level voltage regulators in hardware and only is supported in production in Haswell generation among Intel processors. The finer grained control that per-core DVFS provides can lead to higher energy efficiency compared to chip-level DVFS especially for the unsynchronized, unstructured parallel applications when carefully applied. Ability to do per-core DVFS opens up new doors for different optimizations within runtime systems. We implement an intelligent energy efficient runtime module which uses a fine-grained function level per-core DVFS approach. Our module finds the energy-optimal frequency for each phase/function/kernel of the application over the first few iterations and applies the optimal frequency for each function. We test our implementation on Haswell processors and show that our algorithm enables 4% to 35% energy reduction over chip-level DVFS with as much as performance.