Users of personal devices such as desktops, tablets, or smartphones run applications based on image or video processing because such applications enable natural computer-user interaction. The challenge is to execute these computationally demanding applications efficiently. One way to address this challenge is to use on-chip heterogeneous systems, where each task can execute on the device on which it runs most efficiently. In this paper, we discuss the optimization of a feature tracking application, written in OpenCL, running on an on-chip heterogeneous platform. Our results show that OpenCL can facilitate programming of these heterogeneous systems, because it provides a unified programming paradigm, while also delivering significant performance improvements. We show that, after optimization, our feature tracking application runs 3.2, 2.6, and 4.3 times faster and consumes 2.2, 3.1, and 2.7 times less energy when running on the multicore CPU, the GPU, or both the CPU and the GPU of an Intel i7, respectively.