A new object tracking framework with online appearance learning is proposed in this paper. Object appearances are modeled as a set of Probability Mass Functions (PMFs), referred to as the "object-model-set". The averaged object appearance over the video is also modeled as a PMF, called the "universal model". Given an initial template of the target object, which is the only element of the initial object-model-set, the framework tracks the object by examining the whole input video sequence. Dynamic Programming (DP) is applied to find the best spatial-scale matching between the observations and the current model set across the whole input video. The object-model-set is iteratively updated: whenever the likelihood of a matched image patch under the current object-model-set is lower than its prior computed from the universal model, the PMF of that patch is added to the object-model-set. In this way the object appearance, modeled as a set of PMFs of typical views, is learned online, and the tracking results can be further refined given the updated object-model-set. This makes the proposed tracking framework robust to appearance variation caused by 3D motion, partial occlusion, and illumination changes. Moreover, the learned typical views facilitate other vision tasks such as recognition and 3D reconstruction. Tracking results and the typical views learned on challenging video sequences experimentally demonstrate the robustness and strong online learning ability of the proposed framework.
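The model-set update described above can be sketched as follows. This is a minimal illustration, assuming PMFs are normalized intensity histograms and using the Bhattacharyya coefficient as a stand-in matching score; the function names, histogram binning, and likelihood computation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pmf(patch, bins=16):
    # Normalized intensity histogram as the patch's PMF
    # (illustrative choice of appearance representation).
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def bhattacharyya(p, q):
    # Similarity between two PMFs; a stand-in for the matching score.
    return float(np.sum(np.sqrt(p * q)))

def update_model_set(model_set, universal, patch):
    # Add the matched patch's PMF when the current object-model-set
    # predicts it worse than the universal (average-appearance) model.
    p = pmf(patch)
    prediction = max(bhattacharyya(p, m) for m in model_set)
    prior = bhattacharyya(p, universal)
    if prediction < prior:
        model_set.append(p)  # learn a new typical view online
    return model_set
```

Under this sketch, a patch whose appearance is already well covered by an existing typical view leaves the set unchanged, while a novel view (e.g. after a 3D rotation) is better explained by the universal model and is therefore added.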