To avoid the notorious drifting problem in tracking systems, this paper proposes a new integrated appearance learning framework. Previous tracking frameworks with appearance learning ability [3, 11] either require supervised offline training or inevitably fail if the tracker locks onto the background. Our framework, in contrast, requires no offline training. Given the location of the object in the first frame of the video sequence, we model the difference between the foreground (the image patch containing the object) and the background as the transition cost in our tracking objective function. A tracker based on Dynamic Programming (DP) and template prediction is run on the pixels with high foreground likelihood. The typical views (i.e., the appearance model) proposed by the tracker are used to initialize the states of a Hidden Markov Model (HMM). With the learned HMM, the tracking results and the appearance model are refined iteratively until the HMM explains the video sequence and all of the estimated parameters and hidden variables well. Through this iterative procedure, the typical views of the object, the transition probabilities between the typical views, and the location of the object are estimated simultaneously with strong confidence. Experiments show that the proposed framework achieves satisfactory results on several challenging video sequences and therefore has many potential applications in video analysis.
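As a rough illustration of the DP step mentioned above, the following sketch finds a minimum-cost path through per-frame candidates using a generic Viterbi-style recursion. It is not the objective function of this paper: the function name `dp_track`, the `unary` per-candidate costs, and the single shared `pairwise` transition-cost matrix are assumptions introduced only for this example.

```python
import numpy as np

def dp_track(unary, pairwise):
    """Minimum-cost path through T frames with K candidates per frame.

    unary:    (T, K) array, hypothetical cost of picking candidate k in frame t
    pairwise: (K, K) array, hypothetical cost of moving from candidate i to j
    Returns (path, total_cost), where path holds one candidate index per frame.
    """
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    T, K = unary.shape

    cost = unary[0].copy()              # best cost of paths ending at each candidate
    back = np.zeros((T, K), dtype=int)  # back-pointers for path recovery

    for t in range(1, T):
        # total[i, j]: best cost ending at i in frame t-1, then moving to j
        total = cost[:, None] + pairwise
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + unary[t]

    # Trace the best path backwards from the cheapest final candidate.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(cost.min())
```

In a tracker of the kind described, the candidates would presumably be the pixels with high foreground likelihood, and the transition cost would encode the foreground/background difference. Here both are left as plain cost arrays.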