Building face models is an essential task in face recognition, tracking, and related applications. However, most current techniques require hand-labelling or special equipment, such as a cyberscanner, to extract the face model. In this paper, we propose an unsupervised algorithm that learns face texture from video. The proposed approach models the video sequence as a mixture of dynamic face layers and background layers, where the dynamic face layers may undergo 3D motion in the video. The hidden variables and their generating process are represented by a probabilistic graphical model, which is learnt by the EM algorithm with a variational approximation. The proposed approach offers several advantages over existing algorithms. First, it derives its learning power from a generative model that naturally represents the process by which videos are generated. Second, it does not require any labelling or face detection algorithm. Third, its application domain is not restricted to extracting face texture; it can be adapted to model other objects as well. Experimental results demonstrate that our model is capable of learning the appearance model of faces undergoing complex 3D motions in video.