In this paper, an improved model-based automatic face/head tracking algorithm is presented. The input to the system is a video sequence containing a head-and-shoulders scene. The outputs are the detected global head movements and the local facial feature motions. To estimate the global head pose, the 2D image coordinates of feature points are mapped to 3D by assuming the projection is approximately scaled orthographic. After this initial estimation, a Kalman filter is employed to improve temporal stability. For non-rigid local facial motion tracking, a probabilistic network is constructed to encode information about the relative positions and velocities of various facial feature points. This network is trained in a supervised fashion and is later applied as a structural constraint in combination with the traditional template matching method. Currently, the conditional distributions employed in the network are two-dimensional; they are obtained by learning from front-view sequences. To apply this network to 3D face/head tracking, pose compensation must be performed based on the estimated head poses.
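To make the two estimation steps concrete, the following is a minimal sketch (not the paper's implementation) of the geometric model described above: a scaled orthographic projection relating 3D feature points to their 2D image coordinates, plus a simple one-dimensional Kalman filter of the kind that could smooth an estimated pose parameter over time. The function and class names, and the noise parameters `q` and `r`, are illustrative assumptions.

```python
import numpy as np

def project_scaled_orthographic(points_3d, scale, rotation, translation_2d):
    """Project 3D model points to 2D under a scaled orthographic camera:
    the rotated X/Y coordinates are scaled uniformly and translated in the
    image plane; depth affects the result only through the rotation."""
    rotated = points_3d @ rotation.T               # (N, 3) rotated points
    return scale * rotated[:, :2] + translation_2d # (N, 2) image coordinates

class ScalarKalman:
    """1D constant-state Kalman filter, e.g. for smoothing one head-pose
    parameter (a rotation angle or the projection scale) across frames."""
    def __init__(self, q=1e-3, r=1e-2):
        self.x = 0.0   # state estimate
        self.p = 1.0   # estimate variance
        self.q = q     # process noise variance (assumed value)
        self.r = r     # measurement noise variance (assumed value)

    def update(self, z):
        self.p += self.q                # predict: variance grows
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)      # correct with measurement z
        self.p *= (1.0 - k)
        return self.x
```

For example, projecting the model points `[[1, 0, 5], [0, 1, 5]]` with identity rotation, scale 2, and translation `(10, 20)` yields the image points `(12, 20)` and `(10, 22)`; feeding a noisy pose measurement to `ScalarKalman.update` each frame produces a temporally smoothed estimate.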