A novel framework for multimodal human-machine or human-human interaction via real-time humanoid avatar communication is proposed for real-world mobile applications. It integrates audio-visual analysis and synthesis modules to realize real-time head tracking, multichannel and runtime animations, visual text-to-speech (TTS), and real-time viseme detection and rendering. The 3-D avatar provides customized modeling for low-bit-rate virtual communication by adopting the M3G standard, and it supports MPEG-4 facial animation parameters (FAPs). A robust user head tracker and an associated head pose and motion estimation scheme are developed to control avatar animation in real time at remote locations. The framework is recognized as an effective design for practical industrial products for human-to-human mobile communication.