Emotive audio-visual avatars have the potential to significantly improve the quality of Human-Computer Interaction (HCI). In this paper, the technical approaches underlying a novel framework for a text-driven 3D Emotive Audio-Visual Avatar (EAVA) are proposed. The primary work focuses on 3D face modeling, realistic emotional facial expression animation, emotive speech synthesis, and the co-articulation of speech gestures (i.e., lip movements due to speech production) with facial expressions. Experimental results indicate that EAVA achieves a measurable degree of naturalness and expressiveness in both the audio and visual modalities. Further improvements can be expected by incorporating data-driven statistical learning models into the framework.