People’s eye movements and performance accuracy were recorded as they attempted to understand sentences spoken by two talkers under two conditions: vision only and vision plus low‐intensity sound. Percent word‐correct scores were higher for the vision‐plus‐sound than for the vision‐only presentation, and higher for the male than for the female talker. Eye movement records showed a tendency to gaze at the talker’s eyes when the talker was not speaking, but to shift the gaze to the mouth and make long eye fixations when the talker was speaking, particularly under vision‐only conditions and for the female talker. In a task requiring verbatim word identification, people with average speech‐reading proficiency direct their gaze to the talker’s mouth most of the time while the talker is speaking, contrary to the finding of Vatikiotis‐Bateson, Eigsti, Yano, and Munhall (1998), and they produce very long eye fixations. For these people, the gaze is drawn to the mouth not by facial motion alone but also on some other basis, assumed to be prior knowledge of the location of critical visual cues, with an accompanying suppression of saccadic activity.