Prior work has demonstrated that distributional dependencies between word or morpheme-like entities in artificial and naturalistic language can be used to identify clusters of words that broadly conform to the categories of the adult language (Brent & Siskind, 2001; Mintz, 2002; Redington & Chater, 1998). In this work, we examine the hypothesis that the distributional statistics that support discovery of the noun category are more useful in speech to younger children than in speech to older children (approximately 1–3 vs. 3–6 years of age). First, using a novel method for quantifying the extent to which nouns occur in mutually shared contexts, we demonstrate an advantage for speech to younger children over speech to older children. Second, we develop a theoretical framework for understanding why caregiver speech might be scaffolded in this way, and we test its predictions against information-theoretic patterns computed on child-directed speech. Our account, based on entropy maximization and on anchoring as originally proposed by Cameron-Faulkner, Lieven, and Tomasello (2003), clarifies issues in incremental learning from nonstationary input, which is the problem faced by language learners, and paves the way toward integrating the scaffolded organization of children's early language environment into computational models of acquisition.