Within the prevailing Fisher-Neyman-Rubin framework of causal inference, causal effects are defined as comparisons of potential outcomes under different treatments. In most contexts, it is impossible or impractical to observe multiple outcomes (realizations of the variable of interest) for any given unit. Given this fundamental problem of causal inference (Holland 1986), experimentalists approximate the hypothetical treatment effect by comparing averages of groups or, sometimes, averages of differences of matched cases. Hence, they often use (Ȳ|t = 1) − (Ȳ|t = 0) to estimate E[(Yi|t = 1) − (Yi|t = 0)], labeling the former quantity the treatment effect or, more accurately, the average treatment effect.
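The logic of this substitution can be illustrated with a minimal simulation sketch (not from the original; the constant unit-level effect of 2.0 and the Bernoulli(0.5) assignment are illustrative assumptions). Each unit carries two potential outcomes, only one of which is observed; the difference of observed group means recovers the average of the unobservable unit-level differences:

```python
import random

random.seed(0)

# Hypothetical setup: each unit i has potential outcomes y0 (control)
# and y1 (treatment); here every unit's effect is assumed to be 2.0.
n = 10_000
units = []
for _ in range(n):
    y0 = random.gauss(0, 1)
    y1 = y0 + 2.0
    units.append((y0, y1))

# The estimand E[(Yi|t=1) - (Yi|t=0)], computable only in simulation.
true_ate = sum(y1 - y0 for y0, y1 in units) / n

# Random assignment: each unit reveals exactly one potential outcome.
treated, control = [], []
for y0, y1 in units:
    if random.random() < 0.5:
        treated.append(y1)
    else:
        control.append(y0)

# The estimator (Ȳ|t=1) - (Ȳ|t=0) from the observable data.
estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(true_ate, estimate)
```

Because assignment is random, the two group means estimate the same population of potential outcomes, and the difference of means converges on the true average treatment effect as n grows.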
The rationale for substituting group averages originates in the logic of the random assignment experiment: each unit has different potential outcomes; units are randomly assigned to one treatment or another; and, in expectation, control and treatment groups should be identically distributed. To make causal inferences in this manner requires that one unit's outcomes not be affected by another unit's treatment assignment. This requirement has come to be known as the stable unit treatment value assumption (SUTVA).
Until recently, experimenters reported average treatment effects as a matter of routine. Unfortunately, this difference of averages often masks as much as it reveals. Most crucially, it ignores heterogeneity in treatment effects, whereby the treatment affects (or would affect if it were actually experienced) some units differently from others.
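The masking problem can be made concrete with a small sketch (not from the original; the split into equal groups with effects of +2.0 and −2.0 is an illustrative assumption). Every unit is strongly affected, yet the average treatment effect is exactly zero, so the difference of means suggests the treatment does nothing:

```python
import random

random.seed(1)

# Hypothetical heterogeneity: half the units gain 2.0 from treatment,
# half lose 2.0, so unit-level effects are large but average to zero.
n = 10_000
effects = [2.0 if i % 2 == 0 else -2.0 for i in range(n)]
y0 = [random.gauss(0, 1) for _ in range(n)]
y1 = [y0[i] + effects[i] for i in range(n)]

ate = sum(effects) / n                               # 0.0
mean_abs_effect = sum(abs(e) for e in effects) / n   # 2.0

# Random assignment, as before: observe one potential outcome per unit.
treated, control = [], []
for i in range(n):
    if random.random() < 0.5:
        treated.append(y1[i])
    else:
        control.append(y0[i])

est = sum(treated) / len(treated) - sum(control) / len(control)
print(ate, mean_abs_effect, est)
```

The estimated difference of means hovers near zero even though the average magnitude of the unit-level effect is 2.0; the single summary number conceals the offsetting positive and negative responses.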