Testing Causal Theories with Learned Proxies

Dean Knox, Christopher Lucas, Wendy K. Tam Cho

Research output: Contribution to journal · Review article · Peer-review


Social scientists commonly use computational models to estimate proxies of unobserved concepts, then incorporate these proxies into subsequent tests of their theories. The consequences of this practice, which occurs in over two-thirds of recent computational work in political science, are underappreciated. Imperfect proxies can reflect noise and contamination from other concepts, producing biased point estimates and standard errors. We demonstrate how analysts can use causal diagrams to articulate theoretical concepts and their relationships to estimated proxies, then apply straightforward rules to assess which conclusions are rigorously supportable. We formalize and extend common heuristics for "signing the bias" (a technique for reasoning about unobserved confounding) to scenarios with imperfect proxies. Using these tools, we demonstrate how, in often-encountered research settings, proxy-based analyses allow for valid tests of the existence and direction of theorized effects. We conclude with best-practice recommendations for the rapidly growing literature using learned proxies to test causal theories.
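The core problem the abstract describes can be illustrated with a minimal simulation (a hypothetical sketch for intuition, not the authors' code or analysis): when a learned proxy equals the true concept plus classical measurement noise, a regression on the proxy attenuates the estimated effect toward zero, yet the sign of the estimate still matches the sign of the true effect.

```python
import numpy as np

# Hypothetical setup: a latent concept x, an outcome y it causes,
# and an imperfect learned proxy w = x + noise.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)             # true unobserved concept
y = 2.0 * x + rng.normal(size=n)   # true causal effect is 2.0
w = x + rng.normal(size=n)         # proxy contaminated by independent noise

# OLS slope using the true concept vs. using the proxy
slope_true = np.polyfit(x, y, 1)[0]   # close to the true effect, 2.0
slope_proxy = np.polyfit(w, y, 1)[0]  # attenuated toward zero
```

With equal signal and noise variance here, the proxy-based slope shrinks by the factor var(x) / (var(x) + var(noise)) = 0.5, so its magnitude is badly biased while its direction is preserved, which is the intuition behind testing only the existence and direction of effects with imperfect proxies.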

Original language: English (US)
Pages (from-to): 419-441
Number of pages: 23
Journal: Annual Review of Political Science
State: Published - Feb 1, 2022
Externally published: Yes


Keywords
  • causal inference
  • machine learning
  • measurement
  • proxies
  • supervised learning

ASJC Scopus subject areas

  • Sociology and Political Science
