Abstract

Understanding how useful any particular set of event data might be for conflict research requires appropriate methods for assessing validity when ground truth data about the population of interest do not exist. We argue that a total error framework can provide better leverage on these critical questions than previous methods have been able to deliver. We first define a total event data error approach for identifying 19 types of error that can affect the validity of event data. We then address the challenge of applying a total error framework when authoritative ground truth about the actual distribution of relevant events is lacking. We argue that carefully constructed gold standard datasets can effectively benchmark validity problems even in the absence of ground truth data about event populations. To illustrate the limitations of conventional strategies for validating event data, we present a case study of Boko Haram activity in Nigeria over a 3-month offensive in 2015 that compares events generated by six prominent event extraction pipelines—ACLED, SCAD, ICEWS, GDELT, PETRARCH, and the Cline Center’s SPEED project. We conclude that conventional ways of assessing validity in event data using only published datasets offer little insight into potential sources of error or bias. Finally, we illustrate the benefits of validating event data using a total error approach by showing how the gold standard approach used to validate SPEED data offers a clear and robust method for detecting and evaluating the severity of temporal errors in event data.

Original language: English (US)
Pages (from-to): 603-624
Number of pages: 22
Journal: American Behavioral Scientist
Volume: 66
Issue number: 5
State: Published - May 2022

Keywords

  • event data
  • gold standard
  • ground truth
  • total error paradigm
  • validity

ASJC Scopus subject areas

  • Social Psychology
  • Cultural Studies
  • Education
  • Sociology and Political Science
  • General Social Sciences

Fingerprint

Dive into the research topics of 'A Total Error Approach for Validating Event Data'. Together they form a unique fingerprint.
