A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data

Peter F. Nardulli, Scott L. Althaus, Matthew Hayes

Research output: Contribution to journal › Article

Abstract

“Big data” in the form of unstructured text pose challenges and opportunities to social scientists committed to advancing research frontiers. Because machine-based and human-centric approaches to content analysis have different strengths for extracting information from unstructured text, the authors argue for a collaborative, hybrid approach that combines their comparative advantages. The notion of a progressive supervised-learning approach that combines data science techniques and human coders is developed and illustrated using the Social, Political and Economic Event Database (SPEED) project’s Societal Stability Protocol. SPEED’s rich event data on civil strife reveal that conventional machine-based approaches for generating event data miss a great deal of within-category variance, while conventional human-based efforts to categorize periods of civil war or political instability routinely misspecify periods of calm and unrest. To demonstrate the potential of hybrid data collection methods, SPEED data on event intensities and origins are used to trace the changing role of political, socioeconomic, and sociocultural factors in generating global civil strife in the post–World War II era.

Original language: English (US)
Pages (from-to): 148-183
Number of pages: 36
Journal: Sociological Methodology
Volume: 45
Issue number: 1
State: Published - Aug 1 2015

Keywords

  • automated learning
  • civil strife
  • content analysis
  • event data
  • unstructured data

ASJC Scopus subject areas

  • Sociology and Political Science
