A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data

Peter F. Nardulli, Scott Althaus, Matthew Hayes

Research output: Contribution to journal (Article)

Abstract

“Big data” in the form of unstructured text pose challenges and opportunities to social scientists committed to advancing research frontiers. Because machine-based and human-centric approaches to content analysis have different strengths for extracting information from unstructured text, the authors argue for a collaborative, hybrid approach that combines their comparative advantages. The notion of a progressive supervised-learning approach that combines data science techniques and human coders is developed and illustrated using the Social, Political and Economic Event Database (SPEED) project’s Societal Stability Protocol. SPEED’s rich event data on civil strife reveal that conventional machine-based approaches for generating event data miss a great deal of within-category variance, while conventional human-based efforts to categorize periods of civil war or political instability routinely misspecify periods of calm and unrest. To demonstrate the potential of hybrid data collection methods, SPEED data on event intensities and origins are used to trace the changing role of political, socioeconomic, and sociocultural factors in generating global civil strife in the post–World War II era.

Original language: English (US)
Pages (from-to): 148-183
Number of pages: 36
Journal: Sociological Methodology
Volume: 45
Issue number: 1
DOI: 10.1177/0081175015581378
State: Published - Aug 1 2015

Fingerprint

event
learning
sociocultural factors
data collection method
socioeconomic factors
political factors
social scientist
civil war
economics
content analysis
science

Keywords

  • automated learning
  • civil strife
  • content analysis
  • event data
  • unstructured data

ASJC Scopus subject areas

  • Sociology and Political Science

Cite this

A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data. / Nardulli, Peter F.; Althaus, Scott; Hayes, Matthew.

In: Sociological Methodology, Vol. 45, No. 1, 01.08.2015, p. 148-183.


Nardulli, Peter F. ; Althaus, Scott ; Hayes, Matthew. / A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data. In: Sociological Methodology. 2015 ; Vol. 45, No. 1. pp. 148-183.
@article{63ec3ec71b6d4bb3add78f9b72637290,
title = "A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data",
abstract = "“Big data” in the form of unstructured text pose challenges and opportunities to social scientists committed to advancing research frontiers. Because machine-based and human-centric approaches to content analysis have different strengths for extracting information from unstructured text, the authors argue for a collaborative, hybrid approach that combines their comparative advantages. The notion of a progressive supervised-learning approach that combines data science techniques and human coders is developed and illustrated using the Social, Political and Economic Event Database (SPEED) project’s Societal Stability Protocol. SPEED’s rich event data on civil strife reveal that conventional machine-based approaches for generating event data miss a great deal of within-category variance, while conventional human-based efforts to categorize periods of civil war or political instability routinely misspecify periods of calm and unrest. To demonstrate the potential of hybrid data collection methods, SPEED data on event intensities and origins are used to trace the changing role of political, socioeconomic, and sociocultural factors in generating global civil strife in the post–World War II era.",
keywords = "automated learning, civil strife, content analysis, event data, unstructured data",
author = "Nardulli, {Peter F.} and Scott Althaus and Matthew Hayes",
year = "2015",
month = "8",
day = "1",
doi = "10.1177/0081175015581378",
language = "English (US)",
volume = "45",
pages = "148--183",
journal = "Sociological Methodology",
issn = "0081-1750",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR
T1 - A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data
AU - Nardulli, Peter F.
AU - Althaus, Scott
AU - Hayes, Matthew
PY - 2015/8/1
Y1 - 2015/8/1

AB - “Big data” in the form of unstructured text pose challenges and opportunities to social scientists committed to advancing research frontiers. Because machine-based and human-centric approaches to content analysis have different strengths for extracting information from unstructured text, the authors argue for a collaborative, hybrid approach that combines their comparative advantages. The notion of a progressive supervised-learning approach that combines data science techniques and human coders is developed and illustrated using the Social, Political and Economic Event Database (SPEED) project’s Societal Stability Protocol. SPEED’s rich event data on civil strife reveal that conventional machine-based approaches for generating event data miss a great deal of within-category variance, while conventional human-based efforts to categorize periods of civil war or political instability routinely misspecify periods of calm and unrest. To demonstrate the potential of hybrid data collection methods, SPEED data on event intensities and origins are used to trace the changing role of political, socioeconomic, and sociocultural factors in generating global civil strife in the post–World War II era.

KW - automated learning
KW - civil strife
KW - content analysis
KW - event data
KW - unstructured data
UR - http://www.scopus.com/inward/record.url?scp=84954565034&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954565034&partnerID=8YFLogxK
U2 - 10.1177/0081175015581378
DO - 10.1177/0081175015581378
M3 - Article
AN - SCOPUS:84954565034
VL - 45
SP - 148
EP - 183
JO - Sociological Methodology
JF - Sociological Methodology
SN - 0081-1750
IS - 1
ER -