Assessing writing quality using crowdsourced non-expert comparative judgement ratings

Scott A. Crossley, Minkyung Kim, Qian Wan, Laura K. Allen, Rurik Tywoniw, Danielle McNamara

Research output: Contribution to journal › Article › peer-review

Abstract

This study examines the potential to use non-expert, crowdsourced raters to score essays by comparing expert raters’ and crowdsourced raters’ assessments of writing quality. Expert raters and crowdsourced raters scored 400 essays using a standardised holistic rubric and comparative judgement (pairwise ratings) scoring techniques, respectively. The findings indicated that 92% of non-expert pairwise ratings were sufficiently reliable and that raters’ alignment with overall rankings was 67.9%. Additionally, the non-expert ratings were moderately correlated (r = .397) with expert ratings. Further, the linguistic features of the essays were computed to predict expert and non-expert pairwise ratings, revealing that the predictive models of essay quality for both expert and non-expert scores accounted for around 30–35% of the variance. The two models also shared similar linguistic features. The results collectively demonstrate similarities between non-expert pairwise raters and expert raters when assessing essay quality.
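
The abstract does not specify how the pairwise comparative judgements were aggregated into overall rankings. The sketch below illustrates one common aggregation approach, a Bradley–Terry model fit with the standard iterative updates; the function name `bradley_terry`, the example judgements, and the essay identifiers are hypothetical and are not drawn from the study.

```python
# Illustrative sketch only: aggregating pairwise comparative judgements into
# per-essay quality estimates with a Bradley-Terry model. Identifiers and data
# are hypothetical, not taken from the study.
from collections import defaultdict


def bradley_terry(judgements, n_iter=200, tol=1e-8):
    """judgements: list of (winner_id, loser_id) pairs collected from raters."""
    essays = {e for pair in judgements for e in pair}
    wins = defaultdict(int)         # comparisons each essay won
    pair_counts = defaultdict(int)  # comparisons per unordered essay pair
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {e: 1.0 for e in essays}  # initial ability estimates
    for _ in range(n_iter):
        new = {}
        for i in essays:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    j = next(e for e in pair if e != i)
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        # Rescale so the average strength stays at 1 (identifiability constraint).
        total = sum(new.values())
        new = {e: v * len(essays) / total for e, v in new.items()}
        converged = max(abs(new[e] - strength[e]) for e in essays) < tol
        strength = new
        if converged:
            break
    # Higher strength corresponds to higher inferred writing quality.
    return strength


# Hypothetical usage: ("A", "B") means a rater preferred essay A over essay B.
judgements = [("A", "B"), ("A", "B"), ("B", "A"),
              ("B", "C"), ("B", "C"), ("C", "B"),
              ("A", "C"), ("C", "A")]
abilities = bradley_terry(judgements)
ranking = sorted(abilities, key=abilities.get, reverse=True)
print(ranking)  # likely ['A', 'B', 'C'] for this toy data
```

Estimates of this kind yield a continuous quality score per essay, which could then be correlated with rubric-based expert ratings, as in the comparison reported above.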

Original language: English (US)
Pages (from-to): 33-59
Number of pages: 27
Journal: Assessment in Education: Principles, Policy and Practice
Volume: 32
Issue number: 1
DOIs
State: Published - 2025

Keywords

  • crowdsourcing
  • corpus linguistics
  • natural language processing
  • pairwise comparisons
  • writing assessment

ASJC Scopus subject areas

  • Education
