Debiasing crowdsourced batches

Honglei Zhuang, Aditya Parameswaran, Dan Roth, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Crowdsourcing is the de facto standard for gathering annotated data. In theory, data annotation tasks are assumed to be attempted by workers independently; in practice, tasks are often grouped into batches to be presented and annotated together, in order to save the time or cost overhead of providing instructions or necessary background. Thus, even though independence is usually assumed between annotations on data items within the same batch, a worker's judgment on one data item can still be affected by the other items in the batch, leading to additional errors in the collected labels. In this paper, we study the annotation bias that arises when data items are presented as batches to be judged by workers simultaneously. We propose a novel worker model to characterize annotating behavior on data batches, and present how to train the worker model on annotation data sets. We also present a debiasing technique that removes the effect of this annotation bias on the accuracy of the collected labels. Our experimental results on synthetic and real-world data sets demonstrate that the proposed method achieves up to a +57% improvement in F1-score over the standard majority-voting baseline.
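For context, the majority-voting baseline that the paper compares against can be sketched as follows. This is a minimal illustration of the standard aggregation scheme, not the authors' worker model or debiasing method; the function and variable names are hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate crowd labels per item by simple majority vote.

    annotations: dict mapping item_id -> list of labels from different workers.
    Returns a dict mapping item_id -> the most frequent label (ties broken
    by first occurrence, per Counter.most_common).
    """
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in annotations.items()
    }


# Example: three workers label two items; each item's label is decided
# independently, ignoring any batch-level effects on worker judgments.
labels = {
    "item1": [1, 1, 0],
    "item2": [0, 0, 1],
}
print(majority_vote(labels))  # {'item1': 1, 'item2': 0}
```

Because each item is aggregated independently, this baseline cannot correct for the batch-level bias the paper targets, where a worker's label on one item is influenced by the other items shown alongside it.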

Original language: English (US)
Title of host publication: KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Number of pages: 10
ISBN (Electronic): 9781450336642
State: Published - Aug 10 2015
Event: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: Aug 10 2015 - Aug 13 2015

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015


Keywords

  • Annotation bias
  • Crowdsourcing
  • Worker accuracy model

ASJC Scopus subject areas

  • Software
  • Information Systems


