Evaluating rater quality and rating difficulty in online annotation activities

Peter Organisciak, Miles Efron, Katrina Fenlon, Megan Senseney

Research output: Contribution to journal › Article › peer-review


Gathering annotations from non-expert online raters is an attractive method for quickly completing large-scale annotation tasks, but the increased possibility of unreliable annotators and diminished work quality remains a cause for concern. In the context of information retrieval, where human-encoded relevance judgments underlie the evaluation of new systems and methods, the ability to quickly and reliably collect trustworthy annotations allows for quicker development and iteration of research. In the context of paid online workers, this study evaluates indicators of non-expert performance along three lines: temporality, experience, and agreement. It is found that user performance is a key indicator for future performance. Additionally, the time spent by raters familiarizing themselves with a new set of tasks is important for rater quality, as is long-term familiarity with a topic being rated. These findings may inform large-scale digital collections' use of non-expert raters for performing more purposive and affordable online annotation activities.

Original language: English (US)
Journal: Proceedings of the ASIST Annual Meeting
Issue number: 1
State: Published - 2012


Keywords

  • Annotation
  • Crowdsourcing
  • Label uncertainty
  • Non-expert rating

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

