Assessor differences and user preferences in tweet timeline generation

Yulu Wang, Garrick Sherman, Jimmy Lin, Miles Efron

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In information retrieval evaluation, when presented with an effectiveness difference between two systems, there are three relevant questions one might ask. First, are the differences statistically significant? Second, is the comparison stable with respect to assessor differences? Finally, is the difference actually meaningful to a user? This paper tackles the last two questions about assessor differences and user preferences in the context of the newly introduced tweet timeline generation task in the TREC 2014 Microblog track, where the system's goal is to construct an informative summary of non-redundant tweets that addresses the user's information need. Central to the evaluation methodology are human-generated semantic clusters of tweets that contain substantively similar information. We show that the evaluation is stable with respect to assessor differences in clustering and that user preferences generally correlate with effectiveness metrics even though users are not explicitly aware of the semantic clustering being performed by the systems. Although our analyses are limited to this particular task, we believe that lessons learned could generalize to other evaluations based on establishing semantic equivalence between information units, such as nugget-based evaluations in question answering and temporal summarization.

Original language: English (US)
Title of host publication: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 615-624
Number of pages: 10
ISBN (Electronic): 9781450336215
State: Published - Aug 9 2015
Event: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015 - Santiago, Chile
Duration: Aug 9 2015 → Aug 13 2015

Publication series

Name: SIGIR 2015 - Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015
Country/Territory: Chile
City: Santiago
Period: 8/9/15 → 8/13/15

Keywords

  • Microblog search
  • TREC evaluation
  • User study

ASJC Scopus subject areas

  • Information Systems
  • Software
