On run diversity in "evaluation as a service"

Ellen M. Voorhees, Jimmy Lin, Miles James Efron

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through which the evaluation task can be completed. However, this concept violates some of the premises of traditional pool-based collection building and thus calls into question the quality of the resulting test collection. In particular, the service API might restrict the diversity of runs that contribute to the pool: this might hamper innovation by researchers and lead to incomplete judgment pools that affect the reusability of the collection. This paper shows that the distinctiveness of the retrieval runs used to construct the first test collection built using EaaS, the TREC 2013 Microblog collection, is not substantially different from that of the TREC-8 ad hoc collection, a high-quality collection built using traditional pooling. Further analysis using the leave out uniques' test suggests that pools from the Microblog 2013 collection are less complete than those from TREC-8, although both collections benefit from the presence of distinctive and effective manual runs. Although we cannot yet generalize to all EaaS implementations, our analyses reveal no obvious aws in the test collection built using the methodology in the TREC 2013 Microblog track.

Original languageEnglish (US)
Title of host publicationSIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
PublisherAssociation for Computing Machinery
Pages959-962
Number of pages4
ISBN (Print)9781450322591
DOIs
StatePublished - Jan 1 2014
Event37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014 - Gold Coast, QLD, Australia
Duration: Jul 6 2014Jul 11 2014

Publication series

NameSIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
CountryAustralia
CityGold Coast, QLD
Period7/6/147/11/14

    Fingerprint

Keywords

  • Meta-evaluation
  • Reusability
  • Test collections

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Voorhees, E. M., Lin, J., & Efron, M. J. (2014). On run diversity in "evaluation as a service". In SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 959-962). (SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery. https://doi.org/10.1145/2600428.2609484