TY - GEN
T1 - Aggregation of multiple judgments for evaluating ordered lists
AU - Kim, Hyun Duk
AU - Zhai, Chengxiang
AU - Han, Jiawei
PY - 2010
Y1 - 2010
N2 - Many tasks (e.g., search and summarization) result in an ordered list of items. To evaluate such a list, we need to compare it with an ideal ordered list created by a human expert for the same set of items. To reduce bias, multiple human experts are often used to create multiple ideal ordered lists. An interesting challenge in such an evaluation method is thus how to aggregate these different ideal lists to compute a single score for the ordered list being evaluated. In this paper, we propose three new methods for aggregating multiple order judgments to evaluate ordered lists: weighted correlation aggregation, rank-based aggregation, and frequent sequential pattern-based aggregation. Experimental results on ordering sentences for text summarization show that all three new methods outperform the state-of-the-art average correlation methods in terms of discriminativeness and robustness against noise. Among the three proposed methods, the frequent sequential pattern-based method performs best because it flexibly models agreements and disagreements among human experts at various levels of granularity.
AB - Many tasks (e.g., search and summarization) result in an ordered list of items. To evaluate such a list, we need to compare it with an ideal ordered list created by a human expert for the same set of items. To reduce bias, multiple human experts are often used to create multiple ideal ordered lists. An interesting challenge in such an evaluation method is thus how to aggregate these different ideal lists to compute a single score for the ordered list being evaluated. In this paper, we propose three new methods for aggregating multiple order judgments to evaluate ordered lists: weighted correlation aggregation, rank-based aggregation, and frequent sequential pattern-based aggregation. Experimental results on ordering sentences for text summarization show that all three new methods outperform the state-of-the-art average correlation methods in terms of discriminativeness and robustness against noise. Among the three proposed methods, the frequent sequential pattern-based method performs best because it flexibly models agreements and disagreements among human experts at various levels of granularity.
KW - Evaluation
KW - Frequent sequential pattern mining
KW - Judgment aggregation
KW - Sentence ordering
UR - http://www.scopus.com/inward/record.url?scp=77952304509&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952304509&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12275-0_17
DO - 10.1007/978-3-642-12275-0_17
M3 - Conference contribution
AN - SCOPUS:77952304509
SN - 3642122744
SN - 9783642122743
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 166
EP - 178
BT - Advances in Information Retrieval - 32nd European Conference on IR Research, ECIR 2010, Proceedings
PB - Springer
T2 - 32nd European Conference on Information Retrieval, ECIR 2010
Y2 - 28 March 2010 through 31 March 2010
ER -