TY - GEN
T1 - Towards globally optimal crowdsourcing quality management
T2 - 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
AU - Sarma, Akash Das
AU - Parameswaran, Aditya
AU - Widom, Jennifer
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/6/26
Y1 - 2016/6/26
N2 - We study crowdsourcing quality management, that is, given worker responses to a set of tasks, our goal is to jointly estimate the true answers for the tasks, as well as the quality of the workers. Prior work on this problem relies primarily on applying Expectation-Maximization (EM) on the underlying maximum likelihood problem to estimate true answers as well as worker quality. Unfortunately, EM only provides a locally optimal solution rather than a globally optimal one. Other solutions to the problem (that do not leverage EM) fail to provide global optimality guarantees as well. In this paper, we focus on filtering, where tasks require the evaluation of a yes/no predicate, and rating, where tasks elicit integer scores from a finite domain. We design algorithms for finding the global optimal estimates of correct task answers and worker quality for the underlying maximum likelihood problem, and characterize the complexity of these algorithms. Our algorithms conceptually consider all mappings from tasks to true answers (typically a very large number), leveraging two key ideas to reduce, by several orders of magnitude, the number of mappings under consideration, while preserving optimality. We also demonstrate that these algorithms often find more accurate estimates than EM-based algorithms. This paper makes an important contribution towards understanding the inherent complexity of globally optimal crowdsourcing quality management.
AB - We study crowdsourcing quality management, that is, given worker responses to a set of tasks, our goal is to jointly estimate the true answers for the tasks, as well as the quality of the workers. Prior work on this problem relies primarily on applying Expectation-Maximization (EM) on the underlying maximum likelihood problem to estimate true answers as well as worker quality. Unfortunately, EM only provides a locally optimal solution rather than a globally optimal one. Other solutions to the problem (that do not leverage EM) fail to provide global optimality guarantees as well. In this paper, we focus on filtering, where tasks require the evaluation of a yes/no predicate, and rating, where tasks elicit integer scores from a finite domain. We design algorithms for finding the global optimal estimates of correct task answers and worker quality for the underlying maximum likelihood problem, and characterize the complexity of these algorithms. Our algorithms conceptually consider all mappings from tasks to true answers (typically a very large number), leveraging two key ideas to reduce, by several orders of magnitude, the number of mappings under consideration, while preserving optimality. We also demonstrate that these algorithms often find more accurate estimates than EM-based algorithms. This paper makes an important contribution towards understanding the inherent complexity of globally optimal crowdsourcing quality management.
UR - http://www.scopus.com/inward/record.url?scp=84979700847&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979700847&partnerID=8YFLogxK
U2 - 10.1145/2882903.2882953
DO - 10.1145/2882903.2882953
M3 - Conference contribution
AN - SCOPUS:84979700847
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 47
EP - 62
BT - SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 26 June 2016 through 1 July 2016
ER -