TY - GEN
T1 - Modeling truth existence in truth discovery
AU - Zhi, Shi
AU - Zhao, Bo
AU - Tong, Wenzhu
AU - Gao, Jing
AU - Yu, Dian
AU - Ji, Heng
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - When integrating information from multiple sources, it is common to encounter conflicting answers to the same question. Truth discovery is to infer the most accurate and complete integrated answers from conflicting sources. In some cases, there exist questions for which the true answers are excluded from the candidate answers provided by all sources. Without any prior knowledge, these questions, named no-truth questions, are difficult to be distinguished from the questions that have true answers, named hastruth questions. In particular, these no-truth questions degrade the precision of the answer integration system. We address such a challenge by introducing source quality, which is made up of three finegrained measures: silent rate, false spoken rate and true spoken rate. By incorporating these three measures, we propose a probabilistic graphical model, which simultaneously infers truth as well as source quality without any a priori training involving ground truth answers. Moreover, since inferring this graphical model requires parameter tuning of the prior of truth, we propose an initialization scheme based upon a quantity named truth existence score, which synthesizes two indicators, namely, participation rate and consistency rate. Compared with existing methods, our method can effectively filter out no-truth questions, which results in more accurate source quality estimation. Consequently, our method provides more accurate and complete answers to both has-truth and no-truth questions. Experiments on three real-world datasets illustrate the notable advantage of our method over existing state-of-the-art truth discovery methods.
AB - When integrating information from multiple sources, it is common to encounter conflicting answers to the same question. Truth discovery is to infer the most accurate and complete integrated answers from conflicting sources. In some cases, there exist questions for which the true answers are excluded from the candidate answers provided by all sources. Without any prior knowledge, these questions, named no-truth questions, are difficult to be distinguished from the questions that have true answers, named hastruth questions. In particular, these no-truth questions degrade the precision of the answer integration system. We address such a challenge by introducing source quality, which is made up of three finegrained measures: silent rate, false spoken rate and true spoken rate. By incorporating these three measures, we propose a probabilistic graphical model, which simultaneously infers truth as well as source quality without any a priori training involving ground truth answers. Moreover, since inferring this graphical model requires parameter tuning of the prior of truth, we propose an initialization scheme based upon a quantity named truth existence score, which synthesizes two indicators, namely, participation rate and consistency rate. Compared with existing methods, our method can effectively filter out no-truth questions, which results in more accurate source quality estimation. Consequently, our method provides more accurate and complete answers to both has-truth and no-truth questions. Experiments on three real-world datasets illustrate the notable advantage of our method over existing state-of-the-art truth discovery methods.
UR - http://www.scopus.com/inward/record.url?scp=84954101653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954101653&partnerID=8YFLogxK
U2 - 10.1145/2783258.2783339
DO - 10.1145/2783258.2783339
M3 - Conference contribution
AN - SCOPUS:84954101653
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1543
EP - 1552
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Y2 - 10 August 2015 through 13 August 2015
ER -