A confidence-aware approach for truth discovery on long-tail data

Qi Li, Yaliang Li, Jing Gao, Lu Su, Bo Zhao, Murat Demirbas, Wei Fan, Jiawei Han

Research output: Contribution to journal › Conference article

Abstract

In many real-world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: how to identify which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, if a piece of information is from a reliable source, then it is more trustworthy, and a source that provides trustworthy information is more reliable. Based on this principle, truth discovery approaches have been proposed to infer source reliability degrees and the most trustworthy information (i.e., the truth) simultaneously. However, existing approaches overlook the ubiquitous long-tail phenomenon in these tasks, i.e., most sources provide only a few claims and only a few sources make plenty of claims, which causes the source reliability estimation for small sources to be unreasonable. To tackle this challenge, we propose a confidence-aware truth discovery (CATD) method to automatically detect truths from conflicting data with the long-tail phenomenon. The proposed method not only estimates source reliability, but also considers the confidence interval of the estimation, so that it can effectively reflect real source reliability for sources with various levels of participation. Experiments on four real-world tasks as well as simulated multi-source long-tail datasets demonstrate that the proposed method outperforms existing state-of-the-art truth discovery approaches by successfully discounting the effect of small sources.
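
The idea of weighting a source by a confidence interval rather than by a point estimate can be illustrated with a minimal sketch. The abstract does not state the estimator, so everything below is an assumption made for illustration: source errors are treated as zero-mean Gaussian, reliability is the inverse of the error variance, and the confidence-aware weight is the inverse of the upper bound of a chi-squared confidence interval on that variance. The function names (`chi2_cdf`, `chi2_quantile`, `naive_weight`, `confidence_aware_weight`) and the toy error values are hypothetical and not taken from the paper.

```python
import math


def chi2_cdf(x, k):
    """CDF of the chi-squared distribution with k degrees of freedom,
    computed from the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, k):
    """Quantile of the chi-squared distribution, found by bisection on the CDF."""
    lo, hi = 0.0, float(k)
    while chi2_cdf(hi, k) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def naive_weight(errors):
    """Inverse of the point estimate of the error variance (ignores sample size)."""
    return len(errors) / sum(e * e for e in errors)


def confidence_aware_weight(errors, alpha=0.05):
    """Inverse of the upper bound of the (1 - alpha) confidence interval
    on the error variance. Assumption: errors are zero-mean Gaussian."""
    return chi2_quantile(alpha / 2.0, len(errors)) / sum(e * e for e in errors)


# Hypothetical errors of each source's claims against the current truth estimates.
errors = {
    "big": [1.0] * 50,     # many claims, moderate error
    "small": [0.5, 0.5],   # two claims that happen to look accurate
}
naive = {s: naive_weight(e) for s, e in errors.items()}
aware = {s: confidence_aware_weight(e) for s, e in errors.items()}
print("naive:", naive)
print("confidence-aware:", aware)
```

With these toy numbers the point estimate ranks the two-claim source above the fifty-claim source, while the interval-based weight reverses that order, because two observations leave the variance estimate very uncertain. In a full truth discovery loop such weights would be recomputed alternately with the weighted truth estimates; that loop is omitted here.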

Original language: English (US)
Pages (from-to): 425-436
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 8
Issue number: 4
DOIs: https://doi.org/10.14778/2735496.2735505
State: Published - Dec 2014
Event: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: Sep 11 2006 → Sep 11 2006

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

A confidence-aware approach for truth discovery on long-tail data. / Li, Qi; Li, Yaliang; Gao, Jing; Su, Lu; Zhao, Bo; Demirbas, Murat; Fan, Wei; Han, Jiawei.

In: Proceedings of the VLDB Endowment, Vol. 8, No. 4, 12.2014, p. 425-436.

Li, Q, Li, Y, Gao, J, Su, L, Zhao, B, Demirbas, M, Fan, W & Han, J 2014, 'A confidence-aware approach for truth discovery on long-tail data', Proceedings of the VLDB Endowment, vol. 8, no. 4, pp. 425-436. https://doi.org/10.14778/2735496.2735505
Li, Qi; Li, Yaliang; Gao, Jing; Su, Lu; Zhao, Bo; Demirbas, Murat; Fan, Wei; Han, Jiawei. / A confidence-aware approach for truth discovery on long-tail data. In: Proceedings of the VLDB Endowment. 2014; Vol. 8, No. 4. pp. 425-436.
@article{36b53aae36f14c978725bba40f394f88,
title = "A confidence-aware approach for truth discovery on long-tail data",
abstract = "In many real-world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: how to identify which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, if a piece of information is from a reliable source, then it is more trustworthy, and a source that provides trustworthy information is more reliable. Based on this principle, truth discovery approaches have been proposed to infer source reliability degrees and the most trustworthy information (i.e., the truth) simultaneously. However, existing approaches overlook the ubiquitous long-tail phenomenon in these tasks, i.e., most sources provide only a few claims and only a few sources make plenty of claims, which causes the source reliability estimation for small sources to be unreasonable. To tackle this challenge, we propose a confidence-aware truth discovery (CATD) method to automatically detect truths from conflicting data with the long-tail phenomenon. The proposed method not only estimates source reliability, but also considers the confidence interval of the estimation, so that it can effectively reflect real source reliability for sources with various levels of participation. Experiments on four real-world tasks as well as simulated multi-source long-tail datasets demonstrate that the proposed method outperforms existing state-of-the-art truth discovery approaches by successfully discounting the effect of small sources.",
author = "Qi Li and Yaliang Li and Jing Gao and Lu Su and Bo Zhao and Murat Demirbas and Wei Fan and Jiawei Han",
year = "2014",
month = "12",
doi = "10.14778/2735496.2735505",
language = "English (US)",
volume = "8",
pages = "425--436",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "4",

}

TY - JOUR

T1 - A confidence-aware approach for truth discovery on long-tail data

AU - Li, Qi

AU - Li, Yaliang

AU - Gao, Jing

AU - Su, Lu

AU - Zhao, Bo

AU - Demirbas, Murat

AU - Fan, Wei

AU - Han, Jiawei

PY - 2014/12

Y1 - 2014/12

AB - In many real-world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: how to identify which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, if a piece of information is from a reliable source, then it is more trustworthy, and a source that provides trustworthy information is more reliable. Based on this principle, truth discovery approaches have been proposed to infer source reliability degrees and the most trustworthy information (i.e., the truth) simultaneously. However, existing approaches overlook the ubiquitous long-tail phenomenon in these tasks, i.e., most sources provide only a few claims and only a few sources make plenty of claims, which causes the source reliability estimation for small sources to be unreasonable. To tackle this challenge, we propose a confidence-aware truth discovery (CATD) method to automatically detect truths from conflicting data with the long-tail phenomenon. The proposed method not only estimates source reliability, but also considers the confidence interval of the estimation, so that it can effectively reflect real source reliability for sources with various levels of participation. Experiments on four real-world tasks as well as simulated multi-source long-tail datasets demonstrate that the proposed method outperforms existing state-of-the-art truth discovery approaches by successfully discounting the effect of small sources.

UR - http://www.scopus.com/inward/record.url?scp=84938779276&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938779276&partnerID=8YFLogxK

U2 - 10.14778/2735496.2735505

DO - 10.14778/2735496.2735505

M3 - Conference article

AN - SCOPUS:84938779276

VL - 8

SP - 425

EP - 436

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 4

ER -