Truth discovery with multiple conflicting information providers on the Web

Xiaoxin Yin, Jiawei Han, Philip S. Yu

Research output: Contribution to journal › Article › peer-review

Abstract

The world-wide web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem, and invent an algorithm called TruthFinder, which utilizes the relationships between web sites and their information, i.e., a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. An iterative method is used to infer the trustworthiness of web sites and the correctness of information from each other. Our experiments show that TruthFinder successfully finds true facts among conflicting information, and identifies trustworthy web sites better than the popular search engines.
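
The mutual reinforcement described in the abstract (trustworthy sites provide true facts, and facts from trustworthy sites are likely true) can be illustrated with a small fixed-point iteration. The following is a minimal sketch only, not the published TruthFinder algorithm: the abstract does not give the update formulas, so the scoring rules, the function name `iterate_trust`, the parameters `initial_trust`, `rho`, and `gamma`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def iterate_trust(claims, initial_trust=0.9, rho=0.5, gamma=0.3,
                  max_iters=50, tol=1e-6):
    """Jointly estimate site trust and fact confidence.

    claims maps site -> {object: value}; a fact is an (object, value) pair.
    Every site is assumed to state at least one fact.
    All update rules and default parameters are illustrative assumptions.
    """
    trust = {site: initial_trust for site in claims}

    # Index which sites provide each fact, and which facts exist per object.
    providers = defaultdict(set)
    facts_by_object = defaultdict(set)
    for site, stated in claims.items():
        for obj, value in stated.items():
            providers[(obj, value)].add(site)
            facts_by_object[obj].add((obj, value))

    confidence = {}
    for _ in range(max_iters):
        # Turn each site's trust into an additive score; clamp to avoid log(0).
        site_score = {s: -math.log(1.0 - min(t, 1.0 - 1e-9))
                      for s, t in trust.items()}
        # A fact's raw support is the summed score of the sites providing it.
        support = {f: sum(site_score[s] for s in sites)
                   for f, sites in providers.items()}
        # Conflicting facts about the same object lower each other's support.
        confidence = {}
        for obj, facts in facts_by_object.items():
            for f in facts:
                rivals = sum(support[g] for g in facts if g != f)
                adjusted = support[f] - rho * rivals
                confidence[f] = 1.0 / (1.0 + math.exp(-gamma * adjusted))
        # A site's trust is the mean confidence of the facts it provides.
        new_trust = {
            site: sum(confidence[(o, v)] for o, v in stated.items()) / len(stated)
            for site, stated in claims.items()
        }
        change = max(abs(new_trust[s] - trust[s]) for s in claims)
        trust = new_trust
        if change < tol:
            break
    return trust, confidence


# Hypothetical toy data: three sites agree on a book's author, one disagrees.
claims = {
    "site_a": {"book_1": "author_x"},
    "site_b": {"book_1": "author_x"},
    "site_c": {"book_1": "author_x"},
    "site_d": {"book_1": "author_y"},
}
trust, confidence = iterate_trust(claims)
```

In this toy input the majority value ends up with the higher confidence and the dissenting site with the lower trust. That outcome depends on the assumed rules above; a real system would also need to handle near-duplicate values and sites that copy from one another.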

Original language: English (US)
Article number: 4415269
Pages (from-to): 796-808
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 20
Issue number: 6
State: Published - Jun 2008

Keywords

  • Data quality
  • Link analysis
  • Web mining

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
