Veracity analysis and object distinction

Xiaoxin Yin, Jiawei Han, Philip S. Yu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee of the correctness of information on the web, and different web sites often provide conflicting information on a subject. In this chapter we study two problems concerning the correctness of information on the web. The first is Veracity, i.e., conformity to truth: how to find true facts among a large amount of conflicting information on many subjects, provided by various web sites. We design a general framework for the Veracity problem and invent an algorithm called TruthFinder, which exploits the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. The second problem is object distinction, i.e., how to distinguish different people or objects sharing identical names. This is a nontrivial task, especially when only very limited information is associated with each person or object. We develop a general object distinction methodology called DISTINCT, which combines two complementary measures of relational similarity (set resemblance of neighbor tuples and random walk probability) and analyzes subtle linkages effectively. The method uses a set of distinguishable objects in the database as a training set, without requiring manually labeled data, and applies an SVM to weigh different types of linkages.
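The mutual-reinforcement idea behind TruthFinder (trustworthy sites make their facts credible, credible facts make their sites trustworthy) can be sketched as an iteration that alternates between the two scores until they stop changing. The Python below is a minimal, simplified sketch and not the chapter's implementation: the names `truth_finder` and `_logistic`, the initial trust of 0.9, the dampening factor `gamma`, the conflict weight `rho`, and the rule that any two different values for the same object fully conflict are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def _logistic(x):
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def truth_finder(claims, init_trust=0.9, gamma=0.3, rho=0.5, max_iter=50, tol=1e-6):
    """Simplified TruthFinder-style iteration (all parameter values are assumptions).

    claims: iterable of (site, obj, value) triples, e.g. ("siteA", "book1", "Alice").
    Returns (trust per site, confidence per (obj, value) fact).
    """
    providers = defaultdict(set)  # fact -> sites that state it
    provided = defaultdict(set)   # site -> facts it states
    values = defaultdict(set)     # obj -> candidate values seen for it
    for site, obj, value in claims:
        providers[(obj, value)].add(site)
        provided[site].add((obj, value))
        values[obj].add(value)

    trust = {site: init_trust for site in provided}
    conf = {}
    for _ in range(max_iter):
        # Trust score of a site: -ln(1 - t), clamped so t never reaches exactly 1.
        tau = {s: -math.log(1.0 - min(t, 1.0 - 1e-9)) for s, t in trust.items()}
        # Raw score of a fact: sum of the trust scores of the sites providing it.
        sigma = {f: sum(tau[s] for s in sites) for f, sites in providers.items()}
        for (obj, value), score in sigma.items():
            # Assumption: every other value for the same object conflicts with this one.
            rivals = sum(sigma[(obj, v)] for v in values[obj] if v != value)
            conf[(obj, value)] = _logistic(gamma * (score - rho * rivals))
        # Trust of a site: average confidence of the facts it provides.
        new_trust = {s: sum(conf[f] for f in facts) / len(facts) for s, facts in provided.items()}
        delta = max(abs(new_trust[s] - trust[s]) for s in trust)
        trust = new_trust
        if delta < tol:
            break
    return trust, conf


# Hypothetical input: three sites disagree about the author of two books.
claims = [
    ("siteA", "book1", "Alice"), ("siteB", "book1", "Alice"), ("siteC", "book1", "Bob"),
    ("siteA", "book2", "Carol"), ("siteC", "book2", "Dave"),
]
trust, conf = truth_finder(claims)
for fact, c in sorted(conf.items()):
    print(fact, round(c, 3))
```

In this toy input, "Alice" is backed by two sites and wins over "Bob"; that raises the trust of siteA, which in turn lifts its otherwise uncorroborated claim "Carol" above siteC's "Dave". The clamp on `t` and the two-branch logistic are defensive choices that keep the logarithm and exponential in range.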

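The two relational similarity measures that DISTINCT combines, set resemblance of neighbor tuples and random walk probability, can be illustrated on a toy reference graph. This is a hypothetical sketch, not the chapter's definitions: each linkage type is reduced to a single hop from a reference to its neighbor tuples, the walk is assumed uniform at every step, and the per-linkage weights in `weights` are made-up stand-ins for the weights DISTINCT learns with an SVM from distinguishable objects in the database. The function names and data are likewise assumptions.

```python
from collections import defaultdict


def set_resemblance(a, b):
    """Jaccard resemblance of two sets of neighbor tuples."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def walk_probability(links, r1, r2):
    """Probability that a walk r1 -> neighbor tuple -> reference ends at r2.

    links maps each reference to the set of neighbor tuples reached along one
    linkage type; each step is assumed to be uniform over the available edges.
    """
    back = defaultdict(set)  # neighbor tuple -> references linked to it
    for ref, neighbors in links.items():
        for n in neighbors:
            back[n].add(ref)
    start = links.get(r1, set())
    if not start:
        return 0.0
    return sum(
        (1.0 / len(start)) * (1.0 / len(back[n])) for n in start if r2 in back[n]
    )


def similarity(linkages, weights, r1, r2):
    """Weighted sum of both measures over all linkage types.

    weights maps linkage type -> (resemblance weight, walk weight); these are
    hypothetical stand-ins for SVM-learned weights.
    """
    total = 0.0
    for kind, links in linkages.items():
        w_resem, w_walk = weights[kind]
        total += w_resem * set_resemblance(links.get(r1, set()), links.get(r2, set()))
        total += w_walk * walk_probability(links, r1, r2)
    return total


# Hypothetical data: three references to the same author name in three papers.
linkages = {
    "coauthor": {"ref1": {"C1", "C2"}, "ref2": {"C1", "C3"}, "ref3": {"C4"}},
    "venue": {"ref1": {"V1"}, "ref2": {"V1"}, "ref3": {"V2"}},
}
weights = {"coauthor": (0.6, 0.4), "venue": (0.3, 0.2)}
print(similarity(linkages, weights, "ref1", "ref2"))
print(similarity(linkages, weights, "ref1", "ref3"))
```

With these made-up weights, ref1 and ref2 (a shared coauthor and a shared venue) score about 0.7, while ref1 and ref3 share no neighbors and score 0, so a subsequent clustering step would tend to treat ref3 as a different person.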
Original language: English (US)
Title of host publication: Link Mining
Subtitle of host publication: Models, Algorithms, and Applications
Publisher: Springer New York
Pages: 283-304
Number of pages: 22
ISBN (Electronic): 9781441965158
ISBN (Print): 9781441965141
DOIs: https://doi.org/10.1007/978-1-4419-6515-8_11
State: Published - Jan 1 2010

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Yin, X., Han, J., & Yu, P. S. (2010). Veracity analysis and object distinction. In Link Mining: Models, Algorithms, and Applications (pp. 283-304). Springer New York. https://doi.org/10.1007/978-1-4419-6515-8_11

Veracity analysis and object distinction. / Yin, Xiaoxin; Han, Jiawei; Yu, Philip S.

Link Mining: Models, Algorithms, and Applications. Springer New York, 2010. p. 283-304.

Yin, X, Han, J & Yu, PS 2010, Veracity analysis and object distinction. In Link Mining: Models, Algorithms, and Applications. Springer New York, pp. 283-304. https://doi.org/10.1007/978-1-4419-6515-8_11
Yin X, Han J, Yu PS. Veracity analysis and object distinction. In Link Mining: Models, Algorithms, and Applications. Springer New York. 2010. p. 283-304. https://doi.org/10.1007/978-1-4419-6515-8_11
Yin, Xiaoxin; Han, Jiawei; Yu, Philip S. / Veracity analysis and object distinction. Link Mining: Models, Algorithms, and Applications. Springer New York, 2010. pp. 283-304.
@inbook{342ac462d265469988a3966d9e58d756,
title = "Veracity analysis and object distinction",
abstract = "The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web, and different web sites often provide conflicting information on a subject. In this section we study two problems about correctness of information on the web. The first one is Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem, and invent an algorithm called TruthFinder, which utilizes the relationships between web sites and their information, i.e., a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. The second problem is object distinction, i.e., how to distinguish different people or objects sharing identical names. This is a nontrivial task, especially when only very limited information is associated with each person or object. We develop a general object distinction methodology called DISTINCT, which combines two complementary measures for relational similarity: set resemblance of neighbor tuples and random walk probability, and analyze subtle linkages effectively. The method takes a set of distinguishable objects in the database as training set without seeking for manually labeled data and applies SVM to weigh different types of linkages.",
author = "Xiaoxin Yin and Jiawei Han and Yu, {Philip S.}",
year = "2010",
month = "1",
day = "1",
doi = "10.1007/978-1-4419-6515-8_11",
language = "English (US)",
isbn = "9781441965141",
pages = "283--304",
booktitle = "Link Mining: Models, Algorithms, and Applications",
publisher = "Springer New York",

}

TY - CHAP

T1 - Veracity analysis and object distinction

AU - Yin, Xiaoxin

AU - Han, Jiawei

AU - Yu, Philip S.

PY - 2010/1/1

Y1 - 2010/1/1

AB - The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web, and different web sites often provide conflicting information on a subject. In this section we study two problems about correctness of information on the web. The first one is Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem, and invent an algorithm called TruthFinder, which utilizes the relationships between web sites and their information, i.e., a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. The second problem is object distinction, i.e., how to distinguish different people or objects sharing identical names. This is a nontrivial task, especially when only very limited information is associated with each person or object. We develop a general object distinction methodology called DISTINCT, which combines two complementary measures for relational similarity: set resemblance of neighbor tuples and random walk probability, and analyze subtle linkages effectively. The method takes a set of distinguishable objects in the database as training set without seeking for manually labeled data and applies SVM to weigh different types of linkages.

UR - http://www.scopus.com/inward/record.url?scp=84919838439&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84919838439&partnerID=8YFLogxK

U2 - 10.1007/978-1-4419-6515-8_11

DO - 10.1007/978-1-4419-6515-8_11

M3 - Chapter

AN - SCOPUS:84919838439

SN - 9781441965141

SP - 283

EP - 304

BT - Link Mining

PB - Springer New York

ER -