TY - GEN
T1 - Object distinction
T2 - 23rd International Conference on Data Engineering, ICDE 2007
AU - Yin, Xiaoxin
AU - Han, Jiawei
AU - Allen, Gabrielle Dawn
PY - 2007
Y1 - 2007
AB - Different people or objects may share identical names in the real world, which causes confusion in many applications. Distinguishing such objects is nontrivial, especially when only very limited information is associated with each of them. In this paper, we develop a general object distinction methodology called DISTINCT, which combines two complementary measures of relational similarity (set resemblance of neighbor tuples and random walk probability) and uses an SVM to weight different types of linkages without manually labeled training data. Experiments show that DISTINCT can accurately distinguish different objects with identical names in real databases.
UR - http://www.scopus.com/inward/record.url?scp=34548791703&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548791703&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2007.368983
DO - 10.1109/ICDE.2007.368983
M3 - Conference contribution
AN - SCOPUS:34548791703
SN - 1424408032
SN - 9781424408030
T3 - Proceedings - International Conference on Data Engineering
SP - 1242
EP - 1246
BT - 23rd International Conference on Data Engineering, ICDE 2007
PB - IEEE Computer Society
Y2 - 15 April 2007 through 20 April 2007
ER -