TY - GEN
T1 - An efficient multi-relational naïve Bayesian classifier based on semantic relationship graph
AU - Liu, Hongyan
AU - Yin, Xiaoxin
AU - Han, Jiawei
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant No. 70471006 and 70321001, and by the U.S. National Science Foundation NSF IIS-02-09199 and IIS-03-08215.
PY - 2005/8/21
Y1 - 2005/8/21
N2 - Classification is one of the most popular data mining tasks with a wide range of applications, and lots of algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into a single one usually causes many problems, development of multi-relational classification algorithms becomes important and attracts many researchers' interests. Existing works about extending Naïve Bayes to deal with multi-relational data either have to transform data stored in tables to mainmemory Prolog facts, or limit the search space to only a small subset of real world applications. In this work, we aim at solving these problems and building an efficient, accurate Naïve Bayesian classifier to deal with data in multiple tables directly. We propose an algorithm named Graph-NB, which upgrades Naïve Bayesian classifier to deal with multiple tables directly. In order to take advantage of linkage relationships among tables, and treat different tables linked to the target table differently, a semantic relationship graph is developed to describe the relationship and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy is given to simplify the graph to avoid examining too many weakly linked tables. Experimental study on both realworld and synthetic databases shows its high efficiency and good accuracy.
AB - Classification is one of the most popular data mining tasks with a wide range of applications, and lots of algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into a single one usually causes many problems, development of multi-relational classification algorithms becomes important and attracts many researchers' interests. Existing works about extending Naïve Bayes to deal with multi-relational data either have to transform data stored in tables to mainmemory Prolog facts, or limit the search space to only a small subset of real world applications. In this work, we aim at solving these problems and building an efficient, accurate Naïve Bayesian classifier to deal with data in multiple tables directly. We propose an algorithm named Graph-NB, which upgrades Naïve Bayesian classifier to deal with multiple tables directly. In order to take advantage of linkage relationships among tables, and treat different tables linked to the target table differently, a semantic relationship graph is developed to describe the relationship and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy is given to simplify the graph to avoid examining too many weakly linked tables. Experimental study on both realworld and synthetic databases shows its high efficiency and good accuracy.
KW - Classification
KW - Data Mining
KW - Naïve Bayes
UR - http://www.scopus.com/inward/record.url?scp=84957882560&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957882560&partnerID=8YFLogxK
U2 - 10.1145/1090193.1090200
DO - 10.1145/1090193.1090200
M3 - Conference contribution
AN - SCOPUS:84957882560
T3 - Proceedings of the 4th International Workshop on Multi-Relational Data Mining, MRDM 2005 - 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2005
SP - 39
EP - 48
BT - Proceedings of the 4th International Workshop on Multi-Relational Data Mining, MRDM 2005 - 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2005
A2 - Blockeel, Hendrik
A2 - Dzeroski, Saso
PB - Association for Computing Machinery, Inc
T2 - 4th International Workshop on Multi-Relational Data Mining, MRDM 2005
Y2 - 21 August 2005
ER -