TY - GEN
T1 - Discriminative frequent pattern analysis for effective classification
AU - Cheng, Hong
AU - Yan, Xifeng
AU - Han, Jiawei
AU - Hsu, Chih Wei
PY - 2007
Y1 - 2007
N2 - The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent pattern-based classification, and provide solid reasons supporting this methodology. It was well known that feature combinations (patterns) could capture more underlying semantics than single features. However, inclusion of infrequent patterns may not significantly improve the accuracy due to their limited predictive power. By building a connection between pattern frequency and discriminative measures such as information gain and Fisher score, we develop a strategy to set minimum support in frequent pattern mining for generating useful patterns. Based on this strategy, coupled with a proposed feature selection algorithm, discriminative frequent patterns can be generated for building high quality classifiers. We demonstrate that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets. Empirical studies indicate that significant improvement in classification accuracy is achieved (up to 12% in UCI datasets) using the so-selected discriminative frequent patterns.
AB - The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent pattern-based classification, and provide solid reasons supporting this methodology. It was well known that feature combinations (patterns) could capture more underlying semantics than single features. However, inclusion of infrequent patterns may not significantly improve the accuracy due to their limited predictive power. By building a connection between pattern frequency and discriminative measures such as information gain and Fisher score, we develop a strategy to set minimum support in frequent pattern mining for generating useful patterns. Based on this strategy, coupled with a proposed feature selection algorithm, discriminative frequent patterns can be generated for building high quality classifiers. We demonstrate that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets. Empirical studies indicate that significant improvement in classification accuracy is achieved (up to 12% in UCI datasets) using the so-selected discriminative frequent patterns.
UR - http://www.scopus.com/inward/record.url?scp=34548741255&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548741255&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2007.367917
DO - 10.1109/ICDE.2007.367917
M3 - Conference contribution
AN - SCOPUS:34548741255
SN - 1424408032
SN - 9781424408030
T3 - Proceedings - International Conference on Data Engineering
SP - 716
EP - 725
BT - 23rd International Conference on Data Engineering, ICDE 2007
T2 - 23rd International Conference on Data Engineering, ICDE 2007
Y2 - 15 April 2007 through 20 April 2007
ER -