TY - GEN
T1 - Heterogeneous learner for web page classification
AU - Yu, Hwanjo
AU - Chang, Kevin Chen Chuan
AU - Han, Jiawei
PY - 2002
Y1 - 2002
N2 - Classification of an interesting class of Web pages (e.g., personal homepages, resume pages) has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However, in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners - a decision list and a linear separator which complement each other - to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on "low-margin data". That is, it classifies more accurately while requiring less human efforts in training.
AB - Classification of an interesting class of Web pages (e.g., personal homepages, resume pages) has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However, in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners - a decision list and a linear separator which complement each other - to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on "low-margin data". That is, it classifies more accurately while requiring less human efforts in training.
UR - http://www.scopus.com/inward/record.url?scp=35248847837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35248847837&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:35248847837
SN - 0769517544
SN - 9780769517544
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 538
EP - 545
BT - Proceedings - 2002 IEEE International Conference on Data Mining, ICDM 2002
T2 - 2nd IEEE International Conference on Data Mining, ICDM '02
Y2 - 9 December 2002 through 12 December 2002
ER -