Abstract

Classifying a specific class of Web pages (e.g., personal homepages, resume pages) is an interesting problem. Typical machine learning algorithms for this problem require two classes of training data: positive and negative examples. In Web page classification, however, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages that (1) eliminates the need for negative training data and (2) increases classification accuracy. The framework combines two heterogeneous learners that complement each other, a decision list and a linear separator: together they eliminate the need for negative training data in the training phase and increase accuracy in the testing phase. Our results show that the framework achieves high accuracy without requiring negative training data, and that it enhances the accuracy of linear separators by reducing errors on "low-margin data". That is, it classifies more accurately while requiring less human effort in training.
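The abstract does not spell out how the two learners are combined at test time, so the following is only a minimal sketch of one plausible reading: trust the linear separator when its score is far from the decision boundary, and defer to the decision list on "low-margin data". The combination rule, the threshold, and every name below (`linear_score`, `decision_list_predict`, `combined_predict`, the example weights and rules) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]      # feature name -> value (e.g., term presence)
Rule = Tuple[str, bool]          # (feature that must be present, predicted label)


def linear_score(weights: Features, bias: float, x: Features) -> float:
    """Signed score w.x + b; its absolute value serves as a margin proxy."""
    return sum(weights.get(name, 0.0) * value for name, value in x.items()) + bias


def decision_list_predict(rules: List[Rule], default: bool, x: Features) -> bool:
    """Return the label of the first rule whose feature is present in x."""
    for feature, label in rules:
        if x.get(feature, 0.0) > 0:
            return label
    return default


def combined_predict(weights: Features, bias: float, rules: List[Rule],
                     default: bool, x: Features, margin_threshold: float) -> bool:
    """Assumed combination rule: the linear separator decides confident cases,
    the decision list decides low-margin cases."""
    score = linear_score(weights, bias, x)
    if abs(score) >= margin_threshold:
        return score > 0
    return decision_list_predict(rules, default, x)


# Toy model for a hypothetical "resume page" class (illustrative values only).
WEIGHTS = {"resume": 2.0, "education": 1.0, "shopping": -2.0}
BIAS = -0.5
RULES = [("curriculum_vitae", True), ("cart", False)]

confident_page = {"resume": 1.0, "education": 1.0}   # score 2.5, far from the boundary
low_margin_page = {"education": 1.0, "cart": 1.0}    # score 0.5, close to the boundary

print(combined_predict(WEIGHTS, BIAS, RULES, False, confident_page, 1.0))   # True
print(combined_predict(WEIGHTS, BIAS, RULES, False, low_margin_page, 1.0))  # False
```

In this toy case the linear separator alone would label the low-margin page positive (its score is 0.5), while the decision list overrides that with the `cart` rule. This illustrates, under the stated assumptions, how a second learner could reduce errors near the boundary.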

Original language: English (US)
Title of host publication: Proceedings - 2002 IEEE International Conference on Data Mining, ICDM 2002
Pages: 538-545
Number of pages: 8
State: Published - Dec 1 2002
Event: 2nd IEEE International Conference on Data Mining, ICDM '02 - Maebashi, Japan
Duration: Dec 9 2002 - Dec 12 2002

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 2nd IEEE International Conference on Data Mining, ICDM '02
Country: Japan
City: Maebashi
Period: 12/9/02 - 12/12/02

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Yu, H., Chang, K. C. C., & Han, J. (2002). Heterogeneous learner for web page classification. In Proceedings - 2002 IEEE International Conference on Data Mining, ICDM 2002 (pp. 538-545). (Proceedings - IEEE International Conference on Data Mining, ICDM).