A graph-based consensus maximization approach for combining multiple supervised and unsupervised models

Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Research output: Contribution to journal › Article › peer-review

Abstract

Ensemble learning has emerged as a powerful method for combining multiple models. Well-known methods, such as bagging, boosting, and model averaging, have been shown to improve accuracy and robustness over single models. However, due to the high costs of manual labeling, it is hard to obtain sufficient and reliable labeled data for effective training. Meanwhile, lots of unlabeled data exist in these sources, and we can readily obtain multiple unsupervised models. Although unsupervised models do not directly generate a class label prediction for each object, they provide useful constraints on the joint predictions for a set of related objects. Therefore, incorporating these unsupervised models into the ensemble of supervised models can lead to better prediction performance. In this paper, we study ensemble learning with outputs from multiple supervised and unsupervised models, a topic where little work has been done. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the predictions over the graph, but penalizes the deviations from the initial labeling provided by the supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes and prove the optimality of the solution. The proposed method can be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on different applications with heterogeneous data sources demonstrate the benefits of the proposed method over existing alternatives. (More information, data, and code are available at http://www.cse.buffalo.edu/~jing/integrate.htm.)
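
The propagation idea described in the abstract can be illustrated with a small sketch. The abstract does not give the update rules, so everything below is an assumption made for illustration: object nodes and group nodes form a bipartite graph, each model contributes one group per predicted class or cluster, a group's estimate is the average of its members' estimates pulled toward its initial label (weighted by a hypothetical penalty `alpha`), and an object's estimate is the average over the groups it belongs to. The function name `consensus_propagation` and all parameters are hypothetical and not taken from the authors' code.

```python
def consensus_propagation(memberships, group_labels, n_classes,
                          alpha=2.0, n_iters=200):
    """Illustrative propagation on an object/group bipartite graph.

    memberships[i] -- group indices that object i belongs to (non-empty)
    group_labels[j] -- class index if group j comes from a supervised
                       model, None if it comes from a clustering
    alpha           -- assumed penalty for deviating from initial labels
    """
    n_objects = len(memberships)
    n_groups = len(group_labels)

    # Invert the membership lists: which objects sit in each group.
    members = [[] for _ in range(n_groups)]
    for i, groups in enumerate(memberships):
        for j in groups:
            members[j].append(i)

    # Start every node from a uniform class distribution.
    uniform = [1.0 / n_classes] * n_classes
    u = [uniform[:] for _ in range(n_objects)]  # object estimates
    q = [uniform[:] for _ in range(n_groups)]   # group estimates

    for _ in range(n_iters):
        # Group update: average of member estimates, pulled toward the
        # initial label when the group comes from a supervised model.
        for j in range(n_groups):
            labeled = group_labels[j] is not None
            denom = len(members[j]) + (alpha if labeled else 0.0)
            if denom == 0:
                continue  # empty unlabeled group: keep its estimate
            for z in range(n_classes):
                total = sum(u[i][z] for i in members[j])
                if labeled and group_labels[j] == z:
                    total += alpha
                q[j][z] = total / denom

        # Object update: average over the groups the object belongs to.
        for i, groups in enumerate(memberships):
            for z in range(n_classes):
                u[i][z] = sum(q[j][z] for j in groups) / len(groups)

    return u


# Toy data (made up): one classifier predicts classes [0, 0, 0, 1];
# two clusterings both split the objects as {0, 1} and {2, 3}.
# Groups 0-1 are the classifier's classes, groups 2-5 are clusters.
memberships = [[0, 2, 4], [0, 2, 4], [0, 3, 5], [1, 3, 5]]
group_labels = [0, 1, None, None, None, None]

probs = consensus_propagation(memberships, group_labels, n_classes=2)
labels = [max(range(2), key=lambda z: p[z]) for p in probs]
```

In this toy setup the classifier and the clusterings disagree about object 2. Under the assumed updates it stays in class 0, but with lower confidence than objects 0 and 1, which is the kind of trade-off between supervised predictions and unsupervised constraints that the abstract describes.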

Original language: English (US)
Article number: 6035707
Pages (from-to): 15-28
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 25
Issue number: 1
State: Published - Jan 1 2013

Keywords

  • Knowledge integration
  • clustering ensemble
  • ensemble learning
  • semi-supervised learning

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
