Abstract
We consider the problem of active learning over the vertices in a graph, without feature representation. Our study is based on the common graph smoothness assumption, which is formulated in a Gaussian random field model. We analyze the probability distribution over the unlabeled vertices conditioned on the label information, which is a multivariate normal with the mean being the harmonic solution over the field. Then we select the nodes to label such that the total variance of the distribution on the unlabeled data, as well as the expected prediction error, is minimized. In this way, the classifier we obtain is theoretically more robust. Compared with existing methods, our algorithm has the advantage of selecting data in a batch offline mode with solid theoretical support. We show improved performance over existing label selection criteria on several real world data sets.
Original language | English (US) |
---|---|
Pages (from-to) | 556-564 |
Number of pages | 9 |
Journal | Journal of Machine Learning Research |
Volume | 22 |
State | Published - 2012 |
Event | 15th International Conference on Artificial Intelligence and Statistics, AISTATS 2012 - La Palma, Spain Duration: Apr 21 2012 → Apr 23 2012 |
ASJC Scopus subject areas
- Control and Systems Engineering
- Software
- Statistics and Probability
- Artificial Intelligence