TY - GEN

T1 - Linear concepts and hidden variables

T2 - 11th Annual Conference on Neural Information Processing Systems, NIPS 1997

AU - Grove, Adam J.

AU - Roth, Dan

PY - 1998

Y1 - 1998

AB - Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea depends on its robustness to deviations from the postulated model. We study this question experimentally in a restricted, yet non-trivial and interesting case: we consider a conditionally independent attribute (CIA) model, which postulates a single binary-valued hidden variable z on which all other attributes (i.e., the target and the observables) depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn the CIA model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using the CIA model for classification: once the data departs from this model, performance quickly degrades and drops below that of the directly learned linear classifier.

UR - http://www.scopus.com/inward/record.url?scp=84898974674&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898974674&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84898974674

SN - 0262100762

SN - 9780262100762

T3 - Advances in Neural Information Processing Systems

SP - 500

EP - 506

BT - Advances in Neural Information Processing Systems 10 - Proceedings of the 1997 Conference, NIPS 1997

PB - Neural Information Processing Systems Foundation

Y2 - 1 December 1997 through 6 December 1997

ER -