TY - GEN
T1 - Linear concepts and hidden variables
T2 - 11th Annual Conference on Neural Information Processing Systems, NIPS 1997
AU - Grove, Adam J.
AU - Roth, Dan
PY - 1998
Y1 - 1998
N2 - Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea depends on its robustness to deviations from the postulated model. We study this question experimentally in a restricted, yet non-trivial and interesting case: we consider a conditionally independent attribute (CIA) model which postulates a single binary-valued hidden variable z on which all other attributes (i.e., the target and the observables) depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn CIA with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using the CIA model for classification: once the data departs from this model, performance quickly degrades and drops below that of the directly learned linear classifier.
UR - http://www.scopus.com/inward/record.url?scp=84898974674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84898974674&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84898974674
SN - 0262100762
SN - 9780262100762
T3 - Advances in Neural Information Processing Systems
SP - 500
EP - 506
BT - Advances in Neural Information Processing Systems 10 - Proceedings of the 1997 Conference, NIPS 1997
PB - Neural Information Processing Systems Foundation
Y2 - 1 December 1997 through 6 December 1997
ER -