TY - GEN
T1 - Nonparametric Bayesian classification with massive datasets
T2 - 5th Statistical Problems in Particle Physics, Astrophysics and Cosmology Conference, PHYSTAT 2005
AU - Gray, Alexander
AU - Richards, Gordon
AU - Nichol, Robert
AU - Brunner, Robert
AU - Moore, Andrew
PY - 2006
Y1 - 2006
AB - The kernel discriminant (a nonparametric Bayesian classifier) is appropriate for many scientific tasks because it is highly accurate (it approaches Bayes optimality as the amount of data grows), distribution-free (it works for arbitrary data distributions), and easy both to interpret and to augment with prior domain knowledge. Unfortunately, like other highly accurate classifiers, it is computationally infeasible for massive datasets. We present a fast algorithm for performing classification with the kernel discriminant exactly (i.e., without introducing any approximation error). We demonstrate its use for quasar discovery, a problem central to cosmology and astrophysics, tractably using 500K training data and 800K testing data from the Sloan Digital Sky Survey. The resulting catalog of 100K quasars significantly exceeds existing quasar catalogs in both size and quality, opening a number of new scientific possibilities, including the recent empirical confirmation of cosmic magnification, which has received wide attention.
UR - http://www.scopus.com/inward/record.url?scp=84894153365&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894153365&partnerID=8YFLogxK
U2 - 10.1142/9781860948985_0031
DO - 10.1142/9781860948985_0031
M3 - Conference contribution
AN - SCOPUS:84894153365
SN - 1860946496
SN - 9781860946493
T3 - Statistical Problems in Particle Physics, Astrophysics and Cosmology - Proceedings of PHYSTAT 2005
SP - 147
EP - 150
BT - Statistical Problems in Particle Physics, Astrophysics and Cosmology - Proceedings of PHYSTAT 2005
PB - Imperial College Press
Y2 - 12 September 2005 through 15 September 2005
ER -