Nonparametric Bayesian classification with massive datasets: Large-scale quasar discovery

Alexander Gray, Gordon Richards, Robert Nichol, Robert Brunner, Andrew Moore

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The kernel discriminant (a nonparametric Bayesian classifier) is appropriate for many scientific tasks because it is highly accurate (it approaches Bayes optimality as the amount of data grows), distribution-free (it works for arbitrary data distributions), and easy both to interpret and to augment with prior domain knowledge. Unfortunately, like other highly accurate classifiers, it is computationally infeasible for massive datasets. We present a fast algorithm for performing classification with the kernel discriminant exactly (i.e., without introducing any approximation error). We demonstrate its use for quasar discovery, a problem central to cosmology and astrophysics, where it tractably handles 500K training points and 800K test points from the Sloan Digital Sky Survey. The resulting catalog of 100K quasars significantly exceeds existing quasar catalogs in both size and quality, opening a number of new scientific possibilities, including the recent empirical confirmation of cosmic magnification, which has received wide attention.
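
For readers unfamiliar with the classifier named in the abstract, a minimal sketch of a naive kernel discriminant may help. It is not the fast exact algorithm the paper presents. It assumes one-dimensional data, a Gaussian kernel, and a single shared bandwidth `h`; the function names, toy values, and class labels are all hypothetical. Each query sums over every training point, which is the cost that makes the naive approach impractical at survey scale.

```python
import math

def gaussian_kde(x, points, h):
    """Kernel density estimate at x: the mean of 1-D Gaussian kernels
    of bandwidth h centred on each training point."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / len(points)

def kernel_discriminant(x, class_points, priors, h):
    """Return the label that maximises prior * estimated class density at x."""
    scores = {label: priors[label] * gaussian_kde(x, pts, h)
              for label, pts in class_points.items()}
    return max(scores, key=scores.get)

# Toy one-dimensional training data (illustrative only, not survey data).
train = {"quasar": [2.0, 2.2, 2.5, 3.0], "star": [-1.0, -0.5, 0.0, 0.3]}
priors = {"quasar": 0.5, "star": 0.5}  # assumed equal class priors

labels = [kernel_discriminant(x, train, priors, h=0.5) for x in (2.4, -0.2)]
```

Changing `priors` is one simple way prior domain knowledge can enter such a classifier: a rarer class needs a proportionally higher estimated density to win.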

Original language: English (US)
Title of host publication: Statistical Problems in Particle Physics, Astrophysics and Cosmology - Proceedings of PHYSTAT 2005
Publisher: Imperial College Press
Pages: 147-150
Number of pages: 4
ISBN (Print): 1860946496, 9781860946493
State: Published - 2006
Event: 5th Statistical Problems in Particle Physics, Astrophysics and Cosmology Conference, PHYSTAT 2005 - Oxford, United Kingdom
Duration: Sep 12 2005 to Sep 15 2005

Publication series

Name: Statistical Problems in Particle Physics, Astrophysics and Cosmology - Proceedings of PHYSTAT 2005

Other

Other: 5th Statistical Problems in Particle Physics, Astrophysics and Cosmology Conference, PHYSTAT 2005
Country: United Kingdom
City: Oxford
Period: 9/12/05 to 9/15/05

ASJC Scopus subject areas

  • Astronomy and Astrophysics
