TY - JOUR
T1 - Robust machine learning applied to astronomical data sets. II. Quantifying photometric redshifts for quasars using instance-based learning
AU - Ball, Nicholas M.
AU - Brunner, Robert J.
AU - Myers, Adam D.
AU - Strand, Natalie E.
AU - Alberts, Stacey L.
AU - Tcheng, David
AU - Llorà, Xavier
PY - 2007/7/10
Y1 - 2007/7/10
N2 - We apply instance-based machine learning in the form of a k-nearest neighbor algorithm to the task of estimating photometric redshifts for 55,746 objects spectroscopically classified as quasars in the Fifth Data Release of the Sloan Digital Sky Survey. We compare the results obtained to those from an empirical color-redshift relation (CZR). In contrast to previously published results using CZRs, we find that the instance-based photometric redshifts are assigned with no regions of catastrophic failure. Remaining outliers are simply scattered about the ideal relation, in a manner similar to the pattern seen in the optical for normal galaxies at redshifts z ≲ 1. The instance-based algorithm is trained on a representative sample of the data and pseudo-blind-tested on the remaining unseen data. The variance between the photometric and spectroscopic redshifts is σ² = 0.123 ± 0.002 (compared to σ² = 0.265 ± 0.006 for the CZR), and 54.9% ± 0.7%, 73.3% ± 0.6%, and 80.7% ± 0.3% of the objects are within Δz < 0.1, 0.2, and 0.3, respectively. We also match our sample to the Second Data Release of the Galaxy Evolution Explorer legacy data, and the resulting 7642 objects show a further improvement, giving a variance of σ² = 0.054 ± 0.005, with 70.8% ± 1.2%, 85.8% ± 1.0%, and 90.8% ± 0.7% of objects within Δz < 0.1, 0.2, and 0.3. We show that the improvement is indeed due to the extra information provided by GALEX, by training on the same data set using purely SDSS photometry, which has a variance of σ² = 0.090 ± 0.007. Each set of results represents a realistic standard for application to further data sets for which the spectra are representative.
AB - We apply instance-based machine learning in the form of a k-nearest neighbor algorithm to the task of estimating photometric redshifts for 55,746 objects spectroscopically classified as quasars in the Fifth Data Release of the Sloan Digital Sky Survey. We compare the results obtained to those from an empirical color-redshift relation (CZR). In contrast to previously published results using CZRs, we find that the instance-based photometric redshifts are assigned with no regions of catastrophic failure. Remaining outliers are simply scattered about the ideal relation, in a manner similar to the pattern seen in the optical for normal galaxies at redshifts z ≲ 1. The instance-based algorithm is trained on a representative sample of the data and pseudo-blind-tested on the remaining unseen data. The variance between the photometric and spectroscopic redshifts is σ² = 0.123 ± 0.002 (compared to σ² = 0.265 ± 0.006 for the CZR), and 54.9% ± 0.7%, 73.3% ± 0.6%, and 80.7% ± 0.3% of the objects are within Δz < 0.1, 0.2, and 0.3, respectively. We also match our sample to the Second Data Release of the Galaxy Evolution Explorer legacy data, and the resulting 7642 objects show a further improvement, giving a variance of σ² = 0.054 ± 0.005, with 70.8% ± 1.2%, 85.8% ± 1.0%, and 90.8% ± 0.7% of objects within Δz < 0.1, 0.2, and 0.3. We show that the improvement is indeed due to the extra information provided by GALEX, by training on the same data set using purely SDSS photometry, which has a variance of σ² = 0.090 ± 0.007. Each set of results represents a realistic standard for application to further data sets for which the spectra are representative.
KW - Catalogs
KW - Cosmology: miscellaneous
KW - Methods: data analysis
KW - Quasars: general
UR - http://www.scopus.com/inward/record.url?scp=34547433063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547433063&partnerID=8YFLogxK
U2 - 10.1086/518362
DO - 10.1086/518362
M3 - Article
AN - SCOPUS:34547433063
SN - 0004-637X
VL - 663
SP - 774
EP - 780
JO - Astrophysical Journal
JF - Astrophysical Journal
IS - 2 I
ER -