TY - JOUR
T1 - Data mining and machine learning in astronomy
AU - Ball, Nicholas M.
AU - Brunner, Robert J.
N1 - Funding Information: We thank the referee for a useful and comprehensive report. The authors acknowledge support from NASA through grants NN6066H156 and NNG06GF89G, from Microsoft Research, and from the University of Illinois. The authors made extensive use of the storage and computing facilities at the National Center for Supercomputing Applications and thank the technical staff for their assistance in enabling this work. This research has made use of the SAO/NASA Astrophysics Data System.
PY - 2010/7
Y1 - 2010/7
N2 - We review the current state of data mining and machine learning in astronomy. Data mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
AB - We review the current state of data mining and machine learning in astronomy. Data mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
KW - Data mining
KW - Virtual Observatory
KW - astroinformatics
KW - astrostatistics
KW - knowledge discovery in databases
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=77955253091&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955253091&partnerID=8YFLogxK
U2 - 10.1142/S0218271810017160
DO - 10.1142/S0218271810017160
M3 - Article
AN - SCOPUS:77955253091
SN - 0218-2718
VL - 19
SP - 1049
EP - 1106
JO - International Journal of Modern Physics D
JF - International Journal of Modern Physics D
IS - 7
ER -