TY - GEN
T1 - Optimized data loading for a multi-terabyte sky survey repository
AU - Cai, Y. Dora
AU - Aydt, Ruth
AU - Brunner, Robert J.
N1 - Funding Information:
This work was supported in part by the National Science Foundation grants SCI 0525308, ACI-9619019, ACI-0332116 and by NASA grants NAG5-12578 and NAG5-12580.
Publisher Copyright:
© 2005 IEEE.
PY - 2005
Y1 - 2005
N2 - Advanced instruments in a variety of scientific domains are collecting massive amounts of data that must be post-processed and organized to support research activities. Astronomers have been pioneers in the use of databases to host sky survey data. Increasing data volumes from more powerful telescopes pose enormous challenges to state-of-the-art database systems and data-loading techniques. In this paper we present SkyLoader, our novel framework for data loading that is being used to populate a multi-table, multi-terabyte database repository for the Palomar-Quest sky survey. SkyLoader consists of an efficient algorithm for bulk loading, an effective data structure to support data integrity, optimized parallelism, and guidelines for system tuning. Performance studies show the positive effects of these techniques, with load time for a 40-gigabyte data set reduced from over 20 hours to less than 3 hours. Our framework offers a promising approach for loading other large and complex scientific databases.
UR - http://www.scopus.com/inward/record.url?scp=33845382628&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33845382628&partnerID=8YFLogxK
U2 - 10.1109/SC.2005.50
DO - 10.1109/SC.2005.50
M3 - Conference contribution
AN - SCOPUS:33845382628
SN - 1595930612
SN - 9781595930613
T3 - Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
BT - Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
T2 - ACM/IEEE 2005 Supercomputing Conference, SC'05
Y2 - 12 November 2005 through 18 November 2005
ER -