Optimized data loading for a multi-terabyte sky survey repository

Y. Dora Cai, Ruth Aydt, Robert J. Brunner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Advanced instruments in a variety of scientific domains are collecting massive amounts of data that must be post-processed and organized to support research activities. Astronomers have been pioneers in the use of databases to host sky survey data. Increasing data volumes from more powerful telescopes pose enormous challenges to state-of-the-art database systems and data-loading techniques. In this paper we present SkyLoader, our novel framework for data loading that is being used to populate a multi-table, multi-terabyte database repository for the Palomar-Quest sky survey. SkyLoader consists of an efficient algorithm for bulk loading, an effective data structure to support data integrity, optimized parallelism, and guidelines for system tuning. Performance studies show the positive effects of these techniques, with load time for a 40-gigabyte data set reduced from over 20 hours to less than 3 hours. Our framework offers a promising approach for loading other large and complex scientific databases.

Original language: English (US)
Title of host publication: Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
State: Published - 2005
Event: ACM/IEEE 2005 Supercomputing Conference, SC'05 - Seattle, WA, United States
Duration: Nov 12 2005 → Nov 18 2005

Publication series

Name: Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05
Volume: 2005

Other

Other: ACM/IEEE 2005 Supercomputing Conference, SC'05
Country/Territory: United States
City: Seattle, WA
Period: 11/12/05 → 11/18/05

ASJC Scopus subject areas

  • Engineering(all)
