S/C: Speeding up Data Materialization with Bounded Memory

Zhaoheng Li, Xinyu Pi, Yongjoo Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With data pipeline tools and the expressiveness of SQL, managing interdependent materialized views (MVs) is becoming increasingly easy. These MVs are updated repeatedly upon new data ingestion (e.g., daily), from which database admins can observe performance metrics (e.g., refresh time of each MV, size on disk) in a consistent way for different types of updates (full vs. incremental) and for different systems (single node, distributed, cloud-hosted). One missed opportunity is that existing data systems treat those MV updates as independent SQL statements without fully exploiting their dependency information and performance metrics. However, if we know that the result of a SQL statement will be consumed immediately by subsequent operations, those operations do not have to wait until the earlier results are fully materialized on storage, because the results are already available in memory. Of course, this may come at a cost: keeping those results in memory (even temporarily) reduces the amount of available memory; thus, this decision must be made carefully.

In this paper, we introduce a new system, called S/C, which tackles this problem through efficient creation and update of a set of MVs with acyclic dependencies among them. S/C judiciously uses bounded memory to reduce the end-to-end MV refresh time by short-circuiting expensive reads and writes; S/C's objective function accurately estimates the time savings from keeping intermediate data in memory for particular periods. Our solution jointly optimizes the MV refresh order, what data to keep in memory, and when to release data from memory. At a high level, S/C still materializes all data exactly as defined in the MV definitions; thus, it does not impact any service-level agreements. In our experiments with TPC-DS datasets (up to 1 TB), we show that S/C's optimization can speed up end-to-end runtime by 1.04×-5.08× with only 1.6 GB of memory.
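The abstract outlines S/C's core idea: refresh MVs in dependency order while temporarily caching intermediate results under a memory budget, so that dependent refreshes can read from memory instead of storage. Below is a minimal, hypothetical Python sketch of that idea using a toy DAG, made-up sizes, and a simple greedy keep/release rule; it is not S/C's actual optimizer or cost model, which jointly optimizes refresh order, caching, and release decisions.

```python
# Toy illustration (not the paper's algorithm): refresh a DAG of materialized
# views in topological order while caching intermediate results in a bounded
# amount of memory, so dependents read from memory instead of storage.
# All names, sizes, and the memory budget below are hypothetical.

from collections import defaultdict, deque

# Hypothetical MV dependency DAG: DEPS[v] lists the MVs that v reads from.
DEPS = {
    "sales_raw": [],
    "daily_sales": ["sales_raw"],
    "weekly_sales": ["daily_sales"],
    "top_items": ["daily_sales"],
}
SIZE_MB = {"sales_raw": 900, "daily_sales": 300, "weekly_sales": 60, "top_items": 40}
MEMORY_BUDGET_MB = 400  # bounded memory, as in the paper's setting (value is made up)


def topological_order(deps):
    """Kahn's algorithm over the MV dependency graph."""
    indeg = {v: len(parents) for v, parents in deps.items()}
    children = defaultdict(list)
    for v, parents in deps.items():
        for p in parents:
            children[p].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order, children


def plan_refresh(deps, sizes, budget_mb):
    """Decide, for each MV, whether its result stays in memory after it is
    materialized, and release it once all of its dependents have refreshed."""
    order, children = topological_order(deps)
    remaining_readers = {v: len(children[v]) for v in deps}
    in_memory, used = set(), 0
    plan = []
    for mv in order:
        has_parents = bool(deps[mv])
        source = "memory" if has_parents and all(p in in_memory for p in deps[mv]) else "storage"
        plan.append((mv, source))
        # Keep this MV's result in memory only if someone will read it and it fits.
        if remaining_readers[mv] > 0 and used + sizes[mv] <= budget_mb:
            in_memory.add(mv)
            used += sizes[mv]
        # Release parents whose dependents have all been refreshed.
        for p in deps[mv]:
            remaining_readers[p] -= 1
            if remaining_readers[p] == 0 and p in in_memory:
                in_memory.remove(p)
                used -= sizes[p]
    return plan


if __name__ == "__main__":
    for mv, source in plan_refresh(DEPS, SIZE_MB, MEMORY_BUDGET_MB):
        print(f"refresh {mv:<14} reading inputs from {source}")
```

Running this sketch prints which refreshes read their inputs from memory versus storage; S/C's real objective function instead estimates the time savings of each caching decision and chooses the refresh order accordingly.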

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
Publisher: IEEE Computer Society
Pages: 1981-1994
Number of pages: 14
ISBN (Electronic): 9798350322279
DOIs
State: Published - 2023
Externally published: Yes
Event: 39th IEEE International Conference on Data Engineering, ICDE 2023 - Anaheim, United States
Duration: Apr 3, 2023 - Apr 7, 2023

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2023-April
ISSN (Print): 1084-4627

Conference

Conference: 39th IEEE International Conference on Data Engineering, ICDE 2023
Country/Territory: United States
City: Anaheim
Period: 4/3/23 - 4/7/23

Keywords

  • Caching
  • Materialized-View
  • Scheduling

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
