ORPHEUSDB: Bolt-on versioning for relational databases

Silu Huang, Liqi Xu, Jialin Liu, Aaron J. Elmore, Aditya G. Parameswaran

Research output: Contribution to journal › Conference article

Abstract

Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce ORPHEUSDB, a dataset version control system that "bolts on" versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database "for free". We develop and evaluate multiple data models for representing versioned data, as well as a light-weight partitioning scheme, LYRESPLIT, to further optimize the models for reduced query latencies. With LYRESPLIT, ORPHEUSDB is on average 10³× faster in finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LYRESPLIT can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.

Original language: English (US)
Pages (from-to): 1130-1141
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 10
Issue number: 10
DOIs: https://doi.org/10.14778/3115404.3115417
State: Published - Jun 1 2017
Event: 43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany
Duration: Aug 28 2017 - Sep 1 2017

Fingerprint

Relational database systems
Bolts
Data structures
Control systems

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Huang, S., Xu, L., Liu, J., Elmore, A. J., & Parameswaran, A. G. (2017). ORPHEUSDB: Bolt-on versioning for relational databases. Proceedings of the VLDB Endowment, 10(10), 1130-1141. https://doi.org/10.14778/3115404.3115417

ORPHEUSDB: Bolt-on versioning for relational databases. / Huang, Silu; Xu, Liqi; Liu, Jialin; Elmore, Aaron J.; Parameswaran, Aditya G.

In: Proceedings of the VLDB Endowment, Vol. 10, No. 10, 01.06.2017, p. 1130-1141.

Huang, S, Xu, L, Liu, J, Elmore, AJ & Parameswaran, AG 2017, 'ORPHEUSDB: Bolt-on versioning for relational databases', Proceedings of the VLDB Endowment, vol. 10, no. 10, pp. 1130-1141. https://doi.org/10.14778/3115404.3115417
Huang, Silu ; Xu, Liqi ; Liu, Jialin ; Elmore, Aaron J. ; Parameswaran, Aditya G. / ORPHEUSDB: Bolt-on versioning for relational databases. In: Proceedings of the VLDB Endowment. 2017 ; Vol. 10, No. 10. pp. 1130-1141.
@article{5ec3fe3ca1f9475aa01a694609982134,
title = "ORPHEUSDB: Bolt-on versioning for relational databases",
abstract = "Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce ORPHEUSDB, a dataset version control system that {"}bolts on{"} versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database {"}for free{"}. We develop and evaluate multiple data models for representing versioned data, as well as a light-weight partitioning scheme, LYRESPLIT, to further optimize the models for reduced query latencies. With LYRESPLIT, ORPHEUSDB is on average 10³× faster in finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LYRESPLIT can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.",
author = "Huang, Silu and Xu, Liqi and Liu, Jialin and Elmore, {Aaron J.} and Parameswaran, {Aditya G.}",
year = "2017",
month = "6",
day = "1",
doi = "10.14778/3115404.3115417",
language = "English (US)",
volume = "10",
pages = "1130--1141",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "10",
}

TY - JOUR

T1 - ORPHEUSDB: Bolt-on versioning for relational databases

AU - Huang, Silu

AU - Xu, Liqi

AU - Liu, Jialin

AU - Elmore, Aaron J.

AU - Parameswaran, Aditya G.

PY - 2017/6/1

Y1 - 2017/6/1

N2 - Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce ORPHEUSDB, a dataset version control system that "bolts on" versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database "for free". We develop and evaluate multiple data models for representing versioned data, as well as a light-weight partitioning scheme, LYRESPLIT, to further optimize the models for reduced query latencies. With LYRESPLIT, ORPHEUSDB is on average 10³× faster in finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LYRESPLIT can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.

AB - Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce ORPHEUSDB, a dataset version control system that "bolts on" versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database "for free". We develop and evaluate multiple data models for representing versioned data, as well as a light-weight partitioning scheme, LYRESPLIT, to further optimize the models for reduced query latencies. With LYRESPLIT, ORPHEUSDB is on average 10³× faster in finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LYRESPLIT can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.

UR - http://www.scopus.com/inward/record.url?scp=85029564496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029564496&partnerID=8YFLogxK

U2 - 10.14778/3115404.3115417

DO - 10.14778/3115404.3115417

M3 - Conference article

AN - SCOPUS:85029564496

VL - 10

SP - 1130

EP - 1141

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 10

ER -