ORPHEUSDB: Bolt-on versioning for relational databases

Silu Huang, Liqi Xu, Jialin Liu, Aaron J. Elmore, Aditya Parameswaran

Research output: Contribution to journal › Conference article › Peer-review


Data science teams often analyze datasets collaboratively, generating a new dataset version at each stage of iterative exploration and analysis. There is a pressing need for a system that supports dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. We introduce ORPHEUSDB, a dataset version control system that "bolts on" versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database "for free". We develop and evaluate multiple data models for representing versioned data, as well as a lightweight partitioning scheme, LYRESPLIT, that further optimizes these models for reduced query latency. With LYRESPLIT, ORPHEUSDB is on average 103× faster at finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LYRESPLIT can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.
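The abstract does not spell out the data models or how LYRESPLIT works. As a minimal sketch, assuming one plausible model in which each distinct record is stored once and each version keeps a list of record ids, the Python below shows how a version checkout reduces to a lookup over that list, and how grouping versions into partitions trades storage against checkout cost. The names `VersionedTable`, `commit`, `checkout`, and `partition_costs` are hypothetical, and the groupings are chosen by hand; this is not the LYRESPLIT algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical model: each distinct record is stored once (rid -> row),
    and each version is a list of record ids (vid -> rlist)."""
    records: dict[int, tuple] = field(default_factory=dict)
    versions: dict[int, list[int]] = field(default_factory=dict)
    _next_rid: int = 0

    def commit(self, vid: int, rows: list[tuple], parent: int | None = None) -> None:
        # Rows unchanged from the parent version keep their rid; new rows get fresh rids.
        existing: dict[tuple, int] = {}
        if parent is not None:
            existing = {self.records[r]: r for r in self.versions[parent]}
        rlist = []
        for row in rows:
            rid = existing.get(row)
            if rid is None:
                rid = self._next_rid
                self._next_rid += 1
                self.records[rid] = row
            rlist.append(rid)
        self.versions[vid] = rlist

    def checkout(self, vid: int) -> list[tuple]:
        # Materialize a version by resolving its record-id list.
        return [self.records[r] for r in self.versions[vid]]


def partition_costs(versions: dict[int, list[int]], groups: list[list[int]]) -> tuple[int, float]:
    """Assumes every version appears in exactly one group. Each group stores the
    union of its versions' records; checking out a version scans its whole group."""
    storage = 0
    total_scan = 0
    for group in groups:
        union: set[int] = set()
        for vid in group:
            union.update(versions[vid])
        storage += len(union)
        total_scan += len(union) * len(group)
    return storage, total_scan / len(versions)


if __name__ == "__main__":
    t = VersionedTable()
    t.commit(1, [("a", 1), ("b", 2)])
    t.commit(2, [("a", 1), ("b", 3)], parent=1)
    t.commit(3, [("c", 4)], parent=2)
    print(t.checkout(2))                                  # [('a', 1), ('b', 3)]
    print(partition_costs(t.versions, [[1, 2, 3]]))       # (4, 4.0)
    print(partition_costs(t.versions, [[1], [2], [3]]))   # (5, 1.6666666666666667)
```

With a single partition, the four distinct records are stored once, but every checkout scans all four. With one partition per version, record 0 is duplicated (five stored records), yet the average checkout scans about 1.67 records. A partitioning scheme has to pick a point between these extremes.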

Original language: English (US)
Pages (from-to): 1130-1141
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Issue number: 10
State: Published - Jun 1 2017
Event: 43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany
Duration: Aug 28 2017 → Sep 1 2017

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

