Database Learning: Toward a database that becomes smarter every time

Yongjoo Park, Ahmad Shahab Tajik, Michael Cafarella, Barzan Mozafari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0× for the same accuracy level compared to existing AQP systems.
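The core observation can be illustrated with a small numerical sketch. It is not Verdict's actual inference procedure. It assumes a hypothetical model in which the true answers to one past query and one new query are jointly Gaussian (the maximum-entropy distribution for a given mean and covariance), with the same prior mean and variance and a known prior covariance. The function name, parameters, and all numbers below are illustrative assumptions and do not come from the paper.

```python
def improved_answer(raw_new, var_new, past_answer, var_past,
                    prior_mean, prior_var, prior_cov):
    """Posterior mean and variance of the new query's true answer.

    Hypothetical model (an assumption, not the paper's method): the true
    answers (past, new) are jointly Gaussian with a shared prior mean and
    prior variance, and prior covariance prior_cov. Each observed answer
    is the true answer plus independent sampling noise.
    """
    # Covariance matrix of the two observed answers:
    # prior covariance plus sampling noise on the diagonal.
    a = prior_var + var_past   # Var(observed past answer)
    b = prior_cov              # Cov(observed past, observed new)
    d = prior_var + var_new    # Var(observed new answer)
    det = a * d - b * b

    # Covariance of the true new answer with each observation.
    c_past, c_new = prior_cov, prior_var

    # Gaussian conditioning weights: [c_past, c_new] times the 2x2 inverse.
    w_past = (c_past * d - c_new * b) / det
    w_new = (c_new * a - c_past * b) / det

    mean = (prior_mean
            + w_past * (past_answer - prior_mean)
            + w_new * (raw_new - prior_mean))
    var = prior_var - (w_past * c_past + w_new * c_new)
    return mean, var


# Illustrative numbers only: a precise past answer and a noisy new one.
mean, var = improved_answer(raw_new=104.0, var_new=16.0,
                            past_answer=110.0, var_past=1.0,
                            prior_mean=100.0, prior_var=25.0, prior_cov=20.0)
print(mean, var)
```

Under these assumptions, the variance of the combined answer is lower than the variance of the raw sample-based answer alone, and it shrinks further as the assumed covariance between the two queries grows. Setting `prior_cov` to zero reduces the sketch to using only the new query's own sample.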

Original language: English (US)
Title of host publication: SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 587-602
Number of pages: 16
ISBN (Electronic): 9781450341974
State: Published - May 9 2017
Externally published: Yes
Event: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Chicago, United States
Duration: May 14 2017 to May 19 2017

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
Volume: Part F127746
ISSN (Print): 0730-8078

Other

Other: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country/Territory: United States
City: Chicago
Period: 5/14/17 to 5/19/17

ASJC Scopus subject areas

  • Software
  • Information Systems
