Fast Compaction Algorithms for NoSQL Databases

Mainak Ghosh, Indranil Gupta, Shalmoli Gupta, Nirman Kumar

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Compaction plays a crucial role in NoSQL systems to ensure a high overall read throughput. In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NPHard. We then propose a set of algorithms and mathematically analyze upper bounds on worst-case cost. We evaluate the proposed algorithms on real-life workloads. Our results show that our algorithms incur low I/O costs and that a compaction approach using a balanced tree is most preferable.

Original languageEnglish (US)
Title of host publicationProceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages452-461
Number of pages10
ISBN (Electronic)9781467372145
DOIs
StatePublished - Jul 22 2015
Event35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015 - Columbus, United States
Duration: Jun 29 2015Jul 2 2015

Publication series

NameProceedings - International Conference on Distributed Computing Systems
Volume2015-July

Other

Other35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015
CountryUnited States
CityColumbus
Period6/29/157/2/15

Fingerprint

Compaction
Costs
Throughput

Keywords

  • compaction
  • greedy approximation algorithm
  • nosql
  • np-hard

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Ghosh, M., Gupta, I., Gupta, S., & Kumar, N. (2015). Fast Compaction Algorithms for NoSQL Databases. In Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015 (pp. 452-461). [7164931] (Proceedings - International Conference on Distributed Computing Systems; Vol. 2015-July). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICDCS.2015.53

Fast Compaction Algorithms for NoSQL Databases. / Ghosh, Mainak; Gupta, Indranil; Gupta, Shalmoli; Kumar, Nirman.

Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 452-461 7164931 (Proceedings - International Conference on Distributed Computing Systems; Vol. 2015-July).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Ghosh, M, Gupta, I, Gupta, S & Kumar, N 2015, Fast Compaction Algorithms for NoSQL Databases. in Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015., 7164931, Proceedings - International Conference on Distributed Computing Systems, vol. 2015-July, Institute of Electrical and Electronics Engineers Inc., pp. 452-461, 35th IEEE International Conference on Distributed Computing Systems, ICDCS 2015, Columbus, United States, 6/29/15. https://doi.org/10.1109/ICDCS.2015.53
Ghosh M, Gupta I, Gupta S, Kumar N. Fast Compaction Algorithms for NoSQL Databases. In Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 452-461. 7164931. (Proceedings - International Conference on Distributed Computing Systems). https://doi.org/10.1109/ICDCS.2015.53
Ghosh, Mainak ; Gupta, Indranil ; Gupta, Shalmoli ; Kumar, Nirman. / Fast Compaction Algorithms for NoSQL Databases. Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 452-461 (Proceedings - International Conference on Distributed Computing Systems).
@inproceedings{ed44e77bbc3046eda8211f2cf0b05ab3,
title = "Fast Compaction Algorithms for NoSQL Databases",
abstract = "Compaction plays a crucial role in NoSQL systems to ensure a high overall read throughput. In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NPHard. We then propose a set of algorithms and mathematically analyze upper bounds on worst-case cost. We evaluate the proposed algorithms on real-life workloads. Our results show that our algorithms incur low I/O costs and that a compaction approach using a balanced tree is most preferable.",
keywords = "compaction, greedy approximation algorithm, nosql, np-hard",
author = "Mainak Ghosh and Indranil Gupta and Shalmoli Gupta and Nirman Kumar",
year = "2015",
month = "7",
day = "22",
doi = "10.1109/ICDCS.2015.53",
language = "English (US)",
series = "Proceedings - International Conference on Distributed Computing Systems",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "452--461",
booktitle = "Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015",
address = "United States",

}

TY - GEN

T1 - Fast Compaction Algorithms for NoSQL Databases

AU - Ghosh, Mainak

AU - Gupta, Indranil

AU - Gupta, Shalmoli

AU - Kumar, Nirman

PY - 2015/7/22

Y1 - 2015/7/22

N2 - Compaction plays a crucial role in NoSQL systems to ensure a high overall read throughput. In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NPHard. We then propose a set of algorithms and mathematically analyze upper bounds on worst-case cost. We evaluate the proposed algorithms on real-life workloads. Our results show that our algorithms incur low I/O costs and that a compaction approach using a balanced tree is most preferable.

AB - Compaction plays a crucial role in NoSQL systems to ensure a high overall read throughput. In this work, we formally define compaction as an optimization problem that attempts to minimize disk I/O. We prove this problem to be NPHard. We then propose a set of algorithms and mathematically analyze upper bounds on worst-case cost. We evaluate the proposed algorithms on real-life workloads. Our results show that our algorithms incur low I/O costs and that a compaction approach using a balanced tree is most preferable.

KW - compaction

KW - greedy approximation algorithm

KW - nosql

KW - np-hard

UR - http://www.scopus.com/inward/record.url?scp=84944314724&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944314724&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2015.53

DO - 10.1109/ICDCS.2015.53

M3 - Conference contribution

AN - SCOPUS:84944314724

T3 - Proceedings - International Conference on Distributed Computing Systems

SP - 452

EP - 461

BT - Proceedings - 2015 IEEE 35th International Conference on Distributed Computing Systems, ICDCS 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -