Abstract

Numerous scientific disciplines have witnessed tremendous growth in the amount of spatial data produced over the past decade. To handle the volume and velocity of such data, researchers have embraced distributed systems, which partition data among multiple nodes to provide scalability and high availability. Previous work on partitioning large spatiotemporal data focuses on bulk ingestion and static partitioning, and is therefore unable to handle the dynamic data and query workloads that are common with real-time data. In this paper we develop GeoBalance, a workload-aware partitioning approach for spatiotemporal data that can adapt partitions on the fly without disrupting data ingestion or retrieval. GeoBalance employs a spatial evolutionary algorithm to incrementally tune the partitions according to a geo-aware partitioning fitness function. In addition, we perform a rolling migration from one partitioning scheme to another to ensure that data ingestion and retrieval are not compromised while the partitions change. We conduct multiple experiments using a write-intensive hybrid workload of Twitter data and random hotspots to demonstrate that the GeoBalance partitioning approach outperforms statically defined partitions and other partitioning algorithms such as k-d trees.
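To make the idea of incrementally tuning partitions against a fitness function concrete, the following is a minimal sketch and not the GeoBalance algorithm itself. It assumes a one-dimensional simplification (partitions are longitude ranges defined by split points), a hypothetical fitness function (heaviest partition load divided by mean load), and a simple (1+1) evolutionary loop. All function names, parameters, and the synthetic workload are illustrative assumptions; the rolling migration step is not modeled.

```python
import random
from bisect import bisect_right

def partition_loads(splits, points):
    """Count how many workload points fall into each partition."""
    loads = [0] * (len(splits) + 1)
    for x in points:
        loads[bisect_right(splits, x)] += 1
    return loads

def imbalance(splits, points):
    """Hypothetical fitness: heaviest load over mean load (1.0 is ideal)."""
    loads = partition_loads(splits, points)
    return max(loads) / (len(points) / len(loads))

def mutate(splits, rng, sigma, lo, hi):
    """Nudge one split point with Gaussian noise, keeping splits sorted and in range."""
    child = list(splits)
    i = rng.randrange(len(child))
    child[i] = min(hi, max(lo, child[i] + rng.gauss(0.0, sigma)))
    return sorted(child)

def evolve(splits, points, generations=2000, sigma=5.0, lo=-180.0, hi=180.0, seed=0):
    """(1+1) evolutionary loop: keep a mutated child only if it is no worse."""
    rng = random.Random(seed)
    best = sorted(splits)
    best_fit = imbalance(best, points)
    for _ in range(generations):
        child = mutate(best, rng, sigma, lo, hi)
        fit = imbalance(child, points)
        if fit <= best_fit:
            best, best_fit = child, fit
    return best, best_fit

# Synthetic workload (an assumption): uniform background plus a hotspot near longitude -74.
rng = random.Random(42)
points = [rng.uniform(-180, 180) for _ in range(500)]
points += [rng.gauss(-74, 2) for _ in range(1500)]

static_splits = [-90.0, 0.0, 90.0]  # evenly spaced, workload-agnostic partitions
static_fit = imbalance(static_splits, points)
tuned_splits, tuned_fit = evolve(static_splits, points)
print(f"static imbalance: {static_fit:.2f}, tuned imbalance: {tuned_fit:.2f}")
```

Because a child is accepted only when its fitness is no worse than the current best, the tuned imbalance can never exceed that of the static starting partitions. A real system would use a two-dimensional (or spatiotemporal) partitioning and a fitness function that also accounts for query cost and migration overhead.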

Original language: English (US)
Pages (from-to): 67-94
Number of pages: 28
Journal: GeoInformatica
Volume: 26
Issue number: 1
State: Published - Jan 2022

Keywords

  • Cloud computing
  • Dynamic data partitioning
  • Evolutionary algorithms
  • Geospatial big data
  • Spatiotemporal databases

ASJC Scopus subject areas

  • Geography, Planning and Development
  • Information Systems

Title: GeoBalance: workload-aware partitioning of real-time spatiotemporal data