A MapReduce approach to Gi*(d) spatial statistic

Yan Liu, Kaichao Wu, Shaowen Wang, Yanli Zhao, Qian Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Managing and analyzing massive spatial datasets as supported by GIS and spatial analysis is becoming crucial to geospatial problem-solving and decision-making. MapReduce provides a data-centric computational model through which highly scalable spatial analysis computation can be achieved. However, it is challenging to leverage multi-dimensional spatial characteristics on the horizontally partitioned and transparently managed MapReduce data system to improve the computational performance of spatial analysis. This paper tackles this challenge through the development of a MapReduce-based computation of Gi*(d), a spatial statistic for detecting local clustering. Without exploiting spatial characteristics, Gi*(d) computation for a particular location requires pair-wise distance calculation against all points of a given dataset. A spatial locality-based storage and indexing strategy is developed to associate spatial locality with storage locality on the MapReduce platform. Based on a spatial indexing method, unnecessary map tasks can be eliminated from a MapReduce job, thus significantly improving the overall computation performance. To leverage the underlying parallelism on storage nodes, an application-level load balancing mechanism is developed to produce even loads among map tasks based on adaptive spatial domain decomposition. Experiments show the effectiveness of the developed storage and indexing strategy with different distance parameter settings. A significant reduction in execution time for all-point computation is observed through the use of the application-level load balancing mechanism.
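
To make the cost argument in the abstract concrete, the following is a minimal single-machine sketch, not the authors' MapReduce implementation. It assumes the standardized form of Gi*(d) with binary distance weights (w_ij = 1 when point j lies within distance d of location i, including j = i); the abstract does not state which form the paper uses. The function names `gi_star`, `build_grid`, and `neighbors_within` are hypothetical. The brute-force path scans every point, while the grid path only visits cells that can intersect the distance band, which is loosely analogous to skipping map tasks whose data cannot contribute.

```python
import math
from collections import defaultdict


def gi_star(points, values, i, d, candidates=None):
    """Standardized Gi*(d) for location i, assuming binary distance weights.

    If `candidates` is None, every point is tested (the pair-wise baseline).
    Otherwise only the given indices are tested; they must include every
    point within distance d of location i.
    """
    n = len(points)
    mean = sum(values) / n
    # Population standard deviation of the attribute values.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    xi, yi = points[i]
    w_sum = 0.0   # sum of w_ij; equals the sum of w_ij^2 for binary weights
    wx_sum = 0.0  # sum of w_ij * x_j
    indices = range(n) if candidates is None else candidates
    for j in indices:
        xj, yj = points[j]
        if math.hypot(xj - xi, yj - yi) <= d:
            w_sum += 1.0
            wx_sum += values[j]
    denom = s * math.sqrt((n * w_sum - w_sum * w_sum) / (n - 1))
    if denom == 0:
        return float("nan")  # undefined when values are constant or all points are neighbors
    return (wx_sum - mean * w_sum) / denom


def build_grid(points, cell):
    """Bucket point indices into square cells of side `cell` (illustrative index)."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid


def neighbors_within(points, grid, cell, i, d):
    """Indices within distance d of point i, visiting only nearby cells."""
    xi, yi = points[i]
    cx, cy = math.floor(xi / cell), math.floor(yi / cell)
    reach = math.ceil(d / cell)  # number of cells the distance band can span
    found = []
    for gx in range(cx - reach, cx + reach + 1):
        for gy in range(cy - reach, cy + reach + 1):
            for j in grid.get((gx, gy), ()):
                xj, yj = points[j]
                if math.hypot(xj - xi, yj - yi) <= d:
                    found.append(j)
    return found


# Toy data (made up for illustration): a high-valued cluster and a low-valued one.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
values = [10, 10, 10, 1, 1]
d = 1.5
grid = build_grid(points, cell=d)
near = neighbors_within(points, grid, d, 1, d)
z_brute = gi_star(points, values, 1, d)
z_grid = gi_star(points, values, 1, d, candidates=near)
```

Both paths should give the same statistic for a location; the grid path simply avoids distance checks against points in far-away cells. The global mean and standard deviation still require one pass over all values, which in a distributed setting would typically be computed once and shared.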

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010
Pages: 11-18
Number of pages: 8
DOIs: https://doi.org/10.1145/1869692.1869695
State: Published - Dec 1 2010
Event: 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information System, ACM SIGSPATIAL HPDGIS 2010 - San Jose, CA, United States
Duration: Nov 2 2010 → Nov 2 2010

Publication series

Name: Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010

Other

Other: 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information System, ACM SIGSPATIAL HPDGIS 2010
Country: United States
City: San Jose, CA
Period: 11/2/10 → 11/2/10

Fingerprint

Statistics
Resource allocation
Geographic information systems
Decision making
Decomposition
Experiments

Keywords

  • Cloud computing
  • Data-centric computing
  • Spatial statistics

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Liu, Y., Wu, K., Wang, S., Zhao, Y., & Huang, Q. (2010). A MapReduce approach to Gi*(d) spatial statistic. In Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010 (pp. 11-18). (Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010). https://doi.org/10.1145/1869692.1869695

@inproceedings{2fb0a88f46dc43379ffb8046ca8ad3e1,
title = "A MapReduce approach to Gi*(d) spatial statistic",
keywords = "Cloud computing, Data-centric computing, Spatial statistics",
author = "Yan Liu and Kaichao Wu and Shaowen Wang and Yanli Zhao and Qian Huang",
year = "2010",
month = "12",
day = "1",
doi = "10.1145/1869692.1869695",
language = "English (US)",
isbn = "9781450304320",
series = "Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010",
pages = "11--18",
booktitle = "Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010",

}

TY - GEN

T1 - A MapReduce approach to Gi*(d) spatial statistic

AU - Liu, Yan

AU - Wu, Kaichao

AU - Wang, Shaowen

AU - Zhao, Yanli

AU - Huang, Qian

PY - 2010/12/1

Y1 - 2010/12/1

KW - Cloud computing

KW - Data-centric computing

KW - Spatial statistics

UR - http://www.scopus.com/inward/record.url?scp=78650889463&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650889463&partnerID=8YFLogxK

U2 - 10.1145/1869692.1869695

DO - 10.1145/1869692.1869695

M3 - Conference contribution

AN - SCOPUS:78650889463

SN - 9781450304320

T3 - Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010

SP - 11

EP - 18

BT - Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, ACM SIGSPATIAL HPDGIS 2010

ER -