Spatiotemporal transformation of social media geostreams: A case study of Twitter for flu risk analysis

Myung Hwa Hwang, Shaowen Wang, Guofeng Cao, Anand Padmanabhan, Zhenhua Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Georeferenced social media data streams (social media geostreams) are providing promising opportunities to gain new insights into spatiotemporal aspects of human interactions in cyberspace and their relation to real-world activities. In particular, such opportunities are motivating public health researchers to improve the surveillance of disease epidemics by means of spatiotemporal analysis of social media geostreams. One essential requirement in achieving such geostream-based disease surveillance is to establish scalable data infrastructures capable of real-time transformation of massive geostreams into spatiotemporally organized data to which analytical methods are readily applicable. To fulfill this requirement, this study develops a data pipeline solution where multiple computational components are integrated to collect, process, and aggregate social media geostreams in near real time. As a test case, this solution focuses on one well-known social media geostream, the Twitter data stream, and one type of disease epidemic, the flu. The pipeline solution facilitates multiscale spatiotemporal analysis of flu risks by collecting geotagged tweets from the Twitter Streaming API, identifying flu-related tweets through keyword matching, aggregating tweets at multiple spatial granularities in near real time, and storing tweets and the aggregate statistics in a distributed NoSQL database. Although developed for the surveillance of flu epidemics, the pipeline would serve as a general framework for building scalable data infrastructures that can support real-time spatiotemporal analysis of social media geostreams in application domains beyond disease mapping and public health.
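The filter-and-aggregate steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword pattern, the grid cell sizes, the hourly time bucket, the tweet dictionary fields (`text`, `lon`, `lat`, `ts`), and the function names are all assumptions made for illustration, and an in-memory `Counter` stands in for the distributed NoSQL database.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from math import floor

# Hypothetical keyword list; the abstract does not give the actual terms.
FLU_PATTERN = re.compile(r"\b(flu|influenza)\b", re.IGNORECASE)

# Hypothetical grid resolutions (degrees) standing in for
# "multiple spatial granularities".
CELL_SIZES_DEG = (1.0, 0.25)


def is_flu_related(text: str) -> bool:
    """Whole-word keyword match, so 'flue' does not count as 'flu'."""
    return FLU_PATTERN.search(text) is not None


def cell_key(lon: float, lat: float, cell_deg: float, ts: datetime) -> tuple:
    """Map a point and a timezone-aware timestamp to a (grid cell, UTC hour) key."""
    hour = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")
    return (cell_deg, floor(lon / cell_deg), floor(lat / cell_deg), hour)


def aggregate(tweets) -> Counter:
    """Count flu-related tweets per cell and hour at every grid resolution."""
    counts = Counter()
    for tweet in tweets:
        if not is_flu_related(tweet["text"]):
            continue
        for cell_deg in CELL_SIZES_DEG:
            key = cell_key(tweet["lon"], tweet["lat"], cell_deg, tweet["ts"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    ts = datetime(2013, 11, 5, 14, 30, tzinfo=timezone.utc)
    sample = [
        {"text": "Down with the flu today", "lon": -88.2, "lat": 40.1, "ts": ts},
        {"text": "Influenza season is here", "lon": -88.7, "lat": 40.9, "ts": ts},
        {"text": "The chimney flue is blocked", "lon": -88.2, "lat": 40.1, "ts": ts},
    ]
    for key, n in sorted(aggregate(sample).items()):
        print(key, n)
```

Keying each count by resolution, cell index, and hour keeps every spatial granularity in one flat key space, which is a layout that tends to map naturally onto key-value stores; a real deployment would replace the `Counter` with writes to the chosen database.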

Original language: English (US)
Title of host publication: Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013
Pages: 12-21
Number of pages: 10
DOIs: https://doi.org/10.1145/2534303.2534310
State: Published - Dec 1 2013
Event: 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013 - Orlando, FL, United States
Duration: Nov 5 2013 → Nov 5 2013

Publication series

Name: Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013

Other

Other: 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013
Country: United States
City: Orlando, FL
Period: 11/5/13 → 11/5/13

Fingerprint

Risk analysis
Pipelines
Public health
Application programming interfaces (API)
Statistics

Keywords

  • data pipeline
  • disease surveillance
  • social media geostreams
  • spatiotemporal analysis

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design

Cite this

Hwang, M. H., Wang, S., Cao, G., Padmanabhan, A., & Zhang, Z. (2013). Spatiotemporal transformation of social media geostreams: A case study of Twitter for flu risk analysis. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013 (pp. 12-21). (Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013). https://doi.org/10.1145/2534303.2534310

@inproceedings{05f11a4b35674cff868d6cd57011d89e,
title = "Spatiotemporal transformation of social media geostreams: A case study of Twitter for flu risk analysis",
keywords = "data pipeline, disease surveillance, social media geostreams, spatiotemporal analysis",
author = "Hwang, {Myung Hwa} and Shaowen Wang and Guofeng Cao and Anand Padmanabhan and Zhenhua Zhang",
year = "2013",
month = "12",
day = "1",
doi = "10.1145/2534303.2534310",
language = "English (US)",
isbn = "9781450325325",
series = "Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013",
pages = "12--21",
booktitle = "Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013",

}

TY - GEN

T1 - Spatiotemporal transformation of social media geostreams

T2 - A case study of Twitter for flu risk analysis

AU - Hwang, Myung Hwa

AU - Wang, Shaowen

AU - Cao, Guofeng

AU - Padmanabhan, Anand

AU - Zhang, Zhenhua

PY - 2013/12/1

Y1 - 2013/12/1

N2 - Georeferenced social media data streams (social media geostreams) are providing promising opportunities to gain new insights into spatiotemporal aspects of human interactions on cyber space and their relation with real-world activities. In particular, such opportunities are motivating public health researchers to improve the surveillance of disease epidemics by means of spatiotemporal analysis of social media geostreams. One essential requirement in achieving such geostream-based disease surveillance is to establish scalable data infrastructures capable of real-time transformation of massive geostreams into spatiotemporally organized data to which analytical methods are readily applicable. To fulfill this requirement, this study develops a data pipeline solution where multiple computational components are integrated to collect, process, and aggregate social media geostreams in near real time. As a test case, this solution focuses on one well-known social media geostream, the Twitter data stream, and one type of disease epidemics, the flu. The pipeline solution facilitates multiscale spatiotemporal analysis of flu risks by collecting geotagged tweets from the Twitter Streaming API, identifying flu-related tweets through keyword match, aggregating tweets at multiple spatial granularities in near real time, and storing tweets and the aggregate statistics in a distributed NoSQL database. Although developed for the surveillance of flu epidemics, the pipeline would serve as a general framework for building scalable data infrastructures that can support real-time spatiotemporal analysis of social media geostreams in the application domains beyond disease mapping and public health.

KW - data pipeline

KW - disease surveillance

KW - social media geostreams

KW - spatiotemporal analysis

UR - http://www.scopus.com/inward/record.url?scp=84894631570&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894631570&partnerID=8YFLogxK

U2 - 10.1145/2534303.2534310

DO - 10.1145/2534303.2534310

M3 - Conference contribution

AN - SCOPUS:84894631570

SN - 9781450325325

T3 - Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013

SP - 12

EP - 21

BT - Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS 2013

ER -