TY - GEN
T1 - STREAMCUBE
T2 - 2015 31st IEEE International Conference on Data Engineering, ICDE 2015
AU - Feng, Wei
AU - Zhang, Chao
AU - Zhang, Wei
AU - Han, Jiawei
AU - Wang, Jianyong
AU - Aggarwal, Charu
AU - Huang, Jianbin
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/5/26
Y1 - 2015/5/26
N2 - What is happening around the world? When and where? Mining the geo-tagged Twitter stream makes it possible to answer the above questions in real-time. Although a single tweet can be short and noisy, proper aggregations of tweets can provide meaningful results. In this paper, we focus on hierarchical spatio-temporal hashtag clustering techniques. Our system has the following features: (1) Exploring events (hashtag clusters) with different space granularity. Users can zoom in and out on maps to find out what is happening in a particular area. (2) Exploring events with different time granularity. Users can choose to see what is happening today or in the past week. (3) Efficient single-pass algorithm for event identification, which provides human-readable hashtag clusters. (4) Efficient event ranking which aims to find burst events and localized events given a particular region and time frame. To support aggregation with different space and time granularity, we propose a data structure called STREAMCUBE, which is an extension of the data cube structure from the database community with spatial and temporal hierarchy. To achieve high scalability, we propose a divide-and-conquer method to construct the STREAMCUBE. To support flexible event ranking with different weights, we proposed a top-k based index. Different efficient methods are used to speed up event similarity computations. Finally, we have conducted extensive experiments on a real twitter data. Experimental results show that our framework can provide meaningful results with high scalability.
AB - What is happening around the world? When and where? Mining the geo-tagged Twitter stream makes it possible to answer the above questions in real-time. Although a single tweet can be short and noisy, proper aggregations of tweets can provide meaningful results. In this paper, we focus on hierarchical spatio-temporal hashtag clustering techniques. Our system has the following features: (1) Exploring events (hashtag clusters) with different space granularity. Users can zoom in and out on maps to find out what is happening in a particular area. (2) Exploring events with different time granularity. Users can choose to see what is happening today or in the past week. (3) Efficient single-pass algorithm for event identification, which provides human-readable hashtag clusters. (4) Efficient event ranking which aims to find burst events and localized events given a particular region and time frame. To support aggregation with different space and time granularity, we propose a data structure called STREAMCUBE, which is an extension of the data cube structure from the database community with spatial and temporal hierarchy. To achieve high scalability, we propose a divide-and-conquer method to construct the STREAMCUBE. To support flexible event ranking with different weights, we proposed a top-k based index. Different efficient methods are used to speed up event similarity computations. Finally, we have conducted extensive experiments on a real twitter data. Experimental results show that our framework can provide meaningful results with high scalability.
UR - http://www.scopus.com/inward/record.url?scp=84940838904&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84940838904&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2015.7113425
DO - 10.1109/ICDE.2015.7113425
M3 - Conference contribution
AN - SCOPUS:84940838904
T3 - Proceedings - International Conference on Data Engineering
SP - 1561
EP - 1572
BT - 2015 IEEE 31st International Conference on Data Engineering, ICDE 2015
PB - IEEE Computer Society
Y2 - 13 April 2015 through 17 April 2015
ER -