Abstract
Many novel applications have been built on analyzing tweets about specific topics. While these applications provide different kinds of analysis, they share a common task: monitoring "target" tweets from the Twitter stream for a topic. The current solution for this task tracks a set of manually selected keywords with Twitter APIs. This manual approach has many limitations. In this paper, we propose a data platform to automatically monitor target tweets from the Twitter stream for any given topic. To monitor target tweets in an optimal and continuous way, we design the Automatic Topic-focused Monitor (ATM), which iteratively 1) samples tweets from the stream and 2) selects keywords to track based on the samples. To realize ATM, we develop a tweet sampling algorithm that samples sufficient unbiased tweets with the available Twitter APIs, and a keyword selection algorithm that efficiently selects keywords with near-optimal coverage of target tweets under cost constraints. We conduct extensive experiments to show the effectiveness of ATM. For example, ATM covers 90% of target tweets for a topic and improves on the manual approach by 49%.
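To make the keyword selection step more concrete, the following is a minimal Python sketch of one way to pick keywords that cover many target tweets under a cost budget. It is not the paper's algorithm: it uses a standard greedy cost-effectiveness heuristic for budgeted maximum coverage, and the function name `select_keywords`, the `costs` mapping, the `budget` parameter, and the whitespace tokenization are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def select_keywords(
    sample: List[str],
    target_ids: Set[int],
    costs: Dict[str, float],
    budget: float,
) -> List[str]:
    """Greedily pick keywords covering many target tweets per unit cost.

    sample     -- sampled tweet texts (hypothetical input format)
    target_ids -- indices into `sample` of tweets labeled as on-topic
    costs      -- positive cost per candidate keyword (an assumed stand-in
                  for whatever tracking cost applies to a keyword)
    budget     -- total cost allowed for the selected keywords
    """
    # Map each candidate keyword to the target tweets that contain it.
    covers: Dict[str, Set[int]] = {}
    for idx, text in enumerate(sample):
        if idx not in target_ids:
            continue
        for word in set(text.lower().split()):
            if word in costs:
                covers.setdefault(word, set()).add(idx)

    chosen: List[str] = []
    covered: Set[int] = set()
    spent = 0.0
    while True:
        best: Optional[str] = None
        best_ratio = 0.0
        for word, ids in covers.items():
            if word in chosen or spent + costs[word] > budget:
                continue
            # Newly covered target tweets per unit of cost.
            ratio = len(ids - covered) / costs[word]
            if ratio > best_ratio:
                best, best_ratio = word, ratio
        if best is None:
            # No affordable keyword adds any new coverage.
            break
        chosen.append(best)
        covered |= covers[best]
        spent += costs[best]
    return chosen
```

In an iterative monitor of the kind the abstract describes, a routine like this would be rerun on each fresh sample so that the tracked keyword set follows the topic as it drifts. The greedy rule is only a heuristic here and carries no claim about the near-optimality guarantee stated for ATM.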
Original language | English (US) |
---|---|
Pages (from-to) | 1966-1977 |
Number of pages | 12 |
Journal | Proceedings of the VLDB Endowment |
Volume | 6 |
Issue number | 14 |
State | Published - Sep 2013 |
Event | 39th International Conference on Very Large Data Bases, VLDB 2013 - Trento, Italy. Duration: Aug 26 2013 → Aug 30 2013 |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science