TY - GEN
T1 - Geographical topic discovery and comparison
AU - Yin, Zhijun
AU - Cao, Liangliang
AU - Han, Jiawei
AU - Zhai, Chengxiang
AU - Huang, Thomas
PY - 2011
Y1 - 2011
N2 - This paper studies the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents have become popular with the pervasiveness of location-acquisition technologies. For example, in Flickr, geo-tagged photos are associated with tags and GPS locations. In Twitter, the locations of tweets can be identified from the GPS locations reported by smartphones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this paper, we are interested in two questions: (1) how to discover different topics of interest that are coherent in geographical regions, and (2) how to compare several topics across different geographical locations. To answer these questions, this paper proposes and compares three ways of modeling geographical topics: a location-driven model, a text-driven model, and a novel joint model called LGTA (Latent Geographical Topic Analysis) that combines location and text. To make a fair comparison, we collect several representative datasets from the Flickr website, including Landscape, Activity, Manhattan, National park, Festival, Car, and Food. The results show that the first two methods work on some datasets but fail on others. LGTA works well on all these datasets, not only finding regions of interest but also providing effective comparisons of the topics across different locations. The results confirm our hypothesis that geographical distributions can help in modeling topics, while topics provide important cues for grouping different geographical regions.
KW - Geographical topics
KW - Topic comparison
KW - Topic modeling
UR - http://www.scopus.com/inward/record.url?scp=80052686183&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052686183&partnerID=8YFLogxK
U2 - 10.1145/1963405.1963443
DO - 10.1145/1963405.1963443
M3 - Conference contribution
AN - SCOPUS:80052686183
SN - 9781450306324
T3 - Proceedings of the 20th International Conference on World Wide Web, WWW 2011
SP - 247
EP - 256
BT - Proceedings of the 20th International Conference on World Wide Web, WWW 2011
T2 - 20th International Conference on World Wide Web, WWW 2011
Y2 - 28 March 2011 through 1 April 2011
ER -