Exploiting forum thread structures to improve thread clustering

Kumaresh Pattabiraman, Parikshit Sondhi, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Automated clustering of threads within and across web forums will greatly benefit both users and forum administrators in efficiently seeking, managing, and integrating the huge volume of content being generated. While clustering has been studied for other types of data, little work has been done on clustering forum threads; the informal nature and special structure of forum data make it interesting to study how to effectively cluster forum threads. In this paper, we apply three state-of-the-art clustering methods (i.e., hierarchical agglomerative clustering, k-Means, and probabilistic latent semantic analysis) to cluster forum threads and study how to leverage the structure of threads to improve clustering accuracy. We propose three different methods for assigning weights to the posts in a forum thread to achieve a more accurate representation of a thread. We evaluate all the methods on data collected from three different Linux forums for both within-forum and across-forum clustering. Our results show that the state-of-the-art methods perform reasonably well for this task, but the performance can be further improved by exploiting thread structures. In particular, a parabolic weighting method that assigns higher weights to both the beginning posts and the end posts of a thread is shown to consistently outperform a standard clustering method.
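The abstract does not state the weighting formula, so the following is only an illustrative sketch of what a parabolic post-weighting scheme could look like, not the authors' method. It assumes a symmetric parabola over the post position, a hypothetical `floor` parameter for the minimum weight given to the middle posts, and a plain whitespace-tokenized bag-of-words thread representation. The function names `parabolic_weights` and `thread_vector` are invented for this example.

```python
from collections import Counter


def parabolic_weights(n_posts, floor=0.2):
    """Return one weight per post, highest at the first and last post.

    Assumption: weight = floor + (1 - floor) * x**2, where x maps the
    post position linearly onto [-1, 1]. Weights are normalized to sum to 1.
    """
    if n_posts <= 0:
        return []
    if n_posts == 1:
        return [1.0]
    raw = []
    for i in range(n_posts):
        x = 2.0 * i / (n_posts - 1) - 1.0  # -1 at first post, +1 at last
        raw.append(floor + (1.0 - floor) * x * x)
    total = sum(raw)
    return [w / total for w in raw]


def thread_vector(posts, floor=0.2):
    """Build a weighted term-frequency vector for a thread.

    Each post's term counts are scaled by that post's parabolic weight
    before being summed into a single vector for the whole thread.
    """
    weights = parabolic_weights(len(posts), floor)
    vector = Counter()
    for weight, post in zip(weights, posts):
        for term, count in Counter(post.lower().split()).items():
            vector[term] += weight * count
    return dict(vector)


if __name__ == "__main__":
    posts = ["kernel panic on boot", "try another cable", "kernel update fixed it"]
    print(parabolic_weights(len(posts)))
    print(thread_vector(posts))
```

Vectors built this way could then be passed to any of the clustering methods named in the abstract. Under this assumed scheme, terms from the opening question and the closing resolution contribute more to the thread representation than terms from the middle of the discussion.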

Original language: English (US)
Title of host publication: International Conference on the Theory of Information Retrieval, ICTIR 2013 Proceedings
Pages: 64-71
Number of pages: 8
State: Published - Oct 30 2013
Event: 4th International Conference on the Theory of Information Retrieval, ICTIR 2013 - Copenhagen, Denmark
Duration: Sep 29 2013 to Oct 2 2013

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 4th International Conference on the Theory of Information Retrieval, ICTIR 2013
Country: Denmark
City: Copenhagen
Period: 9/29/13 to 10/2/13

Keywords

  • Forums
  • K-Means
  • Text mining
  • Thread clustering
  • Web 2.0

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
