TY - GEN
T1 - Mining multi-faceted overviews of arbitrary topics in a text collection
AU - Ling, Xu
AU - Mei, Qiaozhu
AU - Zhai, Chengxiang
AU - Schatz, Bruce
PY - 2008
Y1 - 2008
N2 - A common task in many text mining applications is to generate a multi-faceted overview of a topic in a text collection. Such an overview not only directly serves as an informative summary of the topic, but also provides a detailed view of navigation to different facets of the topic. Existing work has cast this problem as a categorization problem and requires training examples for each facet. This has three limitations: (1) All facets are predefined, which may not fit the need of a particular user. (2) Training examples for each facet are often unavailable. (3) Such an approach only works for a predefined type of topics. In this paper, we break these limitations and study a more realistic new setup of the problem, in which we would allow a user to flexibly describe each facet with keywords for an arbitrary topic and attempt to mine a multi-faceted overview in an unsupervised way. We attempt a probabilistic approach to solve this problem. Empirical experiments on different genres of text data show that our approach can effectively generate a multi-faceted overview for arbitrary topics; the generated overviews are comparable with those generated by supervised methods with training examples. They are also more informative than unstructured flat summaries. The method is quite general, thus can be applied to multiple text mining tasks in different application domains.
AB - A common task in many text mining applications is to generate a multi-faceted overview of a topic in a text collection. Such an overview not only directly serves as an informative summary of the topic, but also provides a detailed view of navigation to different facets of the topic. Existing work has cast this problem as a categorization problem and requires training examples for each facet. This has three limitations: (1) All facets are predefined, which may not fit the need of a particular user. (2) Training examples for each facet are often unavailable. (3) Such an approach only works for a predefined type of topics. In this paper, we break these limitations and study a more realistic new setup of the problem, in which we would allow a user to flexibly describe each facet with keywords for an arbitrary topic and attempt to mine a multi-faceted overview in an unsupervised way. We attempt a probabilistic approach to solve this problem. Empirical experiments on different genres of text data show that our approach can effectively generate a multi-faceted overview for arbitrary topics; the generated overviews are comparable with those generated by supervised methods with training examples. They are also more informative than unstructured flat summaries. The method is quite general, thus can be applied to multiple text mining tasks in different application domains.
KW - Algorithams
UR - http://www.scopus.com/inward/record.url?scp=65449189631&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=65449189631&partnerID=8YFLogxK
U2 - 10.1145/1401890.1401952
DO - 10.1145/1401890.1401952
M3 - Conference contribution
AN - SCOPUS:65449189631
SN - 9781605581934
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 497
EP - 505
BT - KDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining
T2 - 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Y2 - 24 August 2008 through 27 August 2008
ER -