TY - GEN
T1 - PTM
T2 - 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
AU - Zhang, Duo
AU - Sun, Jimeng
AU - Zhai, Chengxiang
AU - Bose, Abhijit
AU - Anerousis, Nikos
PY - 2010
Y1 - 2010
N2 - Many applications generate a large volume of parallel document collections. A parallel document collection consists of two sets of documents where the documents in each set correspond to each other and form semantic pairs (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this novel kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, to mine parallel document collections to simultaneously discover latent topics in both sets of documents as well as the mapping of topics in one set to those in the other. We evaluate the PTM model on a parallel document collection in the IT service domain. We show that PTM can effectively discover meaningful topics, as well as their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.
AB - Many applications generate a large volume of parallel document collections. A parallel document collection consists of two sets of documents where the documents in each set correspond to each other and form semantic pairs (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this novel kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, to mine parallel document collections to simultaneously discover latent topics in both sets of documents as well as the mapping of topics in one set to those in the other. We evaluate the PTM model on a parallel document collection in the IT service domain. We show that PTM can effectively discover meaningful topics, as well as their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.
KW - Mining parallel document collections
KW - Probabilistic topic mapping
UR - http://www.scopus.com/inward/record.url?scp=78651344392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651344392&partnerID=8YFLogxK
U2 - 10.1145/1871437.1871696
DO - 10.1145/1871437.1871696
M3 - Conference contribution
AN - SCOPUS:78651344392
SN - 9781450300995
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1653
EP - 1656
BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
Y2 - 26 October 2010 through 30 October 2010
ER -