PTM: Probabilistic topic mapping model for mining parallel document collections

Duo Zhang, Jimeng Sun, Chengxiang Zhai, Abhijit Bose, Nikos Anerousis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Many applications generate a large volume of parallel document collections. A parallel document collection consists of two sets of documents where the documents in each set correspond to each other and form semantic pairs (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, that mines parallel document collections to simultaneously discover latent topics in both sets of documents as well as the mapping of topics in one set to those in the other. We evaluate the PTM model on a parallel document collection in the IT service domain. We show that PTM can effectively discover meaningful topics as well as their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.

Original language: English (US)
Title of host publication: CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
Pages: 1653-1656
Number of pages: 4
State: Published - 2010
Event: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada
Duration: Oct 26 2010 → Oct 30 2010

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Country/Territory: Canada
City: Toronto, ON
Period: 10/26/10 → 10/30/10

Keywords

  • Mining parallel document collections
  • Probabilistic topic mapping

ASJC Scopus subject areas

  • Decision Sciences(all)
  • Business, Management and Accounting(all)
