Latent association analysis of document pairs

Gengxin Miao, Ziyu Guan, Louise E. Moser, Xifeng Yan, Shu Tao, Nikos Anerousis, Jimeng Sun

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper presents Latent Association Analysis (LAA), a generative model that analyzes the topics within two document sets simultaneously, as well as the correlations between the two topic structures, by considering the semantic associations among document pairs. LAA defines a correlation factor that represents the connection between two documents, and considers the topic proportion of paired documents based on this factor. Words in the documents are assumed to be randomly generated by particular topic assignments and topic-to-word probability distributions. The paper also presents a new ranking algorithm, based on LAA, that can be used to retrieve target documents that are potentially associated with a given source document. The ranking algorithm uses the latent factor in LAA to rank target documents by the strength of their semantic associations with the source document. We evaluate the LAA algorithm with real datasets, specifically, the IT-Change and the IT-Solution document sets from the IBM IT service environment and the Symptom-Treatment document sets from Google Health. Experimental results demonstrate that the LAA algorithm significantly outperforms existing algorithms.

Original languageEnglish (US)
Title of host publicationKDD'12 - 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages1415-1423
Number of pages9
DOIs
StatePublished - 2012
Externally publishedYes
Event18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012 - Beijing, China
Duration: Aug 12 2012Aug 16 2012

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012
Country/TerritoryChina
CityBeijing
Period8/12/128/16/12

Keywords

  • ranking algorithm
  • topic model
  • variational inference

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Latent association analysis of document pairs'. Together they form a unique fingerprint.

Cite this