TY - GEN
T1 - Latent association analysis of document pairs
AU - Miao, Gengxin
AU - Guan, Ziyu
AU - Moser, Louise E.
AU - Yan, Xifeng
AU - Tao, Shu
AU - Anerousis, Nikos
AU - Sun, Jimeng
PY - 2012
Y1 - 2012
N2 - This paper presents Latent Association Analysis (LAA), a generative model that analyzes the topics within two document sets simultaneously, as well as the correlations between the two topic structures, by considering the semantic associations among document pairs. LAA defines a correlation factor that represents the connection between two documents, and considers the topic proportion of paired documents based on this factor. Words in the documents are assumed to be randomly generated by particular topic assignments and topic-to-word probability distributions. The paper also presents a new ranking algorithm, based on LAA, that can be used to retrieve target documents that are potentially associated with a given source document. The ranking algorithm uses the latent factor in LAA to rank target documents by the strength of their semantic associations with the source document. We evaluate the LAA algorithm with real datasets, specifically, the IT-Change and the IT-Solution document sets from the IBM IT service environment and the Symptom-Treatment document sets from Google Health. Experimental results demonstrate that the LAA algorithm significantly outperforms existing algorithms.
AB - This paper presents Latent Association Analysis (LAA), a generative model that analyzes the topics within two document sets simultaneously, as well as the correlations between the two topic structures, by considering the semantic associations among document pairs. LAA defines a correlation factor that represents the connection between two documents, and considers the topic proportion of paired documents based on this factor. Words in the documents are assumed to be randomly generated by particular topic assignments and topic-to-word probability distributions. The paper also presents a new ranking algorithm, based on LAA, that can be used to retrieve target documents that are potentially associated with a given source document. The ranking algorithm uses the latent factor in LAA to rank target documents by the strength of their semantic associations with the source document. We evaluate the LAA algorithm with real datasets, specifically, the IT-Change and the IT-Solution document sets from the IBM IT service environment and the Symptom-Treatment document sets from Google Health. Experimental results demonstrate that the LAA algorithm significantly outperforms existing algorithms.
KW - ranking algorithm
KW - topic model
KW - variational inference
UR - http://www.scopus.com/inward/record.url?scp=84866015175&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866015175&partnerID=8YFLogxK
U2 - 10.1145/2339530.2339752
DO - 10.1145/2339530.2339752
M3 - Conference contribution
AN - SCOPUS:84866015175
SN - 9781450314626
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1415
EP - 1423
BT - KDD'12 - 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
T2 - 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012
Y2 - 12 August 2012 through 16 August 2012
ER -