Document-topic hierarchies from document graphs

Tim Weninger, Yonatan Bisk, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Topic taxonomies present a multi-level view of a document collection, where general topics live towards the top of the taxonomy and more specific topics live towards the bottom. Topic taxonomies allow users to quickly drill down into their topic of interest to find documents. We show that hierarchies of documents, where documents live at the inner nodes of the hierarchy tree, can also be inferred by combining document text with inter-document links. We present a Bayesian generative model by which an explicit hierarchy of documents is created. Experiments on three document-graph data sets show that the generated document hierarchies are able to fit the observed data, and that the levels in the constructed document hierarchy represent practical groupings.

Original language: English (US)
Title of host publication: CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 635-644
Number of pages: 10
ISBN (Print): 9781450311564
State: Published - Jan 1 2012
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: Oct 29 2012 - Nov 2 2012

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country: United States
City: Maui, HI
Period: 10/29/12 - 11/2/12

Keywords

  • bayesian generative models
  • hierarchical clustering
  • model evaluation
  • topic models

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
