Automatic construction and ranking of topical keyphrases on collections of short documents

Marina Danilevsky, Chi Wang, Nihit Desai, Xiang Ren, Jingyi Guo, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We introduce a framework for topical keyphrase generation and ranking, based on the output of a topic model run on a collection of short documents. By shifting from the unigram-centric traditional methods of keyphrase extraction and ranking to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. Our method defines a function to rank topical keyphrases so that more highly ranked keyphrases are considered to be more representative phrases for that topic. We study the performance of our framework on multiple real-world document collections, and also show that it is more scalable than comparable phrase-generating models.

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2014, SDM 2014
Editors: Mohammed J. Zaki, Arindam Banerjee, Srinivasan Parthasarathy, Pang-Ning Tan, Zoran Obradovic, Chandrika Kamath
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 398-406
Number of pages: 9
ISBN (Electronic): 9781510811515
State: Published - 2014
Event: 14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014 to Apr 26 2014

Publication series

Name: SIAM International Conference on Data Mining 2014, SDM 2014
Volume: 1

Other

Other: 14th SIAM International Conference on Data Mining, SDM 2014
Country: United States
City: Philadelphia
Period: 4/24/14 to 4/26/14

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
