Domain Representative Keywords Selection: A Probabilistic Approach

Pritom Saha Akash, Jie Huang, Kevin Chen Chuan Chang, Yunyao Li, Lucian Popa, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We propose a probabilistic approach to select a subset of a target domain representative keywords from a candidate set, contrasting with a context domain. Such a task is crucial for many downstream tasks in natural language processing. To contrast the target domain and the context domain, we adapt the two-component mixture model concept to generate a distribution of candidate keywords. It provides more importance to the distinctive keywords of the target domain than common keywords contrasting with the context domain. To support the representativeness of the selected keywords towards the target domain, we introduce an optimization algorithm for selecting the subset from the generated candidate distribution. We have shown that the optimization algorithm can be efficiently implemented with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate the superiority of our approach over other baselines for the tasks of keyword summary generation and trending keywords selection.
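As a rough illustration of the two-component mixture idea mentioned in the abstract, the sketch below shows the classic textbook form of such a model, not the paper's adapted version. It assumes each word occurrence in the target domain is drawn either from a fixed context (background) distribution with probability `lam`, or from an unknown target-specific distribution `theta`, which is estimated by EM. The function name `mixture_em`, the parameter `lam`, and the toy counts are all hypothetical and chosen only for this example.

```python
def mixture_em(target_counts, context_probs, lam=0.5, iters=50):
    """Estimate a target-specific word distribution with a two-component mixture.

    Assumption (not from the paper): every word occurrence in the target
    domain comes from the context distribution with probability `lam`,
    and from the unknown distribution `theta` otherwise.
    """
    words = list(target_counts)
    total = sum(target_counts.values())
    # Start from the raw relative frequencies in the target domain.
    theta = {w: target_counts[w] / total for w in words}

    for _ in range(iters):
        # E-step: posterior probability that an occurrence of w came from theta.
        posterior = {}
        for w in words:
            from_theta = (1 - lam) * theta[w]
            from_context = lam * context_probs.get(w, 1e-12)
            posterior[w] = from_theta / (from_theta + from_context)

        # M-step: re-estimate theta from the counts attributed to it.
        weighted = {w: target_counts[w] * posterior[w] for w in words}
        norm = sum(weighted.values())
        theta = {w: weighted[w] / norm for w in words}

    return theta


# Toy data: hypothetical counts and context probabilities.
target_counts = {"the": 50, "of": 30, "gene": 15, "protein": 5}
context_probs = {"the": 0.60, "of": 0.35, "gene": 0.03, "protein": 0.02}
theta = mixture_em(target_counts, context_probs)
```

In this toy setting, words that are frequent in the context domain ("the", "of") lose probability mass in `theta`, while words that are distinctive for the target domain ("gene", "protein") gain mass relative to their raw frequency. The subset selection step described in the abstract is not sketched here, since the record does not specify its objective.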

Original language: English (US)
Title of host publication: ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher: Association for Computational Linguistics (ACL)
Pages: 679-692
Number of pages: 14
ISBN (Electronic): 9781955917254
State: Published - 2022
Event: Findings of the Association for Computational Linguistics: ACL 2022 - Dublin, Ireland
Duration: May 22 2022 to May 27 2022

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Findings of the Association for Computational Linguistics: ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 5/22/22 to 5/27/22

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
