TY - GEN
T1 - Domain Representative Keywords Selection
T2 - Findings of the Association for Computational Linguistics: ACL 2022
AU - Akash, Pritom Saha
AU - Huang, Jie
AU - Chang, Kevin Chen Chuan
AU - Li, Yunyao
AU - Popa, Lucian
AU - Zhai, Cheng Xiang
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - We propose a probabilistic approach to select a subset of a target domain representative keywords from a candidate set, contrasting with a context domain. Such a task is crucial for many downstream tasks in natural language processing. To contrast the target domain and the context domain, we adapt the two-component mixture model concept to generate a distribution of candidate keywords. It provides more importance to the distinctive keywords of the target domain than common keywords contrasting with the context domain. To support the representativeness of the selected keywords towards the target domain, we introduce an optimization algorithm for selecting the subset from the generated candidate distribution. We have shown that the optimization algorithm can be efficiently implemented with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate the superiority of our approach over other baselines for the tasks of keyword summary generation and trending keywords selection.
AB - We propose a probabilistic approach to select a subset of a target domain representative keywords from a candidate set, contrasting with a context domain. Such a task is crucial for many downstream tasks in natural language processing. To contrast the target domain and the context domain, we adapt the two-component mixture model concept to generate a distribution of candidate keywords. It provides more importance to the distinctive keywords of the target domain than common keywords contrasting with the context domain. To support the representativeness of the selected keywords towards the target domain, we introduce an optimization algorithm for selecting the subset from the generated candidate distribution. We have shown that the optimization algorithm can be efficiently implemented with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate the superiority of our approach over other baselines for the tasks of keyword summary generation and trending keywords selection.
UR - http://www.scopus.com/inward/record.url?scp=85149151359&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149151359&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.findings-acl.56
DO - 10.18653/v1/2022.findings-acl.56
M3 - Conference contribution
AN - SCOPUS:85149151359
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 679
EP - 692
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
PB - Association for Computational Linguistics (ACL)
Y2 - 22 May 2022 through 27 May 2022
ER -