A Principled Decomposition of Pointwise Mutual Information for Intention Template Discovery

Denghao Ma, Kevin Chen Chuan Chang, Yueguo Chen, Xueqiang Lv, Liang Shen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

With the rise of Artificial Intelligence (AI), question answering systems have become common for users to interact with computers, e.g., ChatGPT and Siri. These systems require a substantial amount of labeled data to train their models. However, the labeled data is scarce and challenging to be constructed. The construction process typically involves two stages: discovering potential sample candidates and manually labeling these candidates. To discover high-quality candidate samples, we study the intention paraphrase template discovery task: Given some seed questions or templates of an intention, discover new paraphrase templates that describe the intention and are diverse to the seeds enough in text. As the first exploration of the task, we identify the new quality requirements, i.e., relevance, divergence and popularity, and identify the new challenges, i.e., the paradox of divergent yet relevant paraphrases, and the conflict of popular yet relevant paraphrases. To untangle the paradox of divergent yet relevant paraphrases, in which the traditional bag of words falls short, we develop usage-centric modeling, which represents a question/template/answer as a bag of usages that users engaged (e.g., up-votes), and uses a usage-flow graph to interrelate templates, questions and answers. To balance the conflict of popular yet relevant paraphrases, we propose a new and principled decomposition for the well-known Pointwise Mutual Information from the usage perspective (usage-PMI), and then develop a Bayesian inference framework over the usage-flow graph to estimate the usage-PMI. Extensive experiments over three large CQA corpora show strong performance advantage over the baselines adopted from paraphrase identification task. We release 885, 000 paraphrase templates of high quality discovered by our proposed PMI decomposition model, and the data is available in site https://github.com/ParaQuestions/Intention_template_discovery.

Original languageEnglish (US)
Title of host publicationCIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages1746-1755
Number of pages10
ISBN (Electronic)9798400701245
DOIs
StatePublished - Oct 21 2023
Event32nd ACM International Conference on Information and Knowledge Management, CIKM 2023 - Birmingham, United Kingdom
Duration: Oct 21 2023Oct 25 2023

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Conference

Conference32nd ACM International Conference on Information and Knowledge Management, CIKM 2023
Country/TerritoryUnited Kingdom
CityBirmingham
Period10/21/2310/25/23

Keywords

  • Bayesian inference
  • Paraphrasing
  • Pointwise mutual information

ASJC Scopus subject areas

  • General Business, Management and Accounting
  • General Decision Sciences

Fingerprint

Dive into the research topics of 'A Principled Decomposition of Pointwise Mutual Information for Intention Template Discovery'. Together they form a unique fingerprint.

Cite this