Faking Fake News for Real Fake News Detection: Propaganda-Loaded Training Data Generation

Kung Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Despite recent advances in detecting fake news generated by neural models, their results are not readily applicable to effective detection of human-written disinformation. What limits the successful transfer between them is the sizable gap between machine-generated fake news and human-authored ones, including the notable differences in terms of style and underlying intent. With this in mind, we propose a novel framework for generating training examples that are informed by the known styles and strategies of human-authored propaganda. Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles, while also incorporating propaganda techniques, such as appeal to authority and loaded language. In particular, we create a new training dataset, PROPANEWS, with 2,256 examples, which we release for future use. Our experimental results show that fake news detectors trained on PROPANEWS are better at detecting human-written disinformation by 3.62-7.69% F1 score on two public datasets.

Original language: English (US)
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 14571-14589
Number of pages: 19
ISBN (Electronic): 9781959429722
State: Published - 2023
Externally published: Yes
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: Jul 9 2023 → Jul 14 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Toronto
Period: 7/9/23 → 7/14/23

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
