Leveraging Large Pretrained Models for WebNLG 2020

Xintong Li, Aleksandre Maskharashvili, Symon Jory Stevens-Guille, Michael White

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we report experiments on finetuning large pretrained models to realize resource description framework (RDF) triples as natural language text. We provide the details of how we built one of the top-ranked English generation models in the WebNLG Challenge 2020. We also show that there appears to be considerable potential for reranking to improve on the current state of the art, in terms of both statistical and model-based metrics. Our human analyses of the generated texts show that for Russian, the pretrained models achieved some success, in terms of both lexical and morpho-syntactic choices for generation and content aggregation. Nevertheless, in a number of cases the models were unpredictable, failing or succeeding in ways that are hard to anticipate. Omission of content and hallucination, which in many cases co-occurred, were major problems. By contrast, the models for English showed near-perfect performance on the validation set.
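The finetuning and reranking pipeline the abstract describes can be illustrated compactly. Below is a minimal sketch, assuming a T5-style seq2seq model via Hugging Face Transformers; the triple linearization, the <S>/<P>/<O> markers, the example data pair, and the placeholder reranking scorer are all illustrative assumptions, not the authors' exact setup.

```python
# A minimal sketch, assuming a T5-style model from Hugging Face Transformers.
# The linearization scheme and the reranking scorer below are illustrative
# placeholders, not the paper's exact configuration.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def linearize(triples):
    # Flatten (subject, predicate, object) RDF triples into one source string.
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)

# One WebNLG-style training pair (hypothetical example data).
triples = [("William_Anders", "birthPlace", "British_Hong_Kong")]
reference = "William Anders was born in British Hong Kong."

inputs = tokenizer(linearize(triples), return_tensors="pt")
labels = tokenizer(reference, return_tensors="pt").input_ids

# Finetuning step: standard seq2seq cross-entropy on the linearized triples.
loss = model(**inputs, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop

# Reranking sketch: generate several candidates with beam search, then pick
# the best one under an external scorer (the abstract notes reranking helps
# under both statistical and model-based metrics; this scorer is a stand-in).
outputs = model.generate(**inputs, num_beams=8, num_return_sequences=8,
                         max_length=64)
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
best = max(candidates, key=lambda c: len(set(c.split())))  # placeholder scorer
print(best)
```

In practice the placeholder scorer would be replaced by the metric actually used for selection, e.g. a statistical overlap metric against pseudo-references or a learned quality-estimation model.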
Original language: English (US)
Title of host publication: Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
Editors: Thiago Castro Ferreira, Claire Gardent, Nikolai Ilinykh, Chris van der Lee, Simon Mille, Diego Moussallem, Anastasia Shimorina
Place of publication: Dublin
Publisher: Association for Computational Linguistics
Pages: 117-124
Number of pages: 8
State: Published - Dec 2020
Externally published: Yes
