Towards Robust Neural Retrieval with Source Domain Synthetic Pre-Finetuning

Revanth Gangi Reddy, Vikas Yadav, Md Arafat Sultan, Martin Franz, Vittorio Castelli, Heng Ji, Avirup Sil

Research output: Contribution to journal › Conference article › peer-review

Abstract

Research on neural IR has so far been focused primarily on standard supervised learning settings, where it outperforms traditional term matching baselines. Many practical use cases of such models, however, may involve previously unseen target domains. In this paper, we propose to improve the out-of-domain generalization of Dense Passage Retrieval (DPR)—a popular choice for neural IR—through synthetic data augmentation only in the source domain. We empirically show that pre-finetuning DPR with additional synthetic data in its source domain (Wikipedia), which we generate using a fine-tuned sequence-to-sequence generator, can be a low-cost yet effective first step towards its generalization. Across five different test sets, our augmented model shows more robust performance than DPR in both in-domain and zero-shot out-of-domain evaluation.
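To make the approach concrete, below is a minimal sketch of the synthetic-data step the abstract describes: a sequence-to-sequence generator produces questions from source-domain (Wikipedia) passages, yielding (question, passage) pairs for pre-finetuning a DPR-style bi-encoder. The checkpoint name, the "generate question:" prompt format, and the sampling parameters are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: synthesizing source-domain (query, passage) training pairs
# with a sequence-to-sequence generator. Assumes the HuggingFace
# `transformers` library; the checkpoint below is a stand-in for the
# paper's fine-tuned generator.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-base"  # hypothetical; the paper fine-tunes its own generator

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
generator = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def synthesize_queries(passages, num_queries=3):
    """Generate synthetic queries for each source-domain passage.

    Returns (query, passage) pairs usable as positives when
    pre-finetuning a DPR-style dense retriever.
    """
    pairs = []
    for passage in passages:
        inputs = tokenizer(
            "generate question: " + passage,
            return_tensors="pt",
            truncation=True,
            max_length=512,
        )
        outputs = generator.generate(
            **inputs,
            do_sample=True,           # sampling yields diverse queries
            top_p=0.95,
            num_return_sequences=num_queries,
            max_new_tokens=64,
        )
        for query in tokenizer.batch_decode(outputs, skip_special_tokens=True):
            pairs.append((query, passage))
    return pairs

# Example: build pre-finetuning pairs from a toy Wikipedia-style passage.
pairs = synthesize_queries(
    ["Dense Passage Retrieval encodes questions and passages with "
     "separate BERT encoders and retrieves by inner-product search."]
)
```

In this setup, the synthetic pairs would be used for an intermediate pre-finetuning stage of the retriever before (or instead of) supervised fine-tuning, which is the low-cost generalization step the abstract argues for.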

Original language: English (US)
Pages (from-to): 1065-1070
Number of pages: 6
Journal: Proceedings - International Conference on Computational Linguistics, COLING
Volume: 29
Issue number: 1
State: Published - 2022
Event: 29th International Conference on Computational Linguistics, COLING 2022 - Gyeongju, Korea, Republic of
Duration: Oct 12, 2022 - Oct 17, 2022

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
