Abstract
Training large deep learning (DL) models to high performance on natural language downstream tasks usually requires abundant labeled data. However, in real-world COVID-19 information services (e.g., misinformation detection, question answering), a fundamental challenge is the lack of labeled COVID-19 data for supervised end-to-end training of models on different downstream tasks, especially at the early stage of the pandemic. To address this challenge, we propose an unsupervised domain adaptation framework that uses contrastive learning and adversarial domain mixup to transfer knowledge from an existing source data domain to the target COVID-19 data domain. In particular, to bridge the gap between the source and target domains, our method reduces a radial basis function (RBF) based discrepancy between the two domains. Moreover, we leverage domain adversarial examples to establish an intermediate domain mixup, in which the latent representations of the input text from both domains are mixed during training. In this paper, we focus on two prevailing downstream tasks in mining COVID-19 text data: COVID-19 misinformation detection and COVID-19 news question answering. Extensive domain adaptation experiments on multiple real-world datasets show that our method effectively adapts misinformation detection and question answering systems to the unseen COVID-19 target domain, with significant improvements over state-of-the-art baselines.
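Two ingredients named in the abstract, an RBF-based discrepancy between source and target representations and a mixup of latent features to form an intermediate domain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rbf_mmd`, `latent_mixup`), the kernel bandwidth, the Beta mixing coefficient, and the combined loss are assumptions for illustration, and the adversarial-example and contrastive components of the full framework are omitted.

```python
import torch

def rbf_kernel(x, y, gamma=1.0):
    # RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) over all row pairs.
    dists = torch.cdist(x, y, p=2) ** 2
    return torch.exp(-gamma * dists)

def rbf_mmd(source_feats, target_feats, gamma=1.0):
    # Empirical maximum mean discrepancy with an RBF kernel: one common
    # RBF-based discrepancy between two sets of latent representations.
    k_ss = rbf_kernel(source_feats, source_feats, gamma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, gamma).mean()
    k_st = rbf_kernel(source_feats, target_feats, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

def latent_mixup(source_feats, target_feats, alpha=0.2):
    # Mix latent representations from both domains to form an intermediate
    # domain; lam ~ Beta(alpha, alpha) as in standard mixup (assumed here).
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * source_feats + (1.0 - lam) * target_feats

# Usage with random tensors standing in for encoder outputs
# (e.g., [CLS] embeddings of source-domain and COVID-19 text).
src = torch.randn(16, 768)
tgt = torch.randn(16, 768)
mixed = latent_mixup(src, tgt)
loss = rbf_mmd(src, tgt) + rbf_mmd(mixed, tgt)  # illustrative combination only
```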
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1105-1116 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Emerging Topics in Computing |
| Volume | 12 |
| Issue number | 4 |
| DOIs | |
| State | Published - 2024 |
Keywords
- Domain adaptation
- contrastive domain mixup
- misinformation detection
- question answering
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Information Systems
- Human-Computer Interaction
- Computer Science Applications