TY - JOUR
T1 - CONCRETE
T2 - 29th International Conference on Computational Linguistics, COLING 2022
AU - Huang, Kung Hsiang
AU - Zhai, Cheng Xiang
AU - Ji, Heng
N1 - This research is based upon work supported by U.S. DARPA SemaFor Program No. HR001120C0123. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental
PY - 2022
Y1 - 2022
N2 - Fact-checking has gained increasing attention due to the wide spread of falsified information. Most fact-checking approaches focus only on claims made in English because of data scarcity in other languages. The lack of fact-checking datasets in low-resource languages calls for an effective cross-lingual transfer technique for fact-checking. Additionally, trustworthy information in different languages can be complementary and helpful in verifying facts. To this end, we present the first fact-checking framework augmented with cross-lingual retrieval that aggregates evidence retrieved from multiple languages through a cross-lingual retriever. Given the absence of cross-lingual information retrieval datasets with claim-like queries, we train the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT), a self-supervised algorithm that creates training instances by translating the title of a passage. The goal of X-ICT is to learn cross-lingual retrieval: the model learns to identify the passage corresponding to a given translated title. On the X-FACT dataset, our approach achieves a 2.23% absolute F1 improvement in the zero-shot cross-lingual setup over prior systems. The source code and data are publicly available at https://github.com/khuangaf/CONCRETE.
UR - http://www.scopus.com/inward/record.url?scp=85159859790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159859790&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85159859790
SN - 2951-2093
VL - 29
SP - 1024
EP - 1035
JO - Proceedings - International Conference on Computational Linguistics, COLING
JF - Proceedings - International Conference on Computational Linguistics, COLING
IS - 1
Y2 - 12 October 2022 through 17 October 2022
ER -