TY - GEN
T1 - Open-Domain Question Answering with Pre-Constructed Question Spaces
AU - Xiao, Jinfeng
AU - Wang, Lidan
AU - Dernoncourt, Franck
AU - Bui, Trung
AU - Sun, Tong
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - Open-domain question answering aims at locating the answers to user-generated questions in massive collections of documents. Retriever-readers and knowledge-graph approaches are two major families of solutions to this task. A retriever-reader first applies information retrieval techniques to locate a few passages that are likely to be relevant, and then feeds the retrieved text to a neural network reader to extract the answer. Alternatively, knowledge graphs can be constructed and queried to answer users’ questions. We propose an algorithm with a novel reader-retriever design that differs from both families. Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way. We further combine one retriever-reader and two reader-retrievers into a hybrid model called R6 to achieve the best performance. Experiments with large-scale public datasets show that R6 achieves state-of-the-art accuracy.
AB - Open-domain question answering aims at locating the answers to user-generated questions in massive collections of documents. Retriever-readers and knowledge-graph approaches are two major families of solutions to this task. A retriever-reader first applies information retrieval techniques to locate a few passages that are likely to be relevant, and then feeds the retrieved text to a neural network reader to extract the answer. Alternatively, knowledge graphs can be constructed and queried to answer users’ questions. We propose an algorithm with a novel reader-retriever design that differs from both families. Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way. We further combine one retriever-reader and two reader-retrievers into a hybrid model called R6 to achieve the best performance. Experiments with large-scale public datasets show that R6 achieves state-of-the-art accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85119701896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119701896&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85119701896
T3 - NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop
SP - 61
EP - 67
BT - NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - NAACL 2021 Student Research Workshop, SRW 2021, at 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021
Y2 - 6 June 2021 through 11 June 2021
ER -