TY - GEN
T1 - N-Best Hypotheses Reranking for Text-to-SQL Systems
AU - Zeng, Lu
AU - Parthasarathi, Sree Hari Krishnan
AU - Hakkani-Tur, Dilek
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on fine-tuning large, pre-trained language models in conjunction with constrained decoding that applies a SQL parser. On the well-established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list yields a 7.7% absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential for improvement through reranking. Identifying coherence and correctness as reranking approaches, we design a model that generates a query plan and propose a heuristic schema linking algorithm. Combining both approaches with T5-Large, we obtain a consistent 1% improvement in EM accuracy and a 2.5% improvement in EX accuracy, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.
AB - The text-to-SQL task maps natural language utterances to structured queries that can be issued to a database. State-of-the-art (SOTA) systems rely on fine-tuning large, pre-trained language models in conjunction with constrained decoding that applies a SQL parser. On the well-established Spider dataset, we begin with Oracle studies: specifically, choosing an Oracle hypothesis from a SOTA model's 10-best list yields a 7.7% absolute improvement in both exact match (EM) and execution (EX) accuracy, showing significant potential for improvement through reranking. Identifying coherence and correctness as reranking approaches, we design a model that generates a query plan and propose a heuristic schema linking algorithm. Combining both approaches with T5-Large, we obtain a consistent 1% improvement in EM accuracy and a 2.5% improvement in EX accuracy, establishing a new SOTA for this task. Our comprehensive error studies on DEV data show the underlying difficulty in making progress on this task.
KW - Semantic parsing
KW - Text-To-SQL
UR - http://www.scopus.com/inward/record.url?scp=85147794364&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147794364&partnerID=8YFLogxK
U2 - 10.1109/SLT54892.2023.10023434
DO - 10.1109/SLT54892.2023.10023434
M3 - Conference contribution
AN - SCOPUS:85147794364
T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
SP - 663
EP - 670
BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Y2 - 9 January 2023 through 12 January 2023
ER -