TY - JOUR
T1 - Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion
AU - Zhang, Shizhuo Dylan
AU - Tigges, Curt
AU - Zhang, Zory
AU - Biderman, Stella
AU - Raginsky, Maxim
AU - Ringer, Talia
N1 - This research was developed with funding from the Defense Advanced Research Projects Agency. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
PY - 2024
Y1 - 2024
N2 - This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to the programming language and formal mathematics tasks where symbolic tools currently excel beyond neural models, such as inferring semantic relations between datatypes and emulating program behavior. We introduce a general conceptual framework that connects the abstract concepts of structural recursion in the programming language domain to concrete sequence modeling problems and learned models' behavior. The framework includes a representation that captures the general syntax of structural recursion, coupled with two different frameworks for understanding their semantics: one that is more natural from a programming languages perspective and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture. With our framework as a conceptual tool, we identify different issues under various set-ups. Models trained to emulate recursive computations do not fully capture the recursion but instead fit short-cut algorithms, and thus cannot solve certain edge cases that are under-represented in the training distribution. In addition, state-of-the-art large language models (LLMs) struggle to mine recursive rules from in-context demonstrations. Meanwhile, these LLMs fail in interesting ways when emulating the reduction (step-wise computation) of recursive functions.
AB - This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to the programming language and formal mathematics tasks where symbolic tools currently excel beyond neural models, such as inferring semantic relations between datatypes and emulating program behavior. We introduce a general conceptual framework that connects the abstract concepts of structural recursion in the programming language domain to concrete sequence modeling problems and learned models' behavior. The framework includes a representation that captures the general syntax of structural recursion, coupled with two different frameworks for understanding their semantics: one that is more natural from a programming languages perspective and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture. With our framework as a conceptual tool, we identify different issues under various set-ups. Models trained to emulate recursive computations do not fully capture the recursion but instead fit short-cut algorithms, and thus cannot solve certain edge cases that are under-represented in the training distribution. In addition, state-of-the-art large language models (LLMs) struggle to mine recursive rules from in-context demonstrations. Meanwhile, these LLMs fail in interesting ways when emulating the reduction (step-wise computation) of recursive functions.
UR - http://www.scopus.com/inward/record.url?scp=85219579798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85219579798&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85219579798
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -