TY - GEN
T1 - A Meta-Learning Perspective on Transformers for Causal Language Modeling
AU - Wu, Xinbo
AU - Varshney, Lav R.
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task, by explicating an inner optimization process that may happen within the Transformer. Further, from within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models. Our analysis is supported by experiments conducted on pre-trained large language models and real-world data.
UR - http://www.scopus.com/inward/record.url?scp=85205323429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205323429&partnerID=8YFLogxK
U2 - 10.18653/v1/2024.findings-acl.922
DO - 10.18653/v1/2024.findings-acl.922
M3 - Conference contribution
AN - SCOPUS:85205323429
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 15612
EP - 15622
BT - 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
A2 - Ku, Lun-Wei
A2 - Martins, Andre
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Y2 - 11 August 2024 through 16 August 2024
ER -