On the Transformer Growth for Progressive BERT Training

Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively: start from an inferior but low-cost model and gradually grow the model to increase the computational complexity. Our objective is to advance the understanding of Transformer growth and discover principles that guide progressive training. First, we find that, similar to network architecture search, Transformer growth also favors compound scaling. Specifically, while existing methods conduct network growth in only a single dimension, we observe that it is beneficial to use compound growth operators that balance multiple dimensions (e.g., the depth, width, and input length of the model). Moreover, we explore alternative growth operators in each dimension via controlled comparison to provide practical guidance for operator selection. In light of our analyses, the proposed method CompoundGrow speeds up BERT pre-training by 73.6% and 82.2% for the base and large models respectively, while achieving comparable performance.
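For intuition, the sketch below shows what a compound growth schedule can look like on a simplified Transformer configuration with only three dimensions (depth, width, input length). The TransformerConfig class, compound_grow helper, and per-stage factors are hypothetical illustrations, not the paper's implementation; real progressive training would also expand the smaller model's weights into the larger architecture at each stage.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TransformerConfig:
    num_layers: int   # depth
    hidden_size: int  # width
    seq_length: int   # input sequence length

def compound_grow(cfg: TransformerConfig,
                  depth_factor: float,
                  width_factor: float,
                  length_factor: float) -> TransformerConfig:
    """Enlarge all three dimensions at once (compound growth),
    instead of growing a single dimension at a time."""
    return replace(
        cfg,
        num_layers=round(cfg.num_layers * depth_factor),
        hidden_size=round(cfg.hidden_size * width_factor),
        seq_length=round(cfg.seq_length * length_factor),
    )

# Start from a small, low-cost model and grow it in stages toward a
# BERT-base-like target (12 layers, hidden size 768, length 512).
# The stage factors below are hypothetical, chosen only so the schedule
# lands on that target; they are not the paper's actual schedule.
cfg = TransformerConfig(num_layers=6, hidden_size=384, seq_length=128)
for factors in [(1.5, 1.5, 2.0), (4 / 3, 4 / 3, 2.0)]:
    # In real progressive training, per-dimension growth operators would
    # copy/expand the weights into the larger model here; this sketch
    # only tracks the configuration.
    cfg = compound_grow(cfg, *factors)
    print(cfg)
```

Running the sketch prints the intermediate stage (9 layers, 576 hidden, length 256) and the final BERT-base-like configuration, illustrating how each stage balances growth across all dimensions rather than exhausting one dimension first.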

Original language: English (US)
Title of host publication: NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 5174-5180
Number of pages: 7
ISBN (Electronic): 9781954085466
DOIs
State: Published - 2021
Event: 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021 - Virtual, Online
Duration: Jun 6 2021 - Jun 11 2021

Publication series

Name: NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference: 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021
City: Virtual, Online
Period: 6/6/21 - 6/11/21

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software
