TY - GEN
T1 - Word Embeddings Are Steers for Language Models
AU - Han, Chi
AU - Xu, Jialiang
AU - Li, Manling
AU - Fung, Yi
AU - Sun, Chenkai
AU - Jiang, Nan
AU - Abdelzaher, Tarek
AU - Ji, Heng
N1 - This research is partially supported by U.S. DARPA KAIROS Program No. FA8750-19-2-1004, DARPA INCAS Program No. HR001121C0165, U.S. DARPA SemaFor Program No. HR001120C0123, DARPA MIPs Program No. HR00112290105, and DARPA ITM Program No. FA8650-23-C-7316. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes, notwithstanding any copyright annotation therein.
PY - 2024
Y1 - 2024
AB - Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find that they exist in LMs of all sizes. Steering each style requires learning parameters amounting to only 0.2% of the original LM's size. On tasks such as language model detoxification and sentiment control, LM-Steers achieve performance comparable or superior to state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer serves as a lens on text styles: it reveals that word embeddings are interpretable when associated with language model generations and can highlight text spans that most indicate the style differences. An LM-Steer is transferable between different language models via an explicit-form calculation. One can also continuously steer LMs simply by scaling the LM-Steer, or compose multiple LM-Steers by adding their transformations. Our code is publicly available at https://github.com/Glaciohound/LM-Steer.
UR - http://www.scopus.com/inward/record.url?scp=85204479348&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204479348&partnerID=8YFLogxK
U2 - 10.18653/v1/2024.acl-long.864
DO - 10.18653/v1/2024.acl-long.864
M3 - Conference contribution
AN - SCOPUS:85204479348
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 16410
EP - 16430
BT - Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
A2 - Ku, Lun-Wei
A2 - Martins, Andre F. T.
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
T2 - 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Y2 - 11 August 2024 through 16 August 2024
ER -
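
For orientation beyond the record itself, below is a minimal Python/PyTorch sketch of the mechanism the abstract describes: steering generation by applying a learned linear transformation to the output word embeddings before computing next-token logits. The class name, the epsilon/scale parameterization, and the forward signature are illustrative assumptions for this sketch, not the official API of https://github.com/Glaciohound/LM-Steer.

import torch

class LMSteer(torch.nn.Module):
    """Sketch of an LM-Steer: logits = h @ ((I + scale*eps*W) E^T)."""

    def __init__(self, embed_dim: int, epsilon: float = 1e-3):
        super().__init__()
        # W is the only learned parameter; for typical LMs a d x d matrix
        # is a small fraction of total size (the abstract reports ~0.2%).
        self.W = torch.nn.Parameter(torch.zeros(embed_dim, embed_dim))
        self.epsilon = epsilon

    def forward(
        self,
        hidden: torch.Tensor,             # (batch, embed_dim) final hidden states
        output_embeddings: torch.Tensor,  # (vocab, embed_dim) output embedding matrix E
        scale: float = 1.0,
    ) -> torch.Tensor:
        # Steered embeddings: E' = E + scale * eps * (E @ W^T).
        # Varying `scale` gives the continuous steering the abstract mentions;
        # adding the W matrices of two trained steers composes their styles.
        steered = output_embeddings + scale * self.epsilon * (output_embeddings @ self.W.T)
        return hidden @ steered.T         # (batch, vocab) next-token logits

With scale = 0 this reduces to the unmodified LM head, so the steer interpolates smoothly between the original and steered generation styles.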