TY - JOUR
T1 - Multi-LoRA Composition for Image Generation
AU - Zhong, Ming
AU - Shen, Yelong
AU - Wang, Shuohang
AU - Lu, Yadong
AU - Jiao, Yizhu
AU - Ouyang, Siru
AU - Yu, Donghan
AU - Han, Jiawei
AU - Chen, Weizhu
N1 - Publisher Copyright:
© 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images. Nonetheless, existing methods face challenges in effectively composing multiple LoRAs, especially as the number of LoRAs to be integrated grows, thus hindering the creation of complex imagery. In this paper, we study multi-LoRA composition from a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Utilizing an evaluation framework based on GPT-4V, we demonstrate a clear improvement in performance with our methods over the prevalent baseline, particularly evident when increasing the number of LoRAs in a composition. The code, benchmarks, LoRA weights, and all evaluation details are available on our project website.
UR - http://www.scopus.com/inward/record.url?scp=85219535760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85219535760&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85219535760
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -