TY - GEN
T1 - Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup
AU - Gao, Luyu
AU - Zhang, Yunyi
AU - Han, Jiawei
AU - Callan, Jamie
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - Contrastive learning has been applied successfully to learn vector representations of text. Previous research demonstrated that learning high-quality representations benefits from batch-wise contrastive loss with a large number of negatives. In practice, the technique of in-batch negative is used, where for each example in a batch, other batch examples’ positives will be taken as its negatives, avoiding encoding extra negatives. This, however, still conditions each example’s loss on all batch examples and requires fitting the entire large batch into GPU memory. This paper introduces a gradient caching technique that decouples backpropagation between contrastive loss and the encoder, removing encoder backward pass data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant memory usage.
AB - Contrastive learning has been applied successfully to learn vector representations of text. Previous research demonstrated that learning high-quality representations benefits from batch-wise contrastive loss with a large number of negatives. In practice, the technique of in-batch negative is used, where for each example in a batch, other batch examples’ positives will be taken as its negatives, avoiding encoding extra negatives. This, however, still conditions each example’s loss on all batch examples and requires fitting the entire large batch into GPU memory. This paper introduces a gradient caching technique that decouples backpropagation between contrastive loss and the encoder, removing encoder backward pass data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant memory usage.
UR - http://www.scopus.com/inward/record.url?scp=85127078385&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127078385&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85127078385
T3 - RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop
SP - 316
EP - 321
BT - RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop
A2 - Rogers, Anna
A2 - Calixto, Iacer
A2 - Vulic, Ivan
A2 - Saphra, Naomi
A2 - Kassner, Nora
A2 - Camburu, Oana-Maria
A2 - Bansal, Trapit
A2 - Shwartz, Vered
PB - Association for Computational Linguistics (ACL)
T2 - 6th Workshop on Representation Learning for NLP, RepL4NLP 2021
Y2 - 6 August 2021
ER -