Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup

Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Contrastive learning has been applied successfully to learn vector representations of text. Previous research demonstrated that learning high-quality representations benefits from batch-wise contrastive loss with a large number of negatives. In practice, the technique of in-batch negatives is used: for each example in a batch, the positives of the other batch examples are taken as its negatives, avoiding the cost of encoding extra negatives. This, however, still conditions each example's loss on all batch examples and requires fitting the entire large batch into GPU memory. This paper introduces a gradient caching technique that decouples backpropagation between the contrastive loss and the encoder, removing the encoder backward pass's data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant memory usage.
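The decoupling described in the abstract can be sketched in PyTorch: encode the full batch without a graph, backpropagate the contrastive loss only to the (detached) representations to cache their gradients, then re-encode one small chunk at a time with a graph and push the cached gradients into the encoder. This is a minimal illustrative sketch, not the paper's released implementation; the function name `grad_cache_step` and the single-encoder self-similarity loss are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cache_step(encoder, batch, chunk_size):
    """Accumulate contrastive-loss gradients into `encoder`'s parameters
    while only ever holding `chunk_size` examples in the computation graph.
    (Illustrative sketch: a single encoder and in-batch similarity loss.)"""
    chunks = batch.split(chunk_size)

    # Pass 1: encode every chunk WITHOUT building a graph, to obtain the
    # full batch of representations the batch-wise loss needs.
    with torch.no_grad():
        reps = torch.cat([encoder(c) for c in chunks])

    # Compute the contrastive loss on detached representations that
    # require grad, and cache d(loss)/d(representations).
    reps = reps.detach().requires_grad_()
    scores = reps @ reps.t()                 # in-batch similarity matrix
    labels = torch.arange(reps.size(0))
    loss = F.cross_entropy(scores, labels)
    loss.backward()                          # fills reps.grad only
    rep_grads = reps.grad.split(chunk_size)

    # Pass 2: re-encode one chunk at a time WITH a graph, and backprop the
    # cached representation gradients into the encoder parameters.
    for c, g in zip(chunks, rep_grads):
        encoder(c).backward(gradient=g)
    return loss.item()
```

Because each `backward` in pass 2 touches only one chunk's activations, peak activation memory is governed by `chunk_size` rather than the full batch size, while the accumulated parameter gradients match those of a single full-batch backward pass.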

Original language: English (US)
Title of host publication: RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop
Editors: Anna Rogers, Iacer Calixto, Ivan Vulic, Naomi Saphra, Nora Kassner, Oana-Maria Camburu, Trapit Bansal, Vered Shwartz
Publisher: Association for Computational Linguistics (ACL)
Pages: 316-321
Number of pages: 6
ISBN (Electronic): 9781954085725
State: Published - 2021
Externally published: Yes
Event: 6th Workshop on Representation Learning for NLP, RepL4NLP 2021 - Virtual, Bangkok, Thailand
Duration: Aug 6 2021 → …

Publication series

Name: RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop

Conference

Conference: 6th Workshop on Representation Learning for NLP, RepL4NLP 2021
Country/Territory: Thailand
City: Virtual, Bangkok
Period: 8/6/21 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Software
  • Linguistics and Language
