TY - GEN
T1 - Simple Unsupervised Knowledge Distillation With Space Similarity
AU - Singh, Aditya
AU - Wang, Haohan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Recent studies show that self-supervised learning (SSL) does not readily extend to smaller architectures. One way to mitigate this shortcoming while still training a smaller network without labels is to adopt unsupervised knowledge distillation (UKD). Existing UKD approaches handcraft preservation-worthy inter- and intra-sample relationships between the teacher and its student. However, this may overlook other key relationships present in the teacher’s mapping. In this paper, instead of heuristically constructing preservation-worthy relationships between samples, we directly motivate the student to model the teacher’s embedding manifold: if the mapped manifolds are similar, all inter- and intra-sample relationships are indirectly conserved. We first demonstrate that prior methods cannot preserve the teacher’s latent manifold because they rely solely on L2-normalised embedding features. We then propose a simple objective that captures the information lost due to normalisation. Our proposed loss component, termed space similarity, encourages each dimension of the student’s feature space to be similar to the corresponding dimension of its teacher’s. Extensive experiments demonstrate the strong performance of our approach on various benchmarks.
AB - Recent studies show that self-supervised learning (SSL) does not readily extend to smaller architectures. One way to mitigate this shortcoming while still training a smaller network without labels is to adopt unsupervised knowledge distillation (UKD). Existing UKD approaches handcraft preservation-worthy inter- and intra-sample relationships between the teacher and its student. However, this may overlook other key relationships present in the teacher’s mapping. In this paper, instead of heuristically constructing preservation-worthy relationships between samples, we directly motivate the student to model the teacher’s embedding manifold: if the mapped manifolds are similar, all inter- and intra-sample relationships are indirectly conserved. We first demonstrate that prior methods cannot preserve the teacher’s latent manifold because they rely solely on L2-normalised embedding features. We then propose a simple objective that captures the information lost due to normalisation. Our proposed loss component, termed space similarity, encourages each dimension of the student’s feature space to be similar to the corresponding dimension of its teacher’s. Extensive experiments demonstrate the strong performance of our approach on various benchmarks.
KW - Knowledge Distillation
KW - Space Similarity
KW - Unsupervised
UR - http://www.scopus.com/inward/record.url?scp=85208278262&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85208278262&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72627-9_9
DO - 10.1007/978-3-031-72627-9_9
M3 - Conference contribution
AN - SCOPUS:85208278262
SN - 9783031726262
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 147
EP - 164
BT - Computer Vision – ECCV 2024 – 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer
T2 - 18th European Conference on Computer Vision, ECCV 2024
Y2 - 29 September 2024 through 4 October 2024
ER -
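
Editor's note: the abstract's dimension-wise "space similarity" objective can be illustrated with a short sketch. This is not the authors' published implementation; it is a minimal PyTorch-style illustration under stated assumptions: embeddings arrive as (batch, dim) tensors, the student and teacher already share dimensionality (e.g., via a projection head), and the function name space_similarity_loss is hypothetical.

import torch
import torch.nn.functional as F

def space_similarity_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    # f_s, f_t: (batch, dim) student and teacher embeddings (hypothetical names).
    # Transpose so each row holds one feature dimension observed across the batch,
    # then L2-normalise per dimension (rather than per sample, as in prior UKD work).
    d_s = F.normalize(f_s.t(), dim=1)  # (dim, batch)
    d_t = F.normalize(f_t.t(), dim=1)  # (dim, batch)
    # Cosine similarity between corresponding dimensions; negate so that
    # minimising the loss maximises dimension-wise alignment.
    return -(d_s * d_t).sum(dim=1).mean()

Per the abstract, this term is a loss *component* meant to recover information lost by per-sample L2 normalisation; in practice it would accompany a standard sample-wise feature-similarity loss rather than stand alone.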