Abstract
We present a supervised semantic hashing framework named Label-Consistent Generalized Hashing (LCGH). The main novelty of LCGH is the explicit retention of information that may be irrelevant during training but possibly useful for generalizing to unseen test classes. This is in stark contrast to typical semantic hashing methods, which seek to remove redundant feature information from their hash codes in order to maximize the margin between hash codes of dissimilar data. That strategy leaves hash codes narrowly viable for discerning between training classes and inadequate for discriminating between unseen test classes. Instead of limiting the information content of hash codes to that provided by the training labels, LCGH enhances its codes with information from both supervised and unsupervised sources, improving their ability to discriminate across a wider range of data. To do so, LCGH builds on the foundation of first agreeing with the provided training labels (label consistency) and then incorporating possibly useful additional information via a reconstruction loss. In this way, LCGH respects the reliably given label information before exploring the addition of possibly useful unsupervised information. The outcome is a hashing scheme with slightly weaker within-domain (training and test classes are the same) retrieval performance but much stronger cross-domain (training and test classes are disjoint) performance.
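The abstract describes combining a supervised label-consistency objective with an unsupervised reconstruction objective over the same code layer. The sketch below illustrates that general idea in PyTorch; the module structure, layer sizes, and the weighting `lam` are illustrative assumptions and do not come from the paper itself.

```python
import torch
import torch.nn as nn

class LCGHSketch(nn.Module):
    """Hypothetical sketch: an encoder produces relaxed binary codes, a decoder
    reconstructs the input (unsupervised signal that retains extra information),
    and a classifier head keeps the codes consistent with the training labels."""
    def __init__(self, input_dim, code_bits, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_bits))
        self.decoder = nn.Sequential(nn.Linear(code_bits, 512), nn.ReLU(),
                                     nn.Linear(512, input_dim))
        self.classifier = nn.Linear(code_bits, num_classes)

    def forward(self, x):
        code = torch.tanh(self.encoder(x))   # relaxed code in (-1, 1)
        recon = self.decoder(code)           # keeps information beyond the labels
        logits = self.classifier(code)       # enforces label consistency
        return code, recon, logits

def lcgh_style_loss(x, y, code, recon, logits, lam=0.5):
    # Label-consistency term (supervised) plus reconstruction term (unsupervised).
    # `lam` is an illustrative trade-off hyperparameter, not taken from the paper.
    label_loss = nn.functional.cross_entropy(logits, y)
    recon_loss = nn.functional.mse_loss(recon, x)
    return label_loss + lam * recon_loss

# Minimal usage example with random data; at retrieval time the continuous
# codes would be binarized, e.g. with torch.sign(code).
model = LCGHSketch(input_dim=128, code_bits=32, num_classes=10)
x = torch.randn(16, 128)
y = torch.randint(0, 10, (16,))
code, recon, logits = model(x)
loss = lcgh_style_loss(x, y, code, recon, logits)
```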
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 4075-4085 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Information Forensics and Security |
| Volume | 18 |
| DOIs | |
| State | Published - 2023 |
| Externally published | Yes |
Keywords
- Hashing
- autoencoder
- generalization
- semantic
- supervised
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Computer Networks and Communications