TY - GEN
T1 - Improving Multi-label Recognition using Class Co-Occurrence Probabilities
AU - Rawlekar, Samyak
AU - Bhatnagar, Shubhang
AU - Srinivasulu, Vishnuvardhan Pogunulu
AU - Ahuja, Narendra
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs to improve the performance of independent classifiers. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
AB - Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs to improve the performance of independent classifiers. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
KW - Graph Convolution Networks
KW - Multi-label Recognition
KW - Vision-Language Models
UR - http://www.scopus.com/inward/record.url?scp=85212513271&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212513271&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78192-6_28
DO - 10.1007/978-3-031-78192-6_28
M3 - Conference contribution
AN - SCOPUS:85212513271
SN - 9783031781919
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 424
EP - 439
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -