TY - GEN
T1 - Learning deep structure-preserving image-text embeddings
AU - Wang, Liwei
AU - Li, Yin
AU - Lazebnik, Svetlana
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/12/9
Y1 - 2016/12/9
N2 - This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities. The network is trained using a large-margin objective that combines cross-view ranking constraints with within-view neighborhood structure preservation constraints inspired by metric learning literature. Extensive experiments show that our approach gains significant improvements in accuracy for image-to-text and text-to-image retrieval. Our method achieves new state-of-the-art results on the Flickr30K and MSCOCO image-sentence datasets and shows promise on the new task of phrase localization on the Flickr30K Entities dataset.
AB - This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities. The network is trained using a large-margin objective that combines cross-view ranking constraints with within-view neighborhood structure preservation constraints inspired by metric learning literature. Extensive experiments show that our approach gains significant improvements in accuracy for image-to-text and text-to-image retrieval. Our method achieves new state-of-the-art results on the Flickr30K and MSCOCO image-sentence datasets and shows promise on the new task of phrase localization on the Flickr30K Entities dataset.
UR - http://www.scopus.com/inward/record.url?scp=84986271102&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986271102&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2016.541
DO - 10.1109/CVPR.2016.541
M3 - Conference contribution
AN - SCOPUS:84986271102
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 5005
EP - 5013
BT - Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
PB - IEEE Computer Society
T2 - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Y2 - 26 June 2016 through 1 July 2016
ER -