Language models for image captioning: The quirks and what works

Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.

Original language: English (US)
Title of host publication: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 100-105
Number of pages: 6
ISBN (Electronic): 9781941643730
State: Published - 2015
Externally published: Yes
Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015 - Beijing, China
Duration: Jul 26 2015 to Jul 31 2015

Publication series

Name: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Volume: 2

Other

Other: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015
Country/Territory: China
City: Beijing
Period: 7/26/15 to 7/31/15

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Language and Linguistics
  • Linguistics and Language
