TY - GEN
T1 - Resolving Referring Expressions in Images with Labeled Elements
AU - Wichers, Nevan
AU - Hakkani-Tur, Dilek
AU - Chen, Jindong
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Images may have elements containing text and a bounding box associated with them, for example, text identified via optical character recognition on a computer screen image, or a natural image with labeled objects. We present an end-to-end trainable architecture to incorporate the information from these elements and the image to segment/identify the part of the image a natural language expression is referring to. We calculate an embedding for each element and then project it onto the corresponding location (i.e., the associated bounding box) of the image feature map. We show that this architecture gives an improvement in resolving referring expressions, over only using the image, and other methods that incorporate the element information. We demonstrate experimental results on the referring expression datasets based on COCO, and on a webpage image referring expression dataset that we developed.
KW - Deep Learning
KW - Natural Language Processing
KW - Referring Expression Resolution
KW - Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85063088632&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063088632&partnerID=8YFLogxK
U2 - 10.1109/SLT.2018.8639518
DO - 10.1109/SLT.2018.8639518
M3 - Conference contribution
AN - SCOPUS:85063088632
T3 - 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
SP - 800
EP - 806
BT - 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Spoken Language Technology Workshop, SLT 2018
Y2 - 18 December 2018 through 21 December 2018
ER -