DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding

Shuijing Liu, Aamir Hasan, Kaiwen Hong, Runxuan Wang, Peixin Chang, Zachary Mizrachi, Justin Lin, D. Livingston McPherson, Wendy A. Rogers, Katherine Driggs-Campbell

Research output: Contribution to journal › Article › peer-review

Abstract

Persons with visual impairments (PwVI) have difficulty understanding and navigating the spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding commands from the user, DRAGON is able to guide the user to desired landmarks on the map, describe the environment, and answer questions from its visual observations. Through the effective utilization of dialogue, the robot can ground the user's free-form language to the environment and provide semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON communicates with the user smoothly, provides a good guiding experience, and connects the user with the surrounding environment in an intuitive manner.
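The core capability the abstract describes is grounding a user's free-form request to a named landmark on the robot's map. The sketch below is a rough illustration of that idea only, not the authors' implementation: it ranks landmark names against an utterance using an off-the-shelf CLIP text encoder. The model checkpoint, landmark list, and utterance are all illustrative assumptions.

# Minimal sketch of language-to-landmark grounding with a CLIP-style
# text encoder. Landmark names and the utterance are hypothetical;
# DRAGON's actual pipeline may differ.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

landmarks = ["elevator", "water fountain", "restroom", "vending machine"]
utterance = "take me somewhere I can get a drink"

# Embed the utterance and every landmark name in the same text space.
inputs = processor(text=[utterance] + landmarks,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    feats = model.get_text_features(**inputs)
feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

# Cosine similarity between the utterance and each landmark; pick the best.
scores = feats[0] @ feats[1:].T
best = landmarks[int(scores.argmax())]
print(f"Grounded '{utterance}' -> '{best}'")

In a full system, the top-scoring landmark would be handed to the navigation stack as a goal, and low-confidence matches could trigger a clarifying question through the dialogue system.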

Original language: English (US)
Pages (from-to): 3712-3719
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 9
Issue number: 4
DOIs
State: Published - Apr 1 2024

Keywords

  • AI-enabled robotics
  • Human-centered robotics
  • natural dialog for HRI

ASJC Scopus subject areas

  • Mechanical Engineering
  • Control and Optimization
  • Artificial Intelligence
  • Human-Computer Interaction
  • Control and Systems Engineering
  • Computer Vision and Pattern Recognition
  • Biomedical Engineering
  • Computer Science Applications

