WebWISE: Unlocking Web Interface Control for LLMs via Sequential Exploration

Heyi Tao, T. V. Sethuraman, Michal Shlapentokh-Rothman, Tanmay Gupta, Heng Ji, Derek Hoiem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates using Large Language Models (LLMs) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either from a single manually provided example or from an automatically generated example based on a successful zero-shot trial. We evaluate our proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method using gpt-3.5-turbo achieves similar or better performance than other methods that require many demonstrations or trials.
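The abstract describes a sequential observe-then-act loop: at each step the agent reads filtered DOM elements and asks the LLM to emit a small program of click/scroll/type actions. Below is a minimal sketch of such a loop for illustration only; the names `env.get_filtered_dom`, `env.execute`, and `llm.generate` are hypothetical placeholders and do not reflect the authors' actual implementation or prompts.

```python
def run_task(task_description, env, llm, max_steps=10, in_context_example=""):
    """Attempt a web task by repeatedly observing the filtered DOM and letting
    an LLM generate a small program of click / scroll / type actions.
    All interfaces here are assumed placeholders, not the WebWISE API."""
    for step in range(max_steps):
        # Observation: filtered DOM elements from the current page state.
        dom_elements = env.get_filtered_dom()

        # Prompt includes an optional single in-context example, the task,
        # and the current observation.
        prompt = (
            f"{in_context_example}\n"
            f"Task: {task_description}\n"
            f"DOM: {dom_elements}\n"
            "Write a short program using click(id), scroll(direction), "
            "and type(id, text) to make progress on the task."
        )

        # The LLM (e.g., gpt-3.5-turbo) proposes a small program for this step.
        program = llm.generate(prompt)

        # Execute the proposed actions; the environment reports completion.
        done = env.execute(program)
        if done:
            return True
    return False
```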

Original language: English (US)
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2024 - Findings
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics (ACL)
Pages: 3693-3711
Number of pages: 19
ISBN (Electronic): 9798891761193
State: Published - 2024
Event: 2024 Findings of the Association for Computational Linguistics: NAACL 2024 - Mexico City, Mexico
Duration: Jun 16 2024 - Jun 21 2024

Publication series

Name: Findings of the Association for Computational Linguistics: NAACL 2024 - Findings

Conference

Conference: 2024 Findings of the Association for Computational Linguistics: NAACL 2024
Country/Territory: Mexico
City: Mexico City
Period: 6/16/24 - 6/21/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
