ChatGPT for GTFS: benchmarking LLMs on GTFS semantics.. and retrieval

Saipraneeth Devunuri, Shirin Qiam, Lewis J. Lehe

Research output: Contribution to journal › Article › peer-review


The General Transit Feed Specification (GTFS) standard for publishing transit data is ubiquitous. With LLMs now in wide use, this research explores the possibility of extracting transit information from GTFS through natural language instructions. To evaluate the capabilities and limitations of LLMs, we introduce two benchmarks, “GTFS Semantics” and “GTFS Retrieval”, that test how well LLMs can “understand” the GTFS standard and retrieve relevant transit information. We benchmark OpenAI’s GPT-3.5 Turbo and GPT-4 LLMs, which are backends for the ChatGPT interface. In particular, we use zero-shot, one-shot, chain-of-thought, and program synthesis techniques with prompt engineering. For our multiple-choice questions, GPT-3.5 Turbo answers 59.7% correctly and GPT-4 answers 73.3% correctly, but both do worse when one of the multiple-choice options is replaced by “None of these”. Furthermore, we evaluate how well the LLMs can extract information from a filtered GTFS feed containing four bus routes from the Chicago Transit Authority. Program synthesis techniques outperformed zero-shot approaches, achieving up to 93% (90%) accuracy for simple queries and 61% (41%) for complex ones using GPT-4 (GPT-3.5 Turbo).
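To make the program synthesis setting concrete: instead of answering a retrieval question directly from the feed text, the model is asked to write a short program that computes the answer from the GTFS tables. The sketch below shows the kind of program such a prompt might yield for the question "How many trips does each route run?". It is illustrative only and is not the authors' prompt, code, or data. The two-route feed and the helper names `read_table` and `trips_per_route` are hypothetical; only the GTFS file and column names (`routes.txt`, `trips.txt`, `route_id`, `route_short_name`, `trip_id`) come from the specification.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature feed, inlined so the example is self-contained.
# A real GTFS feed ships these tables as routes.txt and trips.txt.
ROUTES_TXT = """route_id,route_short_name,route_long_name,route_type
R1,1,Example Avenue,3
R2,2,Sample Street,3
"""

TRIPS_TXT = """route_id,service_id,trip_id
R1,WK,T1
R1,WK,T2
R2,WK,T3
"""


def read_table(text):
    """Parse one GTFS text file (CSV with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def trips_per_route(routes_txt, trips_txt):
    """Count trips for each route, keyed by route_short_name."""
    routes = read_table(routes_txt)
    trips = read_table(trips_txt)
    # trips.txt links every trip to a route through route_id.
    counts = Counter(trip["route_id"] for trip in trips)
    # Routes with no trips are reported with a count of zero.
    return {r["route_short_name"]: counts.get(r["route_id"], 0) for r in routes}


if __name__ == "__main__":
    print(trips_per_route(ROUTES_TXT, TRIPS_TXT))  # {'1': 2, '2': 1}
```

One plausible reading of the reported gap is that a generated program delegates joins and counting to ordinary code, so the model only has to get the table relationships right rather than tally rows itself; the abstract reports the accuracy figures but does not state this explanation.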

Original language: English (US)
Journal: Public Transport
State: Accepted/In press - 2024


Keywords

  • ChatGPT
  • Generative AI
  • GPT-3.5 Turbo
  • GPT-4
  • GTFS
  • Large language models

ASJC Scopus subject areas

  • Information Systems
  • Transportation
  • Mechanical Engineering
  • Management Science and Operations Research

