An Iteratively-refined Dataset for High-Level Synthesis Functional Verification through LLM-Aided Bug Injection

Lily Jiaxin Wan, Hanchen Ye, Jinghua Wang, Manvi Jha, Deming Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper explores the application of Large Language Models (LLMs) to High-Level Synthesis (HLS) for hardware design and verification, focusing on functional verification challenges. Open-source HLS codebases are scarce, especially ones containing bugs, which poses a significant challenge because LLMs require extensive datasets for efficient fine-tuning and evaluation. To tackle this, we introduce a bug injection methodology together with a new dataset curated from a wide range of open-source HLS benchmark suites. The dataset features over 1,500 designs, with both a bug-injected version and the corresponding bug-free version. Our bug injection method combines In-Context Learning (ICL) with Retrieval Augmented Generation (RAG) and Chain of Thought (CoT), which significantly boosts the dataset's overall validity rate for single-bug injections. We demonstrate our solution quality using GPT-4 Turbo to inject either logic bugs or non-ideal pragmas (compiler directives) into HLS designs. For logic bugs, we achieve an 84.8% ratio for valid injection attempts and maintain an 88.0% dataset validity rate (the valid bug injection rate). For non-ideal pragma injections, we achieve a 74.0% attempt ratio and an 87.9% valid injection ratio. Compared to brute-force prompting, our strategy yields a 20.4% and a 54.0% validity improvement for bug and non-ideal pragma injection, respectively. The Chrysalis dataset is accessible at https://github.com/UIUC-ChenLab/Chrysalis-HLS.
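
As a rough illustration of how retrieval, in-context examples, and a chain-of-thought instruction might be combined in a single bug-injection prompt, the following is a minimal sketch. It is not the authors' implementation: `BugExample`, `token_overlap`, `retrieve`, `build_prompt`, the prompt wording, and the two sample bugs are all hypothetical, and the token-overlap retriever is a toy stand-in for whatever retrieval the paper actually uses.

```python
# Hypothetical sketch: none of these names come from the Chrysalis repository.
from dataclasses import dataclass


@dataclass
class BugExample:
    """One retrievable demonstration: a bug type, clean code, and its buggy variant."""
    bug_type: str
    clean_code: str
    buggy_code: str


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens (a toy stand-in for a real retriever)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(design: str, corpus: list[BugExample], k: int = 1) -> list[BugExample]:
    """Return the k examples whose clean code is most similar to the target design."""
    ranked = sorted(corpus, key=lambda ex: token_overlap(design, ex.clean_code), reverse=True)
    return ranked[:k]


def build_prompt(design: str, corpus: list[BugExample], k: int = 1) -> str:
    """Assemble retrieved demonstrations (RAG + ICL) and a step-by-step instruction (CoT)."""
    parts = ["You inject exactly one bug into an HLS design."]
    for ex in retrieve(design, corpus, k):
        parts.append(
            f"Example ({ex.bug_type}):\nBefore:\n{ex.clean_code}\nAfter:\n{ex.buggy_code}"
        )
    parts.append(
        "Think step by step: pick a location, describe the bug, then output the modified code."
    )
    parts.append(f"Target design:\n{design}")
    return "\n\n".join(parts)


# Illustrative corpus of two made-up logic bugs.
corpus = [
    BugExample("off-by-one loop bound",
               "for (int i = 0; i < N; i++) sum += a[i];",
               "for (int i = 0; i <= N; i++) sum += a[i];"),
    BugExample("wrong operator",
               "out = x * y + z;",
               "out = x * y - z;"),
]
prompt = build_prompt("for (int j = 0; j < M; j++) acc += b[j];", corpus)
```

With `k=1`, only the demonstration most similar to the target design is placed in the prompt, so here the loop-bound example is selected over the operator example.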

Original language: English (US)
Title of host publication: 2024 IEEE LLM Aided Design Workshop, LAD 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350376081
State: Published - 2024
Event: 2024 IEEE International LLM-Aided Design Workshop, LAD 2024 - San Jose, United States
Duration: Jun 28 2024 to Jun 29 2024

Publication series

Name: 2024 IEEE LLM Aided Design Workshop, LAD 2024

Conference

Conference: 2024 IEEE International LLM-Aided Design Workshop, LAD 2024
Country/Territory: United States
City: San Jose
Period: 6/28/24 to 6/29/24

Keywords

  • dataset
  • functional verification
  • High-Level Synthesis
  • Large Language Models

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Software
  • Control and Optimization
