TY - GEN
T1 - An Iteratively-refined Dataset for High-Level Synthesis Functional Verification through LLM-Aided Bug Injection
AU - Wan, Lily Jiaxin
AU - Ye, Hanchen
AU - Wang, Jinghua
AU - Jha, Manvi
AU - Chen, Deming
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - This paper explores the application of Large Language Models (LLMs) in the domain of High-Level Synthesis (HLS) for hardware design and verification, focusing on functional verification challenges. The scarcity of open-source HLS codebases, especially those containing bugs, poses a significant challenge, as LLMs require extensive datasets for efficient fine-tuning and evaluation. To tackle this, we introduce an innovative bug injection methodology together with a new dataset curated from a wide range of open-source HLS benchmark suites. The dataset features over 1,500 designs, each provided in both a bug-injected version and the corresponding bug-free version. Our bug injection method synergizes In-Context Learning (ICL) with Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) prompting, which significantly boosts the dataset's overall validity rate for single-bug injections. We demonstrate our solution quality using GPT-4 Turbo to inject either logic bugs or non-ideal pragmas (compiler directives) into HLS designs. For logic bugs, we achieve an 84.8% ratio of valid injection attempts, and our approach maintains an 88.0% dataset validity rate (the valid bug injection rate). For HLS pragma injections (focusing on non-ideal pragmas), we achieve a 74.0% valid-attempt ratio and an 87.9% valid injection ratio. Compared to brute-force prompting, our strategy yields a 20.4% and a 54.0% validity improvement for bug and non-ideal pragma injection, respectively. The Chrysalis dataset is accessible at https://github.com/UIUC-ChenLab/Chrysalis-HLS.
KW - dataset
KW - functional verification
KW - High-Level Synthesis
KW - Large Language Models
UR - http://www.scopus.com/inward/record.url?scp=85206654740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85206654740&partnerID=8YFLogxK
U2 - 10.1109/LAD62341.2024.10691860
DO - 10.1109/LAD62341.2024.10691860
M3 - Conference contribution
AN - SCOPUS:85206654740
T3 - 2024 IEEE LLM Aided Design Workshop, LAD 2024
BT - 2024 IEEE LLM Aided Design Workshop, LAD 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International LLM-Aided Design Workshop, LAD 2024
Y2 - 28 June 2024 through 29 June 2024
ER -