Analyzing the Performance of Large Language Models on Code Summarization

Rajarshi Haldar, Julia Hockenmaier

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Large language models (LLMs) such as Llama 2 perform very well on tasks that involve both natural language and source code, particularly code summarization and code generation. We show that for the task of code summarization, the performance of these models on individual examples often depends on the amount of (subword) token overlap between the code and the corresponding reference natural language descriptions in the dataset. This token overlap arises because the reference descriptions in standard datasets (corresponding to docstrings in large code bases) are often highly similar to the names of the functions they describe. We also show that this token overlap occurs largely in the function names of the code and compare the relative performance of these models after removing function names versus removing code structure. We also show that using multiple evaluation metrics like BLEU and BERTScore gives us very little additional insight since these metrics are highly correlated with each other.

Original languageEnglish (US)
Title of host publication2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
EditorsNicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
PublisherEuropean Language Resources Association (ELRA)
Pages995-1008
Number of pages14
ISBN (Electronic)9782493814104
StatePublished - 2024
EventJoint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy
Duration: May 20 2024May 25 2024

Publication series

Name2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Conference

ConferenceJoint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Country/TerritoryItaly
CityHybrid, Torino
Period5/20/245/25/24

Keywords

  • Natural Language Generation
  • Neural Language Representation Models
  • Summarisation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computational Theory and Mathematics
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Analyzing the Performance of Large Language Models on Code Summarization'. Together they form a unique fingerprint.

Cite this