Combining Code Context and Fine-grained Code Difference for Commit Message Generation

Shengbin Xu, Yuan Yao, Feng Xu, Tianxiao Gu, Hanghang Tong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Generating natural language messages for source code changes is an essential task in software development and maintenance. Existing solutions mainly treat a piece of code difference as natural language and adopt seq2seq learning to translate it into a commit message. The basic assumption of such solutions is the naturalness hypothesis, i.e., that source code written in programming languages is to some extent similar to natural language text. However, compared with natural language, source code also exhibits syntactic regularities. In this paper, we propose to simultaneously model the naturalness and the syntactic regularities of source code changes for commit message generation. Specifically, to model syntactic regularities, we first enlarge the input with additional context information, i.e., the code statements that have dependencies on the variables in the code difference, and then extract the paths in the corresponding ASTs. Moreover, to better model the code difference, we align the two versions of the code before and after the committed change at the token level and annotate their differences with fine-grained edit operations. The context and the difference are encoded simultaneously in a learning framework to generate the commit messages. We collected from GitHub a large dataset containing 480 Java projects with over 160k commits, and the experimental results demonstrate the effectiveness of the proposed approach.
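The abstract describes aligning the code before and after a change at the token level and annotating the differences with fine-grained edit operations. The sketch below illustrates one way such an alignment could be computed with Python's standard `difflib.SequenceMatcher`. It is not the authors' implementation: the regex tokenizer, the four edit labels (`equal`, `replace`, `delete`, `insert`), and the function names `tokenize` and `align_with_edits` are hypothetical choices made for illustration. The paper works with Java code and may use a different tokenizer and alignment algorithm.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical tokenizer (an assumption, not from the paper): identifiers,
# integer literals, and single non-space characters become separate tokens.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> List[str]:
    return TOKEN_RE.findall(code)


def align_with_edits(old: str, new: str) -> List[Tuple[str, str, str]]:
    """Align old/new token sequences into (old_token, new_token, op) triples.

    op is one of 'equal', 'replace', 'delete', 'insert'; an empty string
    pads the side that has no token at that aligned position.
    """
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    aligned: List[Tuple[str, str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            aligned.extend((x, y, "equal") for x, y in zip(a[i1:i2], b[j1:j2]))
        elif tag == "delete":
            aligned.extend((x, "", "delete") for x in a[i1:i2])
        elif tag == "insert":
            aligned.extend(("", y, "insert") for y in b[j1:j2])
        else:
            # 'replace' block: pair tokens positionally; leftover tokens on
            # the longer side are labelled as plain deletions or insertions.
            old_part, new_part = a[i1:i2], b[j1:j2]
            for k in range(max(len(old_part), len(new_part))):
                x = old_part[k] if k < len(old_part) else ""
                y = new_part[k] if k < len(new_part) else ""
                op = "replace" if x and y else ("delete" if x else "insert")
                aligned.append((x, y, op))
    return aligned


if __name__ == "__main__":
    for row in align_with_edits("int total = a + b;", "long total = a - b;"):
        print(row)
```

For the example in `__main__`, the type change `int` to `long` and the operator change `+` to `-` come out as `replace` pairs, and every other token is `equal`. Keeping both sides of each pair, rather than only the added or removed lines of a textual diff, is what lets an encoder see exactly which token changed into which.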

Original language: English (US)
Title of host publication: 13th Asia-Pacific Symposium on Internetware, Internetware 2022 - Proceedings
Publisher: Association for Computing Machinery
Pages: 242-251
Number of pages: 10
ISBN (Electronic): 9781450397803
State: Published - Jun 11 2022
Event: 13th Asia-Pacific Symposium on Internetware, Internetware 2022 - Virtual, Online, China
Duration: Jun 11 2022 to Jun 12 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 13th Asia-Pacific Symposium on Internetware, Internetware 2022
Country/Territory: China
City: Virtual, Online
Period: 6/11/22 to 6/12/22

Keywords

  • Commit message generation
  • code regularity
  • software naturalness

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
