Inferring specifications for resources from natural language API documentation

Hao Zhong, Lu Zhang, Tao Xie, Hong Mei

Research output: Contribution to journal › Article › peer-review


Many software libraries, especially commercial ones, provide API documentation in natural language to describe correct API usage. However, developers may still write code that is inconsistent with API documentation, partly because, as existing research shows, many developers are reluctant to read API documentation carefully. As these inconsistencies may indicate defects, researchers have proposed various detection approaches, which require many known specifications. As it is tedious to write specifications manually for all APIs, various approaches have been proposed to mine specifications automatically. In the literature, most existing mining approaches rely on analyzing client code, so they fail to mine specifications when sufficient client code is not available. Instead of analyzing client code, we propose an approach, called Doc2Spec, that infers resource specifications from natural language API documentation. We evaluated our approach on the Javadocs of five libraries. The results show that our approach performs well on real-scale libraries and infers various specifications with relatively high precision, recall, and F-score. We further used the inferred specifications to detect defects in open-source projects. The results show that specifications inferred by Doc2Spec are useful for detecting real defects in existing projects.

Original language: English (US)
Pages (from-to): 227-261
Number of pages: 35
Journal: Automated Software Engineering
Issue number: 3-4
State: Published - Dec 2011
Externally published: Yes


Keywords

  • API documentation
  • Inferring specifications

ASJC Scopus subject areas

  • Software

