Language values for DataCite dataset records



This dataset was extracted from a set of metadata files harvested from the DataCite metadata store during December 2015. Metadata records for items with a resourceType of dataset were collected, for a total of 1,647,949 records.

This dataset contains four files:

1) readme.txt: a readme file.
2) language-results.csv: A CSV file containing three columns: DOI, DOI prefix, and language text contents.
3) language-counts.csv: A CSV file containing counts for unique language text content values.
4) language-grouped-counts.txt: A text file containing the results of manually grouping these language codes.
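
As a rough illustration of how the per-value counts relate to the per-record results, the sketch below tallies the third column of a CSV laid out as described for language-results.csv. It is an assumption-laden example, not the method used to build this dataset: the function name count_language_values, the has_header flag, the header names, and the sample rows are all hypothetical, and whether the real file has a header row is not stated here.

```python
import csv
import io
from collections import Counter


def count_language_values(csv_file, has_header=True):
    """Count occurrences of each language text value (third column).

    Assumes the column order DOI, DOI prefix, language text.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row; set has_header=False if there is none
    counts = Counter()
    for row in reader:
        if len(row) < 3:
            continue  # skip short or malformed rows
        counts[row[2].strip()] += 1
    return counts


# Made-up sample rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "doi,prefix,language\n"
    "10.1234/abc,10.1234,en\n"
    "10.1234/def,10.1234,eng\n"
    "10.5678/ghi,10.5678,en\n"
)
counts = count_language_values(sample)
for value, n in counts.most_common():
    print(value, n)
```

To try this on the downloaded file, pass an open file handle instead of the in-memory sample, for example open("language-results.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.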
Date made available: Jun 23 2016
Publisher: University of Illinois Urbana-Champaign


  • datacite
  • metadata
  • language codes
  • repository data
