What is a Data Document? Analyzing Four Emerging Data Documentation Frameworks in AI/ML

Research output: Contribution to journal, Article, peer-review

Abstract

Documentation of datasets is a longstanding concern for data curation and research data management (RDM). Work in AI ethics, fairness, accountability, and transparency has also taken on the challenge of documenting and describing datasets used to train or test machine learning models, though with few points of intersection with information science approaches. To foster increased conversation and collaboration across fields, I aim here both to assess the current landscape of AI's data documentation frameworks and to understand where shared interests with RDM might be possible and fruitful. I analyze four prominent frameworks for documenting AI datasets, considering: (a) their goals, influences, and precedents; (b) the formal qualities of their materiality; and (c) where and how each framework has been adopted and applied. Results reveal some common features of documentation frameworks, as well as diverging goals and constraints. I close by reflecting on ways that information science might learn from these AI approaches to inform future work on documentation of datasets for scientific research and beyond.

Original language: English (US)
Pages (from-to): 458-471
Number of pages: 14
Journal: Proceedings of the Association for Information Science and Technology
Volume: 62
Issue number: 1
State: Published - Oct 2025

Keywords

  • data curation
  • data documentation
  • datasheets for datasets
  • metadata
  • research data management

ASJC Scopus subject areas

  • General Computer Science
  • Library and Information Sciences
