Abstract
Documentation of datasets is a longstanding concern for data curation and research data management (RDM). Work in AI ethics, fairness, accountability, and transparency has also taken on the challenge of documenting and describing datasets used to train or test machine learning models, though with few overlaps or points of intersection with information science approaches. To foster increased conversation and collaboration across fields, I aim here both to assess the current landscape of AI's data documentation frameworks and to understand where shared interests with RDM might be possible and fruitful. I analyze four prominent frameworks for documenting AI datasets, considering: (a) their goals, influences, and precedents; (b) the formal qualities of their materiality; and (c) where and how each framework has been adopted and applied. Results reveal some common features of documentation frameworks, as well as diverging goals and constraints. I close by reflecting on ways that information science might learn from these approaches stemming from AI to inform future work on documentation of datasets for scientific research and beyond.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 458-471 |
| Number of pages | 14 |
| Journal | Proceedings of the Association for Information Science and Technology |
| Volume | 62 |
| Issue number | 1 |
| State | Published - Oct 2025 |
Keywords
- data curation
- data documentation
- datasheets for datasets
- metadata
- research data management
ASJC Scopus subject areas
- General Computer Science
- Library and Information Sciences
Title
What is a Data Document? Analyzing Four Emerging Data Documentation Frameworks in AI/ML