A first-principles algebraic approach to data transformations in data cleaning: Understanding provenance from the ground up

Santiago Núñez-Corrales, Lan Li, Bertram Ludäscher

Research output: Contribution to conference › Paper › peer-review

Abstract

We provide a model, constructed from first principles, that describes data transformation workflows on tables: datasets are defined as structures with functions and sets, for which certain morphisms correspond to data transformations. We define rigid and deep data transformations according to whether or not the geometry of the dataset is preserved. Finally, we add a model of concurrency using meet and join operations. Our work suggests that algebraic structures and homotopy type theory provide a more general context than other formalisms for reasoning about data cleaning, data transformations, and their provenance.

Original language: English (US)
State: Published - 2020
Externally published: Yes
Event: 12th International Workshop on Theory and Practice of Provenance, TaPP 2020 - Virtual, Online
Duration: Jun 22 2020 → …

ASJC Scopus subject areas

  • Computer Science (all)
