TY - CONF
T1 - A first-principles algebraic approach to data transformations in data cleaning
T2 - 12th International Workshop on Theory and Practice of Provenance, TaPP 2020
AU - Núñez-Corrales, Santiago
AU - Li, Lan
AU - Ludäscher, Bertram
N1 - Funding Information:
The authors acknowledge support by the Center for Informatics Research in Science & Scholarship at the School of Information Sciences. This work has been partially funded by NSF award 1541450 (Whole Tale) and an ACM/Intel SIGHPC Computational and Data Science Fellowship (2017). We thank reviewers for their thoughtful and extensive comments.
PY - 2020
Y1 - 2020
AB - We provide a model describing data transformation workflows on tables constructed from first principles, namely by defining datasets as structures with functions and sets for which certain morphisms correspond to data transformations. We define rigid and deep data transformations depending on whether the geometry of the dataset is preserved or not. Finally, we add a model of concurrency using meet and join operations. Our work suggests that algebraic structures and homotopy type theory provide a more general context than other formalisms to reason about data cleaning, data transformations and their provenance.
UR - http://www.scopus.com/inward/record.url?scp=85094857454&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094857454&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85094857454
Y2 - 22 June 2020
ER -