TY - GEN
T1 - DCM Explorer
T2 - 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022, held in conjunction with SIGMOD 2022
AU - Parulian, Nikolaus Nova
AU - Ludäscher, Bertram
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/6/17
Y1 - 2022/6/17
N2 - Data cleaning and preparation are essential phases of data science and machine learning (ML) workflows. Unfortunately, data cleaning processes are rarely well documented, despite the fact that they are error-prone and often involve hundreds of individual transformation steps. We have developed DCM (Data Cleaning Model), which captures provenance information for data cleaning. In this paper, we present DCM Explorer, a companion tool for DCM to explore and use data cleaning provenance. With DCM Explorer, a user can query and visualize the data cleaning workflows that are "hidden" in recorded provenance information, show different states of the data (as it underwent cleaning), explore an individual cell's history, etc. Through query-driven provenance reports, DCM Explorer adds valuable process documentation, making data cleaning more transparent, self-explanatory, and reusable.
AB - Data cleaning and preparation are essential phases of data science and machine learning (ML) workflows. Unfortunately, data cleaning processes are rarely well documented, despite the fact that they are error-prone and often involve hundreds of individual transformation steps. We have developed DCM (Data Cleaning Model), which captures provenance information for data cleaning. In this paper, we present DCM Explorer, a companion tool for DCM to explore and use data cleaning provenance. With DCM Explorer, a user can query and visualize the data cleaning workflows that are "hidden" in recorded provenance information, show different states of the data (as it underwent cleaning), explore an individual cell's history, etc. Through query-driven provenance reports, DCM Explorer adds valuable process documentation, making data cleaning more transparent, self-explanatory, and reusable.
KW - data cleaning
KW - data provenance
KW - scientific workflows
KW - transparency
UR - http://www.scopus.com/inward/record.url?scp=85133812215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133812215&partnerID=8YFLogxK
U2 - 10.1145/3530800.3534539
DO - 10.1145/3530800.3534539
M3 - Conference contribution
AN - SCOPUS:85133812215
T3 - Proceedings of 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022
SP - 56
EP - 61
BT - Proceedings of 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022
PB - Association for Computing Machinery
Y2 - 17 June 2022
ER -