TY - GEN
T1 - DataDiff
T2 - 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
AU - Yilmaz, Gunce Su
AU - Wattanawaroon, Tana
AU - Xu, Liqi
AU - Nigam, Abhishek
AU - Elmore, Aaron J.
AU - Parameswaran, Aditya
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/5/27
Y1 - 2018/5/27
N2 - Interest in collaborative dataset versioning has emerged due to the complex, ad-hoc, and collaborative nature of data science, and the need to record and reason about data at various stages of pre-processing, cleaning, and analysis. To support effective collaborative dataset versioning, one critical operation is differentiation: to succinctly describe what has changed from one dataset to the next. Differentiation, or diffing, allows users to understand changes between two versions, to better understand the evolution process, or to support effective merging or conflict detection across versions. We demonstrate DataDiff, a practical and concise data-diff tool that provides human-interpretable explanations of changes between datasets without reliance on the operations that led to the changes.
AB - Interest in collaborative dataset versioning has emerged due to the complex, ad-hoc, and collaborative nature of data science, and the need to record and reason about data at various stages of pre-processing, cleaning, and analysis. To support effective collaborative dataset versioning, one critical operation is differentiation: to succinctly describe what has changed from one dataset to the next. Differentiation, or diffing, allows users to understand changes between two versions, to better understand the evolution process, or to support effective merging or conflict detection across versions. We demonstrate DataDiff, a practical and concise data-diff tool that provides human-interpretable explanations of changes between datasets without reliance on the operations that led to the changes.
KW - Differentiation
KW - Versioning
UR - http://www.scopus.com/inward/record.url?scp=85048796083&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048796083&partnerID=8YFLogxK
U2 - 10.1145/3183713.3193564
DO - 10.1145/3183713.3193564
M3 - Conference contribution
AN - SCOPUS:85048796083
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1769
EP - 1772
BT - SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
A2 - Das, Gautam
A2 - Jermaine, Christopher
A2 - Eldawy, Ahmed
A2 - Bernstein, Philip
PB - Association for Computing Machinery
Y2 - 10 June 2018 through 15 June 2018
ER -