DataDiff: User-interpretable data transformation summaries for collaborative data analysis

Gunce Su Yilmaz, Tana Wattanawaroon, Liqi Xu, Abhishek Nigam, Aaron J. Elmore, Aditya Parameswaran

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Interest in collaborative dataset versioning has emerged due to the complex, ad hoc, and collaborative nature of data science, and the need to record and reason about data at various stages of pre-processing, cleaning, and analysis. To support effective collaborative dataset versioning, one critical operation is differentiation: succinctly describing what has changed from one dataset to the next. Differentiation, or diffing, allows users to understand changes between two versions, to better understand the evolution process, or to support effective merging or conflict detection across versions. We demonstrate DataDiff, a practical and concise data-diff tool that provides human-interpretable explanations of changes between datasets without reliance on the operations that led to the changes.

Original language: English (US)
Title of host publication: SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
Editors: Gautam Das, Christopher Jermaine, Ahmed Eldawy, Philip Bernstein
Publisher: Association for Computing Machinery
Pages: 1769-1772
Number of pages: 4
ISBN (Electronic): 9781450317436
State: Published - May 27 2018
Event: 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018 - Houston, United States
Duration: Jun 10 2018 to Jun 15 2018

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
Country: United States
City: Houston
Period: 6/10/18 to 6/15/18

Keywords

  • Differentiation
  • Versioning

ASJC Scopus subject areas

  • Software
  • Information Systems
