Collaborative data analytics with DataHub

Anant Bhardwaj, Amol Deshpande, Aaron J. Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, Rebecca Zhang

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

While many solutions have been proposed for storing and analyzing large volumes of data, all of them offer limited support for collaborative data analytics, especially given that many individuals and teams are simultaneously analyzing, modifying, and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface with external applications, and share datasets. We will demonstrate the following aspects of the DataHub platform: (a) flexible data storage, sharing, and native versioning capabilities: multiple conference attendees can concurrently update the database, browse the different versions, and inspect conflicts; (b) an app ecosystem that hosts apps for various data-processing activities: conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c) Thrift-based data serialization, which permits data analysis in any combination of 20+ languages, with DataHub as the common data store: conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use the DataHub notebook, an IPython-based notebook for analyzing data and storing the results of data analysis.
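
To make aspect (a) concrete, the following is a minimal sketch of how two concurrent updates to the same dataset version might be recorded and checked for conflicts. It is not DataHub's API or storage design: the names `Version`, `VersionedDataset`, `commit`, `changed_keys`, and `conflicts` are hypothetical, the store is a plain in-memory dictionary, deletions are not modeled, and only two versions that share the same parent are compared.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Version:
    """An immutable snapshot of a keyed dataset (hypothetical structure)."""
    vid: int
    parent: Optional[int]
    rows: Dict[str, str]


class VersionedDataset:
    """Toy in-memory store: every commit creates a new version from a parent."""

    def __init__(self, initial_rows: Dict[str, str]) -> None:
        self.versions: Dict[int, Version] = {0: Version(0, None, dict(initial_rows))}
        self._next_id = 1

    def commit(self, parent: int, changes: Dict[str, str]) -> int:
        """Apply `changes` (key -> new value) on top of `parent`; return the new id."""
        rows = dict(self.versions[parent].rows)
        rows.update(changes)
        vid = self._next_id
        self._next_id += 1
        self.versions[vid] = Version(vid, parent, rows)
        return vid

    def changed_keys(self, vid: int) -> Set[str]:
        """Keys whose value differs from the value in the version's parent."""
        v = self.versions[vid]
        base = self.versions[v.parent].rows if v.parent is not None else {}
        return {k for k, val in v.rows.items() if base.get(k) != val}

    def conflicts(self, a: int, b: int) -> Set[str]:
        """Keys that both sibling versions changed, but to different values."""
        va, vb = self.versions[a], self.versions[b]
        if va.parent != vb.parent:
            raise ValueError("this sketch only compares versions with the same parent")
        both = self.changed_keys(a) & self.changed_keys(b)
        return {k for k in both if va.rows[k] != vb.rows[k]}


# Two users branch from version 0 and update the dataset concurrently.
ds = VersionedDataset({"alice": "30", "bob": "25"})
v1 = ds.commit(0, {"alice": "31"})
v2 = ds.commit(0, {"alice": "32", "bob": "26"})
print(ds.conflicts(v1, v2))  # {'alice'}
```

Here both versions remain browsable in `ds.versions`, and only `alice` is reported because it is the one key the two users changed to different values; `bob` was changed by a single user and could be merged without intervention.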

Original language: English (US)
Title of host publication: Proceedings of the VLDB Endowment
Publisher: Association for Computing Machinery
Pages: 1916-1919
Number of pages: 4
Edition: 12
DOIs: https://doi.org/10.14778/2824032.2824100
State: Published - Jan 1 2015
Event: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: Sep 11 2006 to Sep 11 2006

Publication series

Name: Proceedings of the VLDB Endowment
Number: 12
Volume: 8
ISSN (Electronic): 2150-8097

Other

Other: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006
Country: Korea, Republic of
City: Seoul
Period: 9/11/06 to 9/11/06

Fingerprint

Application programs
Ecosystems
Data storage equipment

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Bhardwaj, A., Deshpande, A., Elmore, A. J., Karger, D., Madden, S., Parameswaran, A., ... Zhang, R. (2015). Collaborative data analytics with datahub. In Proceedings of the VLDB Endowment (12 ed., pp. 1916-1919). (Proceedings of the VLDB Endowment; Vol. 8, No. 12). Association for Computing Machinery. https://doi.org/10.14778/2824032.2824100

@inbook{4cb4a9ca657143709723dd506e817f4e,
title = "Collaborative data analytics with datahub",
author = "Anant Bhardwaj and Amol Deshpande and Elmore, {Aaron J.} and David Karger and Sam Madden and Aditya Parameswaran and Harihar Subramanyam and Eugene Wu and Rebecca Zhang",
year = "2015",
month = "1",
day = "1",
doi = "10.14778/2824032.2824100",
language = "English (US)",
series = "Proceedings of the VLDB Endowment",
publisher = "Association for Computing Machinery",
number = "12",
pages = "1916--1919",
booktitle = "Proceedings of the VLDB Endowment",
edition = "12",

}

TY - CHAP

T1 - Collaborative data analytics with datahub

AU - Bhardwaj, Anant

AU - Deshpande, Amol

AU - Elmore, Aaron J.

AU - Karger, David

AU - Madden, Sam

AU - Parameswaran, Aditya

AU - Subramanyam, Harihar

AU - Wu, Eugene

AU - Zhang, Rebecca

PY - 2015/1/1

Y1 - 2015/1/1

UR - http://www.scopus.com/inward/record.url?scp=84953870023&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84953870023&partnerID=8YFLogxK

U2 - 10.14778/2824032.2824100

DO - 10.14778/2824032.2824100

M3 - Chapter

AN - SCOPUS:84953870023

T3 - Proceedings of the VLDB Endowment

SP - 1916

EP - 1919

BT - Proceedings of the VLDB Endowment

PB - Association for Computing Machinery

ER -