Abstract

The rise of the “big data” era has created a pressing demand for educating many data scientists and engineers quickly and at low cost. It is essential that they learn by working on assignments that involve real-world data sets, so that they develop the skills needed to succeed in the workplace. However, enabling instructors to flexibly deliver all kinds of data science assignments using real-world data sets to large numbers of learners (both on-campus and off-campus) at low cost remains a significant open challenge. To address this challenge in a general way, we develop and deploy a novel Cloud-based Lab for Data Science (CLaDS) that enables many learners around the world to work on real-world data science problems without having to move or otherwise distribute prohibitively large data sets. Leveraging version control and continuous integration, CLaDS provides a general infrastructure that lets any instructor conveniently deliver any hands-on data science assignment that uses large real-world data sets to as many learners as our cloud-computing infrastructure allows, at very low cost. In this paper, we present the design and implementation of CLaDS and discuss our experience using it to deploy seven major text data assignments, covering text data retrieval and mining techniques, for students in both an on-campus course and an online course. This experience indicates that CLaDS is a promising general infrastructure for efficiently delivering a wide range of hands-on data science assignments to a large number of learners at very low cost.

Original language: English (US)
Title of host publication: ITiCSE 2018 - Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education
Editors: Panayiotis Andreou, Michal Armoni, Janet C. Read, Irene Polycarpou
Publisher: Association for Computing Machinery
Pages: 176-181
Number of pages: 6
ISBN (Electronic): 9781450357074
State: Published - Jul 2 2018
Event: 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, ITiCSE 2018 - Larnaca, Cyprus
Duration: Jul 2 2018 to Jul 4 2018

Publication series

Name: Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE
ISSN (Print): 1942-647X

Other

Other: 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, ITiCSE 2018
Country/Territory: Cyprus
City: Larnaca
Period: 7/2/18 to 7/4/18

Keywords

  • Cloud computing
  • Data science education
  • Virtual lab

ASJC Scopus subject areas

  • Management of Technology and Innovation
  • Education

Title

CLaDS: A cloud-based virtual lab for the delivery of scalable hands-on assignments for practical data science education