Brainwash: A data system for feature engineering

Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, Ce Zhang

Research output: Contribution to conference › Paper › peer-review

Abstract

A new generation of data processing systems, including web search, Google’s Knowledge Graph, IBM’s Watson, and several different recommendation systems, combine rich databases with software driven by machine learning. The spectacular successes of these trained systems have been among the most notable in all of computing and have generated excitement in health care, finance, energy, and general business. But building them can be challenging, even for computer scientists with PhD-level training. If these systems are to have a truly broad impact, building them must become easier. We explore one crucial pain point in the construction of trained systems: feature engineering. Given the sheer size of modern datasets, feature developers must (1) write code with few effective clues about how their code will interact with the data and (2) repeatedly endure long system waits even though their code typically changes little from run to run. We propose Brainwash, a vision for a feature engineering data system that could dramatically ease the Explore-Extract-Evaluate interaction loop that characterizes many trained system projects.

Original language: English (US)
State: Published - 2013
Externally published: Yes
Event: 6th Biennial Conference on Innovative Data Systems Research, CIDR 2013 - Pacific Grove, United States
Duration: Jan 6 2013 – Jan 9 2013

Conference

Conference: 6th Biennial Conference on Innovative Data Systems Research, CIDR 2013
Country/Territory: United States
City: Pacific Grove
Period: 1/6/13 – 1/9/13

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems and Management
  • Artificial Intelligence
  • Information Systems
