Brown dog

Making the digital world a better place, a few files at a time

Sandeep Puthanveetil Satheesan, Benjamin Galewsky, Jong Sung Lee, M. Christopher, Bing Zhang, Jay Alameda, Gregory Jansen, Richard Marciano, Arthur R Schmidt, Yan Zhao, Shannon Bradley, Rob Kooper, Luigi Marini, Marcus Slavenas, Inna Zharnitsky, Michael Dietze, Praveen Kumar, Barbara S. Minsker, William C Sullivan, Kenton Guadron McHenry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Brown Dog is a data transformation service for auto-curation of long-tail data. In this digital age, we have more data available for analysis than ever, and this trend will only increase. According to most estimates, 70-80% of this data is unstructured, and together with unsupported data formats and inaccessible software tools, in essence, this data is neither easily accessible nor usable to its owners in a meaningful way. Brown Dog aims at making this data more accessible and usable by auto-curation and indexing, leveraging existing and novel data transformation tools. In this paper, we discuss the recent major component improvements to Brown Dog, including transformation tools called extractors and converters; desktop, web and terminal-based clients which perform data transformations; libraries written in multiple programming languages which integrate with existing software and extend their data curation capabilities; an online tool store for users to contribute, manage and share data transformation tools and receive credit for developing them; cyberinfrastructure for deploying the system on diverse computing platforms, leveraging scalability via Docker Swarm; a workflow management service for creatively integrating existing transformations to generate custom, reproducible workflows which meet research needs; and its data management capabilities. This paper also discusses data transformation tools developed to support some scientific and allied use cases, thereby benefiting researchers in diverse domains. Finally, we briefly discuss our future directions with regard to production deployments as well as how users can access Brown Dog to manage their uncurated, unstructured data.

Original language: English (US)
Title of host publication: Practice and Experience in Advanced Research Computing 2018
Subtitle of host publication: Seamless Creativity, PEARC 2018
Publisher: Association for Computing Machinery
ISBN (Print): 9781450364461
DOIs: https://doi.org/10.1145/3219104.3219132
State: Published - Jul 22 2018
Event: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018 - Pittsburgh, United States
Duration: Jul 22 2018 to Jul 26 2018

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018
Country: United States
City: Pittsburgh
Period: 7/22/18 to 7/26/18

Fingerprint

Computer programming languages
Information management
Scalability

Keywords

  • API gateway
  • Auto-curation
  • Big data
  • Data conversion
  • Data curation
  • Data cyberinfrastructure
  • Data transformation
  • Data wrangling
  • Metadata extraction
  • Orchestration
  • Provenance
  • Unstructured data

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Satheesan, S. P., Galewsky, B., Lee, J. S., Christopher, M., Zhang, B., Alameda, J., ... McHenry, K. G. (2018). Brown dog: Making the digital world a better place, a few files at a time. In Practice and Experience in Advanced Research Computing 2018: Seamless Creativity, PEARC 2018 [a38] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3219104.3219132

@inproceedings{98b15d79a1ec41169f100ea264ffed4b,
title = "Brown dog: Making the digital world a better place, a few files at a time",
abstract = "Brown Dog is a data transformation service for auto-curation of long-tail data. In this digital age, we have more data available for analysis than ever, and this trend will only increase. According to most estimates, 70-80{\%} of this data is unstructured, and together with unsupported data formats and inaccessible software tools, in essence, this data is neither easily accessible nor usable to its owners in a meaningful way. Brown Dog aims at making this data more accessible and usable by auto-curation and indexing, leveraging existing and novel data transformation tools. In this paper, we discuss the recent major component improvements to Brown Dog, including transformation tools called extractors and converters; desktop, web and terminal-based clients which perform data transformations; libraries written in multiple programming languages which integrate with existing software and extend their data curation capabilities; an online tool store for users to contribute, manage and share data transformation tools and receive credit for developing them; cyberinfrastructure for deploying the system on diverse computing platforms, leveraging scalability via Docker Swarm; a workflow management service for creatively integrating existing transformations to generate custom, reproducible workflows which meet research needs; and its data management capabilities. This paper also discusses data transformation tools developed to support some scientific and allied use cases, thereby benefiting researchers in diverse domains. Finally, we briefly discuss our future directions with regard to production deployments as well as how users can access Brown Dog to manage their uncurated, unstructured data.",
keywords = "API gateway, Auto-curation, Big data, Data conversion, Data curation, Data cyberinfrastructure, Data transformation, Data wrangling, Metadata extraction, Orchestration, Provenance, Unstructured data",
author = "Satheesan, {Sandeep Puthanveetil} and Benjamin Galewsky and Lee, {Jong Sung} and M. Christopher and Bing Zhang and Jay Alameda and Gregory Jansen and Richard Marciano and Schmidt, {Arthur R} and Yan Zhao and Shannon Bradley and Rob Kooper and Luigi Marini and Marcus Slavenas and Inna Zharnitsky and Michael Dietze and Praveen Kumar and Minsker, {Barbara S.} and Sullivan, {William C} and McHenry, {Kenton Guadron}",
year = "2018",
month = "7",
day = "22",
doi = "10.1145/3219104.3219132",
language = "English (US)",
isbn = "9781450364461",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "Practice and Experience in Advanced Research Computing 2018",

}

TY - GEN

T1 - Brown dog

T2 - Making the digital world a better place, a few files at a time

AU - Satheesan, Sandeep Puthanveetil

AU - Galewsky, Benjamin

AU - Lee, Jong Sung

AU - Christopher, M.

AU - Zhang, Bing

AU - Alameda, Jay

AU - Jansen, Gregory

AU - Marciano, Richard

AU - Schmidt, Arthur R

AU - Zhao, Yan

AU - Bradley, Shannon

AU - Kooper, Rob

AU - Marini, Luigi

AU - Slavenas, Marcus

AU - Zharnitsky, Inna

AU - Dietze, Michael

AU - Kumar, Praveen

AU - Minsker, Barbara S.

AU - Sullivan, William C

AU - McHenry, Kenton Guadron

PY - 2018/7/22

Y1 - 2018/7/22

N2 - Brown Dog is a data transformation service for auto-curation of long-tail data. In this digital age, we have more data available for analysis than ever, and this trend will only increase. According to most estimates, 70-80% of this data is unstructured, and together with unsupported data formats and inaccessible software tools, in essence, this data is neither easily accessible nor usable to its owners in a meaningful way. Brown Dog aims at making this data more accessible and usable by auto-curation and indexing, leveraging existing and novel data transformation tools. In this paper, we discuss the recent major component improvements to Brown Dog, including transformation tools called extractors and converters; desktop, web and terminal-based clients which perform data transformations; libraries written in multiple programming languages which integrate with existing software and extend their data curation capabilities; an online tool store for users to contribute, manage and share data transformation tools and receive credit for developing them; cyberinfrastructure for deploying the system on diverse computing platforms, leveraging scalability via Docker Swarm; a workflow management service for creatively integrating existing transformations to generate custom, reproducible workflows which meet research needs; and its data management capabilities. This paper also discusses data transformation tools developed to support some scientific and allied use cases, thereby benefiting researchers in diverse domains. Finally, we briefly discuss our future directions with regard to production deployments as well as how users can access Brown Dog to manage their uncurated, unstructured data.

KW - API gateway

KW - Auto-curation

KW - Big data

KW - Data conversion

KW - Data curation

KW - Data cyberinfrastructure

KW - Data transformation

KW - Data wrangling

KW - Metadata extraction

KW - Orchestration

KW - Provenance

KW - Unstructured data

UR - http://www.scopus.com/inward/record.url?scp=85051434230&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051434230&partnerID=8YFLogxK

U2 - 10.1145/3219104.3219132

DO - 10.1145/3219104.3219132

M3 - Conference contribution

SN - 9781450364461

T3 - ACM International Conference Proceeding Series

BT - Practice and Experience in Advanced Research Computing 2018

PB - Association for Computing Machinery

ER -