Brown Dog: Leveraging everything towards autocuration

Smruti Padhy, Greg Jansen, Jay Alameda, Edgar Black, Liana Diesendruck, Mike Dietze, Praveen Kumar, Rob Kooper, Jong Sung Lee, Rui Liu, Richard Marciano, Luigi Marini, Dave Mattson, Barbara Minsker, Chris Navarro, Marcus Slavenas, William C Sullivan, Jason Votava, Inna Zharnitsky, Kenton Guadron McHenry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present Brown Dog, a pair of highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software (past or present) to provide users with a simple-to-use, programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data. Such collections, which encompass a large variety of data as well as large amounts of it, pose a significant challenge within modern-day "Big Data" efforts. The two services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS), focus on format conversion and content-based analysis/extraction, respectively. They wrap the relevant conversion and extraction operations within arbitrary software, manage their deployment in an elastic manner, and manage job execution from behind a deliberately compact REST API. We describe the motivation and scientific drivers for such services, the constituent components that allow arbitrary software/code to be used and managed, and an evaluation of the system's capabilities and scalability.

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Editors: Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 493-500
Number of pages: 8
ISBN (Electronic): 9781479999255
DOIs: https://doi.org/10.1109/BigData.2015.7363791
State: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 → Nov 1 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Other

Other: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country: United States
City: Santa Clara
Period: 10/29/15 → 11/1/15

Fingerprint

Application programming interfaces (API)
Scalability
Big data

Keywords

  • digital preservation
  • unstructured data
  • web services

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Padhy, S., Jansen, G., Alameda, J., Black, E., Diesendruck, L., Dietze, M., ... McHenry, K. G. (2015). Brown Dog: Leveraging everything towards autocuration. In F. Luo, K. Ogan, M. J. Zaki, L. Haas, B. C. Ooi, V. Kumar, S. Rachuri, S. Pyne, H. Ho, X. Hu, S. Yu, M. H-I. Hsiao, ... J. Li (Eds.), Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 (pp. 493-500). [7363791] (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2015.7363791

Brown Dog : Leveraging everything towards autocuration. / Padhy, Smruti; Jansen, Greg; Alameda, Jay; Black, Edgar; Diesendruck, Liana; Dietze, Mike; Kumar, Praveen; Kooper, Rob; Lee, Jong Sung; Liu, Rui; Marciano, Richard; Marini, Luigi; Mattson, Dave; Minsker, Barbara; Navarro, Chris; Slavenas, Marcus; Sullivan, William C; Votava, Jason; Zharnitsky, Inna; McHenry, Kenton Guadron.

Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015. ed. / Feng Luo; Kemafor Ogan; Mohammed J. Zaki; Laura Haas; Beng Chin Ooi; Vipin Kumar; Sudarsan Rachuri; Saumyadipta Pyne; Howard Ho; Xiaohua Hu; Shipeng Yu; Morris Hui-I Hsiao; Jian Li. Institute of Electrical and Electronics Engineers Inc., 2015. p. 493-500 7363791 (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015).


Padhy, S, Jansen, G, Alameda, J, Black, E, Diesendruck, L, Dietze, M, Kumar, P, Kooper, R, Lee, JS, Liu, R, Marciano, R, Marini, L, Mattson, D, Minsker, B, Navarro, C, Slavenas, M, Sullivan, WC, Votava, J, Zharnitsky, I & McHenry, KG 2015, Brown Dog: Leveraging everything towards autocuration. in F Luo, K Ogan, MJ Zaki, L Haas, BC Ooi, V Kumar, S Rachuri, S Pyne, H Ho, X Hu, S Yu, MH-I Hsiao & J Li (eds), Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, 7363791, Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, Institute of Electrical and Electronics Engineers Inc., pp. 493-500, 3rd IEEE International Conference on Big Data, IEEE Big Data 2015, Santa Clara, United States, 10/29/15. https://doi.org/10.1109/BigData.2015.7363791
Padhy S, Jansen G, Alameda J, Black E, Diesendruck L, Dietze M et al. Brown Dog: Leveraging everything towards autocuration. In Luo F, Ogan K, Zaki MJ, Haas L, Ooi BC, Kumar V, Rachuri S, Pyne S, Ho H, Hu X, Yu S, Hsiao MH-I, Li J, editors, Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 493-500. 7363791. (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015). https://doi.org/10.1109/BigData.2015.7363791
Padhy, Smruti ; Jansen, Greg ; Alameda, Jay ; Black, Edgar ; Diesendruck, Liana ; Dietze, Mike ; Kumar, Praveen ; Kooper, Rob ; Lee, Jong Sung ; Liu, Rui ; Marciano, Richard ; Marini, Luigi ; Mattson, Dave ; Minsker, Barbara ; Navarro, Chris ; Slavenas, Marcus ; Sullivan, William C ; Votava, Jason ; Zharnitsky, Inna ; McHenry, Kenton Guadron. / Brown Dog : Leveraging everything towards autocuration. Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015. editor / Feng Luo ; Kemafor Ogan ; Mohammed J. Zaki ; Laura Haas ; Beng Chin Ooi ; Vipin Kumar ; Sudarsan Rachuri ; Saumyadipta Pyne ; Howard Ho ; Xiaohua Hu ; Shipeng Yu ; Morris Hui-I Hsiao ; Jian Li. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 493-500 (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015).
@inproceedings{24cac5da96054f82af6433c83908b6f1,
title = "Brown Dog: Leveraging everything towards autocuration",
abstract = "We present Brown Dog, a pair of highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software (past or present) to provide users with a simple-to-use, programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data. Such collections, which encompass a large variety of data as well as large amounts of it, pose a significant challenge within modern-day ``Big Data'' efforts. The two services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS), focus on format conversion and content-based analysis/extraction, respectively. They wrap the relevant conversion and extraction operations within arbitrary software, manage their deployment in an elastic manner, and manage job execution from behind a deliberately compact REST API. We describe the motivation and scientific drivers for such services, the constituent components that allow arbitrary software/code to be used and managed, and an evaluation of the system's capabilities and scalability.",
keywords = "digital preservation, unstructured data, web services",
author = "Smruti Padhy and Greg Jansen and Jay Alameda and Edgar Black and Liana Diesendruck and Mike Dietze and Praveen Kumar and Rob Kooper and Lee, {Jong Sung} and Rui Liu and Richard Marciano and Luigi Marini and Dave Mattson and Barbara Minsker and Chris Navarro and Marcus Slavenas and Sullivan, {William C} and Jason Votava and Inna Zharnitsky and McHenry, {Kenton Guadron}",
year = "2015",
month = "12",
day = "22",
doi = "10.1109/BigData.2015.7363791",
language = "English (US)",
series = "Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "493--500",
editor = "Feng Luo and Kemafor Ogan and Zaki, {Mohammed J.} and Laura Haas and Ooi, {Beng Chin} and Vipin Kumar and Sudarsan Rachuri and Saumyadipta Pyne and Howard Ho and Xiaohua Hu and Shipeng Yu and Hsiao, {Morris Hui-I} and Jian Li",
booktitle = "Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015",
address = "United States",

}

TY - GEN

T1 - Brown Dog

T2 - Leveraging everything towards autocuration

AU - Padhy, Smruti

AU - Jansen, Greg

AU - Alameda, Jay

AU - Black, Edgar

AU - Diesendruck, Liana

AU - Dietze, Mike

AU - Kumar, Praveen

AU - Kooper, Rob

AU - Lee, Jong Sung

AU - Liu, Rui

AU - Marciano, Richard

AU - Marini, Luigi

AU - Mattson, Dave

AU - Minsker, Barbara

AU - Navarro, Chris

AU - Slavenas, Marcus

AU - Sullivan, William C

AU - Votava, Jason

AU - Zharnitsky, Inna

AU - McHenry, Kenton Guadron

PY - 2015/12/22

Y1 - 2015/12/22

N2 - We present Brown Dog, a pair of highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software (past or present) to provide users with a simple-to-use, programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data. Such collections, which encompass a large variety of data as well as large amounts of it, pose a significant challenge within modern-day "Big Data" efforts. The two services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS), focus on format conversion and content-based analysis/extraction, respectively. They wrap the relevant conversion and extraction operations within arbitrary software, manage their deployment in an elastic manner, and manage job execution from behind a deliberately compact REST API. We describe the motivation and scientific drivers for such services, the constituent components that allow arbitrary software/code to be used and managed, and an evaluation of the system's capabilities and scalability.

AB - We present Brown Dog, a pair of highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software (past or present) to provide users with a simple-to-use, programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data. Such collections, which encompass a large variety of data as well as large amounts of it, pose a significant challenge within modern-day "Big Data" efforts. The two services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS), focus on format conversion and content-based analysis/extraction, respectively. They wrap the relevant conversion and extraction operations within arbitrary software, manage their deployment in an elastic manner, and manage job execution from behind a deliberately compact REST API. We describe the motivation and scientific drivers for such services, the constituent components that allow arbitrary software/code to be used and managed, and an evaluation of the system's capabilities and scalability.

KW - digital preservation

KW - unstructured data

KW - web services

UR - http://www.scopus.com/inward/record.url?scp=84963716196&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963716196&partnerID=8YFLogxK

U2 - 10.1109/BigData.2015.7363791

DO - 10.1109/BigData.2015.7363791

M3 - Conference contribution

AN - SCOPUS:84963716196

T3 - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

SP - 493

EP - 500

BT - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

A2 - Luo, Feng

A2 - Ogan, Kemafor

A2 - Zaki, Mohammed J.

A2 - Haas, Laura

A2 - Ooi, Beng Chin

A2 - Kumar, Vipin

A2 - Rachuri, Sudarsan

A2 - Pyne, Saumyadipta

A2 - Ho, Howard

A2 - Hu, Xiaohua

A2 - Yu, Shipeng

A2 - Hsiao, Morris Hui-I

A2 - Li, Jian

PB - Institute of Electrical and Electronics Engineers Inc.

ER -