Towards Publishing Secure Capsule-Based Analysis

Jaimie Murdock, Jacob Jett, Tim Cole, Yu Ma, J. Stephen Downie, Beth Plale

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Computational engagement with the HathiTrust Digital Library (HTDL) is confounded by the in-copyright status of, and licensing restrictions on, the majority of its content. Because of these limitations, computational analysis of the HTDL must be carried out either in a secure environment or on derivative datasets. The HathiTrust Research Center (HTRC) Data Capsule service provides researchers with a secure environment through which they invoke tools that create, analyze, and export non-consumptive datasets. These derivative datasets, so long as they do not reproduce the full text of the original work, are transformative works protected by the Fair Use provisions of United States copyright law and can be published for reuse by other researchers, as the HTRC Extracted Features Dataset has been. Secure environments and derivative datasets enable researchers to engage with restricted data at scales ranging from focused studies of a few dozen volumes to large-scale experiments on millions of volumes. This paper describes advances in the HTRC Data Capsule service through a topic modeling case study that shows how the service has advanced our activities on provenance, workflows, worksets, and non-consumptive exports. We also discuss the potential applications of this Capsule-based model to other digital libraries wrestling with research access and copyright restrictions.

Original language: English (US)
Title of host publication: 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538638613
State: Published - Jul 25 2017
Event: 17th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017 - Toronto, Canada
Duration: Jun 19 2017 to Jun 23 2017

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 17th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
Country/Territory: Canada
City: Toronto
Period: 6/19/17 to 6/23/17

Keywords

  • Data provenance
  • digital libraries
  • metadata management
  • research workflows
  • semantic web
  • text processing

ASJC Scopus subject areas

  • General Engineering
