The Web Archives Workbench (WAW) Tool Suite: Taking an Archival Approach to the Preservation of Web Content

Patricia Hswe, Joanne S Kaczmarek, Leah Houser, Janet Eke

Research output: Contribution to journal › Article › peer-review

Abstract

The ECHO DEPository (also known as ECHO DEP, an abbreviation for Exploring Collaborations to Harvest Objects in a Digital Environment for Preservation) is an NDIIPP-partner project led by the University of Illinois at Urbana-Champaign in collaboration with OCLC and a consortium of partners, including five state libraries and archives. A core deliverable of the project's first phase was OCLC's development of the Web Archives Workbench (WAW), an open-source suite of Web archiving tools for identifying, describing, and harvesting Web-based content for ingestion into an external digital repository. Released in October 2007, the suite is designed to bridge the gap between manual selection and automated capture based on the "Arizona Model," which applies a traditional aggregate-based archival approach to Web archiving. Aggregate-based archiving refers to archiving items by group or in series, rather than individually. Core functionality of the suite includes the ability to identify Web content of potential interest through crawls of "seed" URLs and the domains they link to; tools for creating and managing metadata for association with harvested objects; website structural analysis and visualization to aid human content selection decisions; and packaging using a PREMIS-based METS profile developed by the ECHO DEPository to support easier ingestion into multiple repositories. This article provides background on the Arizona Model; an overview of how the tools work and their technical implementation; and a brief summary of user feedback from testing and implementing the tools.

Original language: English (US)
Pages (from-to): 442-460
Number of pages: 19
Journal: Library Trends
Volume: 57
Issue number: 3
State: Published - 2009

ASJC Scopus subject areas

  • Library and Information Sciences
