Proposal for Persistent & Unique Entity Identifiers

Jacob Jett, Guangchen Ruan, Leena Unnikrishnan, Colleen Fallaw, Chris Maden, Timothy W Cole

Research output: Book/Report/Conference proceedingTechnical report


This proposal argues for the establishment of persistent and unique identifiers for page level content. The page is a key conceptual entity within the HathiTrust Research Center (HTRC) framework. Volumes are composed of pages and pages are the size of the portions of data that the HTRC’s analytics modules consume and execute algorithms across. The need for infrastructure that supports persistent and unique identity for is best described by seven use cases: 1. Persistent Citability: Scholars engaging in the analysis of HTRC resources have a clear need to cite those resources in a persistent manner independent of those resources’ relative positions within other entities. 2. Point-in-time Citability: Scholars engaging in the analysis of HTRC resources have a clear need to cite resources in an unambiguous way that is persistent with respect to time. 3. Reproducibility: Scholars need methods by which the resources that they cite can be shared so that their work conforms to the norms of peer-review and reproducibility of results. 4. Supporting “non-consumptive” Usage: Anonymizing page-level content by disassociating it from the volumes that it is conceptually a part of increases the difficulty of leveraging HTRC analytics modules for the direct reproduction of HathiTrust (HT) content. 5. Improved Granularity: Since many features that scholars are interested in exist at the conceptual level of a page rather than at the level of a volume, unique page-level entities expand the types of methods by which worksets can be gathered and by which analytics modules can be constructed. 6. Expanded Workset Membership: In the near future we would like to empower scholars with options for creating worksets from arbitrary resources at arbitrary levels of granularity, including constructing worksets from collections of arbitrary pages. 7. Supporting Graph Representations: Unique identifiers for page-level content facilitate the creation of more conceptually accurate and functional graph representations of the HT corpus.
Original languageEnglish (US)
PublisherUniversity of Illinois Urbana-Champaign
Number of pages25
StatePublished - Aug 22 2014


  • system architecture
  • digital libraries
  • HathiTrust Digital Library
  • HathiTrust Research Center
  • identifiers


Dive into the research topics of 'Proposal for Persistent & Unique Entity Identifiers'. Together they form a unique fingerprint.

Cite this