File caching in data intensive scientific applications on data-grids

Ekow Otoo, Doron Rotem, Alexandra Romosan, Sridhar Seshadri

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


We present theoretical and experimental results for an important caching problem that arises frequently in data-intensive scientific applications run on data-grids. Such applications often need to process several files simultaneously, i.e., the application runs only if all the files it needs are present in some disk cache accessible to the compute resource of the application. The set of files requested by an application, all of which must be in cache for the application to run, is called a file-bundle. This requirement introduces the need for cache replacement algorithms that are based on file-bundles rather than individual files. We show that traditional caching algorithms such as Least Recently Used (LRU) and GreedyDual-Size (GDS) are not optimal in this case, since they are not sensitive to file-bundles and may hold non-relevant combinations of files in the cache. We propose and analyze a new cache replacement algorithm specifically adapted to deal with file-bundles. Results of experimental studies of the new algorithm, using a disk cache simulation model under a wide range of conditions such as file request distributions, relative cache size, file size distribution, and incoming job queue size, show significant improvement over traditional caching algorithms such as GDS.
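The bundle requirement described in the abstract can be illustrated with a minimal sketch. This is not the replacement algorithm proposed in the paper, which the abstract does not specify; it is a plain file-level LRU cache with a bundle-hit check added, to show how such a cache can hold a file that is useless on its own. The names `LRUFileCache`, `has_bundle`, and `run_jobs` are hypothetical, and the sketch assumes every bundle fits in the cache.

```python
from collections import OrderedDict


class LRUFileCache:
    """File-level LRU cache; capacity and file sizes use the same arbitrary unit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.files = OrderedDict()  # file name -> size, least recently used first

    def has_bundle(self, bundle):
        # A job can run only if every file of its bundle is cached.
        return all(name in self.files for name in bundle)

    def load(self, name, size):
        if name in self.files:
            self.files.move_to_end(name)  # refresh recency
            return
        # Evict least recently used files until the new file fits.
        while self.used + size > self.capacity and self.files:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        self.files[name] = size
        self.used += size


def run_jobs(cache, jobs, sizes):
    """Serve jobs in order; return how many found their whole bundle cached."""
    bundle_hits = 0
    for bundle in jobs:
        if cache.has_bundle(bundle):
            bundle_hits += 1
        # Refresh files already cached first, so loading the missing files
        # does not evict other members of the same bundle.
        for name in bundle:
            if name in cache.files:
                cache.files.move_to_end(name)
        for name in bundle:
            cache.load(name, sizes[name])
    return bundle_hits


sizes = {"a": 1, "b": 1, "c": 1, "d": 1}
jobs = [("a", "b"), ("c", "d"), ("a", "b")]
print(run_jobs(LRUFileCache(3), jobs, sizes))
print(run_jobs(LRUFileCache(4), jobs, sizes))
```

With a capacity of 3, the second job evicts only `a`, so `b` stays cached but the third job still cannot run until `a` is fetched again: the cache holds a combination of files that serves no bundle. A bundle-aware policy would decide evictions with the bundles in mind rather than file by file.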

Original language: English (US)
Title of host publication: Data Management in Grids - First VLDB Workshop, DMG 2005, Revised Selected Papers
Number of pages: 15
State: Published - 2005
Externally published: Yes
Event: 1st VLDB Workshop on Data Management in Grids, DMG 2005 - Trondheim, Norway
Duration: Sep 2 2005 to Sep 3 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3836 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 1st VLDB Workshop on Data Management in Grids, DMG 2005

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

