TY - GEN
T1 - File caching in data intensive scientific applications on data-grids
AU - Otoo, Ekow
AU - Rotem, Doron
AU - Romosan, Alexandra
AU - Seshadri, Sridhar
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
N2 - We present some theoretical and experimental results on an important caching problem that arises frequently in data-intensive scientific applications that are run in data-grids. Such applications often need to process several files simultaneously, i.e., the application runs only if all its needed files are present in some disk cache accessible to the compute resource of the application. The set of files requested by an application, all of which must be in cache for the application to run, is called a file-bundle. This requirement introduces the need for cache replacement algorithms that are based on file-bundles rather than individual files. We show that traditional caching algorithms such as Least Recently Used (LRU) and GreedyDual-Size (GDS) are not optimal in this case since they are not sensitive to file-bundles and may hold in the cache non-relevant combinations of files. We propose and analyze a new cache replacement algorithm specifically adapted to deal with file-bundles. Results of experimental studies of the new algorithm, using a disk cache simulation model under a wide range of conditions such as file request distributions, relative cache size, file size distribution, and incoming job queue size, show significant improvement over traditional caching algorithms such as GDS.
AB - We present some theoretical and experimental results on an important caching problem that arises frequently in data-intensive scientific applications that are run in data-grids. Such applications often need to process several files simultaneously, i.e., the application runs only if all its needed files are present in some disk cache accessible to the compute resource of the application. The set of files requested by an application, all of which must be in cache for the application to run, is called a file-bundle. This requirement introduces the need for cache replacement algorithms that are based on file-bundles rather than individual files. We show that traditional caching algorithms such as Least Recently Used (LRU) and GreedyDual-Size (GDS) are not optimal in this case since they are not sensitive to file-bundles and may hold in the cache non-relevant combinations of files. We propose and analyze a new cache replacement algorithm specifically adapted to deal with file-bundles. Results of experimental studies of the new algorithm, using a disk cache simulation model under a wide range of conditions such as file request distributions, relative cache size, file size distribution, and incoming job queue size, show significant improvement over traditional caching algorithms such as GDS.
UR - http://www.scopus.com/inward/record.url?scp=33745477342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745477342&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33745477342
SN - 3540312129
SN - 9783540312123
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 85
EP - 99
BT - Data Management in Grids - First VLDB Workshop, DMG 2005, Revised Selected Papers
T2 - 1st VLDB Workshop on Data Management in Grids, DMG 2005
Y2 - 2 September 2005 through 3 September 2005
ER -