SMURF: Efficient and Scalable Metadata Access for Distributed Applications

Bing Zhang, Tevfik Kosar

Research output: Contribution to journal › Article › peer-review


As big data processing and analysis have come to dominate the usage of distributed and Cloud infrastructures, the demand for distributed metadata access and transfer has increased. The volume of data generated by many application domains exceeds petabytes, while the corresponding metadata amounts to terabytes or even more. This article proposes SMURF, a novel solution for efficient and scalable metadata access for distributed applications across wide-area networks. Our solution combines novel pipelining and concurrent transfer mechanisms with reliability, provides distributed continuum caching and semantic locality-aware prefetching strategies to sidestep fetching latency, and achieves scalable and high-performance metadata fetch/prefetch services in the Cloud. Using real-life application I/O traces from Yahoo! Hadoop audit logs, we exploit semantic locality to increase the prefetch prediction rate and propose a novel prefetch predictor. By caching and prefetching metadata based on the access patterns, our continuum caching and prefetching mechanism significantly improves the local cache hit rate and reduces the average fetching latency. We replay approximately 20 million metadata access operations from real audit traces, where SMURF achieves 90% accuracy in prefetch prediction and reduces the average fetch latency by 50% compared to state-of-the-art mechanisms.

Original language: English (US)
Pages (from-to): 3915-3928
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 12
State: Published - Dec 1 2022


Keywords

  • Heterogeneity
  • continuum caching
  • metadata access
  • prefetch prediction
  • scalability
  • semantic locality

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

