CouchFS: A high-performance file system for large data sets

Fangzhou Yao, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Numerous file systems have been implemented to meet the needs of today's big data era; however, many of them require specific configurations or frameworks for data processing. This paper presents CouchFS, a POSIX-compliant distributed file system for large data sets. We build CouchFS on top of CouchDB, which gives us the flexibility to handle semistructured data. Because a database behaves much like a file system, and CouchDB provides highly customizable MapReduce views for indexing, CouchFS is able to achieve high-performance searching for both text and supported binary objects. This work compares search of Wikipedia data using CouchDB, PostgreSQL, and Spotlight on the HFS+ file system. We present our design of CouchFS and discuss future approaches to improving this file system.
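
The indexing idea mentioned in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use CouchDB's API (CouchDB map functions are typically written in JavaScript); it only mimics, in Python, how a map step that emits key/value pairs can build a word index so that a search avoids scanning every file. The sample documents, the field names (`path`, `content`), and all function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical documents standing in for files stored in a document database.
# The field names ("_id", "path", "content") are assumptions for this sketch.
DOCS = [
    {"_id": "1", "path": "/wiki/couchdb.txt", "content": "CouchDB stores JSON documents"},
    {"_id": "2", "path": "/wiki/posix.txt", "content": "POSIX defines file system semantics"},
    {"_id": "3", "path": "/wiki/fs.txt", "content": "A file system stores files"},
]


def map_doc(doc):
    """Map step: yield one (word, path) pair per distinct word in a document."""
    words = set(re.findall(r"[a-z0-9]+", doc["content"].lower()))
    for word in sorted(words):
        yield word, doc["path"]


def build_view(docs):
    """Group the mapped pairs by key, similar in spirit to a view index."""
    index = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            index[key].append(value)
    return dict(index)


def reduce_count(values):
    """Reduce step: count how many documents contain a given key."""
    return len(values)


def search(view, word):
    """Look up a word in the prebuilt index instead of scanning every file."""
    return view.get(word.lower(), [])


VIEW = build_view(DOCS)
```

Calling `search(VIEW, "file")` returns the paths of the two sample documents that contain the word, and `reduce_count(VIEW["system"])` gives the number of matching documents. A real view would be persisted and updated incrementally by the database rather than rebuilt in memory.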

Original language: English (US)
Title of host publication: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
Editors: Peter Chen, Peter Chen, Hemant Jain
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 784-785
Number of pages: 2
ISBN (Electronic): 9781479950577
DOI: https://doi.org/10.1109/BigData.Congress.2014.122
State: Published - Sep 22 2014
Event: 3rd IEEE International Congress on Big Data, BigData Congress 2014 - Anchorage, United States
Duration: Jun 27 2014 to Jul 2 2014

Publication series

Name: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

Other

Other: 3rd IEEE International Congress on Big Data, BigData Congress 2014
Country: United States
City: Anchorage
Period: 6/27/14 to 7/2/14

Fingerprint

Big data

ASJC Scopus subject areas

  • Computer Science Applications

Cite this

Yao, F., & Campbell, R. H. (2014). CouchFS: A high-performance file system for large data sets. In P. Chen, P. Chen, & H. Jain (Eds.), Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014 (pp. 784-785). [6906866] (Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.Congress.2014.122

CouchFS: A high-performance file system for large data sets. / Yao, Fangzhou; Campbell, Roy H.

Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014. ed. / Peter Chen; Peter Chen; Hemant Jain. Institute of Electrical and Electronics Engineers Inc., 2014. p. 784-785 6906866 (Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014).

Yao, F & Campbell, RH 2014, CouchFS: A high-performance file system for large data sets. in P Chen, P Chen & H Jain (eds), Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014, 6906866, Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014, Institute of Electrical and Electronics Engineers Inc., pp. 784-785, 3rd IEEE International Congress on Big Data, BigData Congress 2014, Anchorage, United States, 6/27/14. https://doi.org/10.1109/BigData.Congress.2014.122
Yao F, Campbell RH. CouchFS: A high-performance file system for large data sets. In Chen P, Chen P, Jain H, editors, Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 784-785. 6906866. (Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014). https://doi.org/10.1109/BigData.Congress.2014.122
Yao, Fangzhou; Campbell, Roy H. / CouchFS: A high-performance file system for large data sets. Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014. editor / Peter Chen; Peter Chen; Hemant Jain. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 784-785 (Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014).
@inproceedings{fd3c0739078f45b2a3c91922a232bfab,
title = "CouchFS: A high-performance file system for large data sets",
abstract = "Numerous file systems have been implemented to meet the needs of today's big data era; however, many of them require specific configurations or frameworks for data processing. This paper presents CouchFS, a POSIX-compliant distributed file system for large data sets. We build CouchFS on top of CouchDB, which gives us the flexibility to handle semistructured data. Because a database behaves much like a file system, and CouchDB provides highly customizable MapReduce views for indexing, CouchFS is able to achieve high-performance searching for both text and supported binary objects. This work compares search of Wikipedia data using CouchDB, PostgreSQL, and Spotlight on the HFS+ file system. We present our design of CouchFS and discuss future approaches to improving this file system.",
author = "Fangzhou Yao and Campbell, {Roy H.}",
year = "2014",
month = "9",
day = "22",
doi = "10.1109/BigData.Congress.2014.122",
language = "English (US)",
series = "Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "784--785",
editor = "Peter Chen and Peter Chen and Hemant Jain",
booktitle = "Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014",
address = "United States",

}

TY - GEN

T1 - CouchFS

T2 - A high-performance file system for large data sets

AU - Yao, Fangzhou

AU - Campbell, Roy H.

PY - 2014/9/22

Y1 - 2014/9/22

AB - Numerous file systems have been implemented to meet the needs of today's big data era; however, many of them require specific configurations or frameworks for data processing. This paper presents CouchFS, a POSIX-compliant distributed file system for large data sets. We build CouchFS on top of CouchDB, which gives us the flexibility to handle semistructured data. Because a database behaves much like a file system, and CouchDB provides highly customizable MapReduce views for indexing, CouchFS is able to achieve high-performance searching for both text and supported binary objects. This work compares search of Wikipedia data using CouchDB, PostgreSQL, and Spotlight on the HFS+ file system. We present our design of CouchFS and discuss future approaches to improving this file system.

UR - http://www.scopus.com/inward/record.url?scp=84923924812&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84923924812&partnerID=8YFLogxK

U2 - 10.1109/BigData.Congress.2014.122

DO - 10.1109/BigData.Congress.2014.122

M3 - Conference contribution

AN - SCOPUS:84923924812

T3 - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

SP - 784

EP - 785

BT - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

A2 - Chen, Peter

A2 - Chen, Peter

A2 - Jain, Hemant

PB - Institute of Electrical and Electronics Engineers Inc.

ER -