Sharing programming resources between bio* projects

Raoul J.P. Bonnal, Andrew Yates, Naohisa Goto, Laurent Gautier, Scooter Willis, Christopher J Fields, Toshiaki Katayama, Pjotr Prins

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Open-source software encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, open-source software comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the three principal approaches for sharing software between different programming languages: by remote procedure call (RPC), by sharing a local “call stack,” and by calling one program from another. RPC provides a language-independent protocol over a network interface; examples are SOAP and Rserve. The local call stack provides a between-language mapping, not over the network interface but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java virtual machine stack. These mechanisms provide strategies for sharing software between Bio* projects that could be exploited more often. Here, we present cross-language examples for sequence translation and measure the throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite (EMBOSS). In general, call stack approaches outperform native Bio* implementations, and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable Docker container with all examples, tools, and libraries included.
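
The three routes named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's benchmark code, and it stays within one language, so the "call stack" route is simply an ordinary in-process function call. All names here (`translate`, `translate_via_rpc`, `translate_via_subprocess`, `CHILD_SOURCE`) are hypothetical. As assumptions for illustration, XML-RPC from the Python standard library stands in for protocols such as SOAP or Rserve, and a child Python process stands in for an external program such as an EMBOSS tool.

```python
import subprocess
import sys
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Standard genetic code, laid out in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """In-process route: translate codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def translate_via_rpc(dna: str) -> str:
    """RPC route: the same call travels over a local socket as XML-RPC."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(translate, "translate")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]  # port 0 above lets the OS pick one
        with xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}") as proxy:
            return proxy.translate(dna)
    finally:
        server.shutdown()
        server.server_close()


# Source of a stand-alone child program (hypothetical stand-in for an
# external tool): reads DNA on stdin, writes protein on stdout.
CHILD_SOURCE = (
    "import sys\n"
    f"bases, amino = {BASES!r}, {AMINO!r}\n"
    "table = {a + b + c: amino[16 * i + 4 * j + k]\n"
    "         for i, a in enumerate(bases)\n"
    "         for j, b in enumerate(bases)\n"
    "         for k, c in enumerate(bases)}\n"
    "dna = sys.stdin.read().strip().upper()\n"
    "print(''.join(table.get(dna[i:i + 3], 'X')\n"
    "              for i in range(0, len(dna) - 2, 3)))\n"
)


def translate_via_subprocess(dna: str) -> str:
    """Program-to-program route: start a separate process and pipe text."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=dna, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    seq = "ATGGCCTAA"
    for route in (translate, translate_via_rpc, translate_via_subprocess):
        print(route.__name__, route(seq))
```

All three functions return the same protein string for the same input; they differ only in how the call reaches the translation code. Wrapping each in a timing loop would give a rough feel for the per-call overhead of each route, although the figures would not be comparable to the chapter's measurements.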

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 747-766
Number of pages: 20
DOIs: https://doi.org/10.1007/978-1-4939-9074-0_25
State: Published - Jan 1 2019

Publication series

Name: Methods in Molecular Biology
Volume: 1910
ISSN (Print): 1064-3745

Fingerprint

Language
Software
Programming Languages
Computational Biology
Libraries
Molecular Biology

Keywords

  • Bioinformatics
  • EMBOSS
  • Java
  • PAML
  • Perl
  • Python
  • R
  • RPC
  • Ruby
  • Web services

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Bonnal, R. J. P., Yates, A., Goto, N., Gautier, L., Willis, S., Fields, C. J., ... Prins, P. (2019). Sharing programming resources between bio* projects. In Methods in Molecular Biology (pp. 747-766). (Methods in Molecular Biology; Vol. 1910). Humana Press Inc. https://doi.org/10.1007/978-1-4939-9074-0_25

Sharing programming resources between bio* projects. / Bonnal, Raoul J.P.; Yates, Andrew; Goto, Naohisa; Gautier, Laurent; Willis, Scooter; Fields, Christopher J; Katayama, Toshiaki; Prins, Pjotr.

Methods in Molecular Biology. Humana Press Inc., 2019. p. 747-766 (Methods in Molecular Biology; Vol. 1910).

Bonnal, RJP, Yates, A, Goto, N, Gautier, L, Willis, S, Fields, CJ, Katayama, T & Prins, P 2019, Sharing programming resources between bio* projects. in Methods in Molecular Biology. Methods in Molecular Biology, vol. 1910, Humana Press Inc., pp. 747-766. https://doi.org/10.1007/978-1-4939-9074-0_25
Bonnal RJP, Yates A, Goto N, Gautier L, Willis S, Fields CJ et al. Sharing programming resources between bio* projects. In Methods in Molecular Biology. Humana Press Inc. 2019. p. 747-766. (Methods in Molecular Biology). https://doi.org/10.1007/978-1-4939-9074-0_25
Bonnal, Raoul J.P. ; Yates, Andrew ; Goto, Naohisa ; Gautier, Laurent ; Willis, Scooter ; Fields, Christopher J ; Katayama, Toshiaki ; Prins, Pjotr. / Sharing programming resources between bio* projects. Methods in Molecular Biology. Humana Press Inc., 2019. pp. 747-766 (Methods in Molecular Biology).
@inbook{f236e033695b4f8a922aba7dc7040b05,
title = "Sharing programming resources between bio* projects",
abstract = "Open-source software encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, open-source software comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the three principal approaches for sharing software between different programming languages: by remote procedure call (RPC), by sharing a local “call stack,” and by calling one program from another. RPC provides a language-independent protocol over a network interface; examples are SOAP and Rserve. The local call stack provides a between-language mapping, not over the network interface but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java virtual machine stack. These mechanisms provide strategies for sharing software between Bio* projects that could be exploited more often. Here, we present cross-language examples for sequence translation and measure the throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite (EMBOSS). In general, call stack approaches outperform native Bio* implementations, and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable Docker container with all examples, tools, and libraries included.",
keywords = "Bioinformatics, EMBOSS, Java, PAML, Perl, Python, R, RPC, Ruby, Web services",
author = "Bonnal, {Raoul J.P.} and Andrew Yates and Naohisa Goto and Laurent Gautier and Scooter Willis and Fields, {Christopher J} and Toshiaki Katayama and Pjotr Prins",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-9074-0_25",
language = "English (US)",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "747--766",
booktitle = "Methods in Molecular Biology",

}

TY - CHAP

T1 - Sharing programming resources between bio* projects

AU - Bonnal, Raoul J.P.

AU - Yates, Andrew

AU - Goto, Naohisa

AU - Gautier, Laurent

AU - Willis, Scooter

AU - Fields, Christopher J

AU - Katayama, Toshiaki

AU - Prins, Pjotr

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Open-source software encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, open-source software comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the three principal approaches for sharing software between different programming languages: by remote procedure call (RPC), by sharing a local “call stack,” and by calling one program from another. RPC provides a language-independent protocol over a network interface; examples are SOAP and Rserve. The local call stack provides a between-language mapping, not over the network interface but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java virtual machine stack. These mechanisms provide strategies for sharing software between Bio* projects that could be exploited more often. Here, we present cross-language examples for sequence translation and measure the throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite (EMBOSS). In general, call stack approaches outperform native Bio* implementations, and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable Docker container with all examples, tools, and libraries included.

KW - Bioinformatics

KW - EMBOSS

KW - Java

KW - PAML

KW - Perl

KW - Python

KW - R

KW - RPC

KW - Ruby

KW - Web services

UR - http://www.scopus.com/inward/record.url?scp=85068833317&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068833317&partnerID=8YFLogxK

U2 - 10.1007/978-1-4939-9074-0_25

DO - 10.1007/978-1-4939-9074-0_25

M3 - Chapter

C2 - 31278684

AN - SCOPUS:85068833317

T3 - Methods in Molecular Biology

SP - 747

EP - 766

BT - Methods in Molecular Biology

PB - Humana Press Inc.

ER -