Vectorization of apply to reduce interpretation overhead of R

Haichuan Wang, David Padua, Peng Wu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

R is a popular dynamic language designed for statistical computing. Despite R's huge user base, the inefficiency of R's language implementation has become a major pain point in everyday use as well as an obstacle to applying R to large-scale analytics problems. The two most common approaches to improving the performance of dynamic languages are implementing more efficient interpretation strategies and extending the interpreter with a Just-In-Time (JIT) compiler. However, both approaches require significant changes to the interpreter and, as a result, complicate adoption by development teams. This paper presents a new approach to improving the execution efficiency of R programs by vectorizing the widely used Apply class of operations. Apply accepts two parameters: a function and a collection of input data elements. The standard implementation of Apply iteratively invokes the input function on each element in the data collection. Our approach combines data transformation and function vectorization to convert the looping-over-data execution of the standard Apply into a single invocation of a vectorized function that contains a sequence of vector operations over the input data. This conversion can significantly speed up the execution of Apply operations in R by reducing the number of interpretation steps. We implemented the vectorization transformation as an R package. To enable the optimization, the user only needs to invoke the package; a normal R interpreter can be used without any changes. The evaluation shows that the proposed method delivers significant performance improvements for a collection of data analysis algorithm benchmarks. This is achieved without any native code generation and using only a single thread of execution.
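
The contrast the abstract draws between looping over data and a single vectorized call can be illustrated by hand. The following is a minimal sketch in base R, not the paper's package or its API: the function `scale_and_shift` and the sample data are hypothetical, and the packing step is written manually here, whereas the abstract describes a transformation that performs it automatically.

```r
# Hypothetical element-wise function; not taken from the paper.
scale_and_shift <- function(x) 2 * x + 1

# Sample input: a list of five scalar elements (an assumption for illustration).
data <- as.list(1:5)

# Standard Apply: the interpreter invokes the function once per element.
looped <- lapply(data, scale_and_shift)

# Hand-vectorized equivalent:
# 1. data transformation: pack the list elements into one vector;
# 2. single invocation: `*` and `+` already operate on whole vectors in R,
#    so the same function body runs once over all elements.
packed <- unlist(data)
vectorized <- as.list(scale_and_shift(packed))

# Both strategies produce the same list of results.
stopifnot(identical(looped, vectorized))
```

In this sketch the looped version enters the interpreter five times, while the vectorized version enters it once, which is the reduction in interpretation steps that the abstract refers to. Real Apply bodies are rarely this simple, so the manual rewrite above should be read only as an illustration of the idea.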

Original language: English (US)
Title of host publication: OOPSLA 2015 - Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming Systems, Languages, and Applications
Editors: Patrick Eugster, Jonathan Aldrich
Publisher: Association for Computing Machinery
Pages: 400-415
Number of pages: 16
ISBN (Electronic): 9781450336895
DOIs
State: Published - Oct 23 2015
Event: 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015 - Pittsburgh, United States
Duration: Oct 25 2015 to Oct 30 2015

Publication series

Name: Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA
Volume: 25-30-Oct-2015

Conference

Conference: 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015
Country: United States
City: Pittsburgh
Period: 10/25/15 to 10/30/15

Keywords

  • Dynamic Language
  • R
  • Vectorization

ASJC Scopus subject areas

  • Software
