Potential of Interpreter Specialization for Data Analysis

Wei He, Michelle Mills Strout

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Scientists frequently implement data analyses in high-level programming languages such as Python, Perl, Lua, and R. Many of these languages are inefficient due to the overhead of being dynamically typed and interpreted. In this paper, we report the potential performance improvement of domain-specific interpreter specialization for data analysis workloads and evaluate how the characteristics of data analysis workloads affect the specialization, both positively and negatively. Assisted by compilers, we specialize the Lua and CPython interpreters at source-level using the script being interpreted and the data types during the interpretation as invariants for five common tasks from real data analysis workloads. Through experiments, we measure 9.0–39.6% performance improvement for Lua and 11.0–17.2% performance improvement for CPython for benchmarks that perform data loading, histogram computation, data filtering, data transformation, and dataset shuffle. This specialization does not include misspeculation checks of data types at possible type conversion code that may be necessary for other workloads. We report the details of our evaluation and present a semi-automatic method for specializing the interpreters.

Original language: English (US)
Title of host publication: High Performance Computing - ISC High Performance Digital 2021 International Workshops, 2021, Revised Selected Papers
Editors: Heike Jagode, Hartwig Anzt, Hatem Ltaief, Piotr Luszczek
Publisher: Springer
Pages: 212-225
Number of pages: 14
ISBN (Print): 9783030905385
State: Published - 2021
Externally published: Yes
Event: International Conference on High Performance Computing, ISC High Performance 2021 - Virtual, Online
Duration: Jun 24 2021 to Jul 2 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12761 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on High Performance Computing, ISC High Performance 2021
City: Virtual, Online
Period: 6/24/21 to 7/2/21

Keywords

  • Compiler-assisted specialization
  • Interpreter specialization
  • Profile-based optimization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
