An input-adaptive and in-place approach to dense tensor-times-matrix multiply

Jiajia Li, Casey Battaglino, Ioakeim Perros, Jimeng Sun, Richard Vuduc

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper describes a novel framework, called InTensLi ("intensely"), for producing fast single-node implementations of dense tensor-times-matrix multiply (Ttm) of arbitrary dimension. Whereas conventional implementations of Ttm rely on explicitly converting the input tensor operand into a matrix - -in order to be able to use any available and fast general matrix-matrix multiply (Gemm) implementation - -our framework's strategy is to carry out the Ttm in-place, avoiding this copy. As the resulting implementations expose tuning parameters, this paper also describes a heuristic empirical model for selecting an optimal configuration based on the Ttm's inputs. When compared to widely used single-node Ttm implementations that are available in the Tensor Toolbox and Cyclops Tensor Framework (Ctf), In-TensLi's in-place and input-adaptive Ttm implementations achieve 4× and 13× speedups, showing Gemm-like performance on a variety of input sizes.

Original languageEnglish (US)
Title of host publicationProceedings of SC 2015
Subtitle of host publicationThe International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
ISBN (Electronic)9781450337236
DOIs
StatePublished - Nov 15 2015
Externally publishedYes
EventInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015 - Austin, United States
Duration: Nov 15 2015Nov 20 2015

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume15-20-November-2015
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

ConferenceInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
Country/TerritoryUnited States
CityAustin
Period11/15/1511/20/15

Keywords

  • code generation
  • multilinear algebra
  • offline autotuning
  • tensor operation

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Fingerprint

Dive into the research topics of 'An input-adaptive and in-place approach to dense tensor-times-matrix multiply'. Together they form a unique fingerprint.

Cite this