Abstract
Vectorization is increasingly important for achieving high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses difficulties for effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while generating efficient, vectorized code for the finite element method is a challenge for numerical software systems and libraries. In this work, we study cross-element vectorization in the finite element framework Firedrake via code transformation. We demonstrate the efficacy of this approach by evaluating a wide range of matrix-free operators, spanning different polynomial degrees and discretizations, on two recent CPUs using three mainstream compilers. Our experiments show that our approaches for cross-element vectorization achieve 30% of theoretical peak performance for many examples of practical significance, and exceed 50% for cases with high arithmetic intensities, with consistent speed-up over (intra-element) vectorization restricted to the local assembly kernels.
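The loop restructuring behind cross-element vectorization can be illustrated with a toy sketch. The code below is not Firedrake's generated code or its API: the function names, the batch width `W`, and the use of one shared local matrix for every cell (as on a uniform mesh) are assumptions made for illustration. It only shows the idea of gathering several elements into a batch so that the innermost loop runs over elements, which is the loop a compiler could map to SIMD lanes in compiled code.

```python
# Toy illustration of cross-element batching; NOT Firedrake's generated code.
# All names here are hypothetical, and every cell shares one local matrix.
W = 4  # assumed SIMD width: number of elements processed per batch


def local_kernel(coeffs, mass):
    """Per-element kernel: y_e = M @ x_e, with short loops over local dofs."""
    n = len(coeffs)
    return [sum(mass[i][j] * coeffs[j] for j in range(n)) for i in range(n)]


def batched_kernel(batch_coeffs, mass, width):
    """Same kernel applied to `width` elements at once.

    Data is laid out as batch_coeffs[j][lane], so the innermost loop runs
    over elements (lanes) with unit stride.
    """
    n = len(mass)
    out = [[0.0] * width for _ in range(n)]
    for i in range(n):
        for j in range(n):
            m = mass[i][j]
            for lane in range(width):  # cross-element loop, innermost
                out[i][lane] += m * batch_coeffs[j][lane]
    return out


def assemble(cell_dofs, x, mass, num_dofs, width=W):
    """Matrix-free operator action, visiting cells in batches of `width`."""
    y = [0.0] * num_dofs
    n = len(mass)
    for start in range(0, len(cell_dofs), width):
        cells = cell_dofs[start:start + width]
        pad = width - len(cells)  # last batch is padded with zero lanes
        # Gather: pack[j][lane] = x[global dof j of the cell in this lane]
        pack = [[x[c[j]] for c in cells] + [0.0] * pad for j in range(n)]
        res = batched_kernel(pack, mass, width)
        # Scatter: accumulate results of the real (non-padding) lanes only
        for lane, c in enumerate(cells):
            for i in range(n):
                y[c[i]] += res[i][lane]
    return y


def assemble_reference(cell_dofs, x, mass, num_dofs):
    """Element-by-element assembly, used to check the batched version."""
    y = [0.0] * num_dofs
    for c in cell_dofs:
        local = local_kernel([x[d] for d in c], mass)
        for i, d in enumerate(c):
            y[d] += local[i]
    return y
```

In this sketch the indirect gather and scatter stay outside the batched kernel, so the kernel itself sees contiguous data. Padding the final batch keeps the lane loop at a fixed trip count, at the cost of some wasted work on the padded lanes.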
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 629-644 |
Number of pages | 16 |
Journal | International Journal of High Performance Computing Applications |
Volume | 34 |
Issue number | 6 |
State | Published - Nov 1 2020 |
Keywords
- Finite element method
- code generation
- global assembly
- vectorization
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture