Overview

What are expression templates?

Expression templates are a C++ template metaprogramming technique that lets the programmer write high-level code and still obtain loop fusion. Consider the following example.

std::vector<float> operator+(std::vector<float> const &a,
                             std::vector<float> const &b) {
  std::vector<float> ret(a.size());
  for (size_t i = 0; i < a.size(); i++) {
    ret[i] = a[i] + b[i];
  }
  return ret;
}

int main() {
  std::vector<float> a, b, c, d, sum;

  ...

  sum = a + b + c + d;

  ...

  return 0;
}

The expression a + b + c + d involves three calls to operator+ and at least nine memory passes are necessary. This can be optimized as follows.

int main() {
  std::vector<float> a, b, c, d, sum;

  ...

  for (size_t i = 0; i < a.size(); i++) {
    sum[i] = a[i] + b[i] + c[i] + d[i];
  }

  ...

  return 0;
}

The rewritten loop requires only five memory passes (four reads and one write), which is of course better, but as humans we prefer to write a + b + c + d. Expression templates solve exactly this problem: the programmer writes a + b + c + d and the compiler sees the loop written above.
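To make the mechanism concrete, here is a minimal sketch of an expression template for element-wise addition. It is an illustration only and not NSIMD's implementation: the names Expr, Leaf, Add, assign and fused_sum4 are hypothetical. operator+ builds a small tree of nodes without computing anything, and the single loop in assign evaluates the whole tree element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets operator+ accept only expression nodes.
template <typename E> struct Expr {
  const E &self() const { return static_cast<const E &>(*this); }
};

// Leaf node: a non-owning view over an existing vector.
struct Leaf : Expr<Leaf> {
  const std::vector<float> &v;
  explicit Leaf(const std::vector<float> &v_) : v(v_) {}
  float operator[](std::size_t i) const { return v[i]; }
  std::size_t size() const { return v.size(); }
};

// Interior node: records "l + r"; the addition happens in operator[].
template <typename L, typename R> struct Add : Expr<Add<L, R> > {
  L l;
  R r;
  Add(const L &l_, const R &r_) : l(l_), r(r_) {}
  float operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

// Builds a tree node; no element is read or written here.
template <typename L, typename R>
Add<L, R> operator+(const Expr<L> &l, const Expr<R> &r) {
  return Add<L, R>(l.self(), r.self());
}

// The only loop: each output element is computed from the whole tree.
template <typename E>
void assign(std::vector<float> &out, const Expr<E> &e) {
  const E &expr = e.self();
  assert(out.size() == expr.size());
  for (std::size_t i = 0; i < out.size(); i++) {
    out[i] = expr[i];
  }
}

// Hypothetical helper: computes a + b + c + d with one fused loop.
inline std::vector<float> fused_sum4(const std::vector<float> &a,
                                     const std::vector<float> &b,
                                     const std::vector<float> &c,
                                     const std::vector<float> &d) {
  std::vector<float> sum(a.size());
  assign(sum, Leaf(a) + Leaf(b) + Leaf(c) + Leaf(d));
  return sum;
}
```

After inlining, the body of the loop in assign amounts to a[i] + b[i] + c[i] + d[i], with no temporary vector allocated along the way.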

Expression templates with NSIMD

This module provides expression templates on top of the NSIMD core. As a consequence, the loops that the compiler derives from the high-level expressions are optimized using SIMD instructions. NVIDIA and AMD GPUs are also supported, through CUDA and ROCm/HIP. The expression template API in NSIMD is C++98 compatible and works with any container, since its only requirement is that the data be contiguous.

All inputs to an expression must be declared using tet1d::in while the output must be declared using tet1d::out.

int main() {
  std::vector<float> a, b, c;

  ...

  tet1d::out(&c[0]) = tet1d::in(&a[0], a.size()) + tet1d::in(&b[0], b.size());

  ...

  return 0;
}

template <typename T, typename I> inline node in(const T *data, I sz);
Construct an input for expression templates starting at address data and containing sz elements. The return type of this function, node, can be named with the TET1D_IN(T) macro, where T is the underlying type of data (ints, floats, doubles...).

template <typename T> node out(T *data);
Construct an output for expression templates starting at address data. Note that memory must be allocated by the user before passing it to the expression template engine. The output type can be named with the TET1D_OUT(T) macro, where T is the underlying type (ints, floats, doubles...).

Note that it is possible to pass parameters to the expression template engine to specify the number of threads per block for GPUs or the SIMD extension to use...

template <typename T, typename Pack> node out(T *data, int threads_per_block, void *stream);
Construct an output for expression templates starting at address data. Note that memory must be allocated by the user before passing it to the expression template engine. The Pack parameter is useful when compiling for CPUs. Its type is nsimd::pack<...>, which lets the developer specify all details about the NSIMD packs that will be used by the expression template engine. The threads_per_block and stream arguments are used only when compiling for GPUs; their names describe their meaning. The output type can be named with the TET1D_OUT_EX(T, N, SimdExt) macro, where T is the underlying type (ints, floats, doubles...), N is the unroll factor and SimdExt is the SIMD extension.

Moreover, a MATLAB-like syntax is provided: one can select a subrange of a given input. Indexes are interpreted as in Python: -1 refers to the last element. The constant tet1d::end = -1 allows one to write portable code.

int main() {
  std::vector<float> a, b, c;

  ...

  TET1D_IN(float) va = tet1d::in(&a[0], a.size());
  TET1D_IN(float) vb = tet1d::in(&b[0], b.size());
  tet1d::out(&c[0]) = va(10, tet1d::end - 10) + vb;

  ...

  return 0;
}
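The Python-style index convention described above can be sketched with a small helper. This is an assumption about how such indices resolve, not NSIMD code: resolve_index and end_index are hypothetical names, with end_index standing in for tet1d::end. Whether the upper bound of a subrange is inclusive is not stated here, so the sketch only resolves individual indices.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for tet1d::end (documented as -1); hypothetical name.
const long end_index = -1;

// Hypothetical helper: map a Python-style index onto [0, n).
// Non-negative indices are kept; negative ones count from the end.
inline std::size_t resolve_index(long i, std::size_t n) {
  long r = i >= 0 ? i : static_cast<long>(n) + i;
  assert(r >= 0 && static_cast<std::size_t>(r) < n);
  return static_cast<std::size_t>(r);
}
```

Under this convention, for an input of 100 elements, end_index resolves to 99 and end_index - 10 resolves to 89.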

One can also specify which elements of the output are written, using the following syntax.

int main() {
  std::vector<float> a, b, c;

  ...

  TET1D_IN(float) va = tet1d::in(&a[0], a.size());
  TET1D_IN(float) vb = tet1d::in(&b[0], b.size());
  TET1D_OUT(float) vc = tet1d::out(&c[0]);
  vc(va >= 10 && va < 20) = vb;

  ...

  return 0;
}

In the example above, element i of vc is written only if va[i] >= 10 and va[i] < 20. The expression inside the parentheses can be an arbitrary expression template, as long as its underlying type is bool.
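As a scalar reference for the masked assignment above, the following plain loop has the behavior described: elements of the output that fail the condition are left untouched. The function name masked_assign is hypothetical, and the loop is written without SIMD or GPU support for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar equivalent of: vc(va >= 10 && va < 20) = vb;
// Hypothetical helper, not part of NSIMD.
inline void masked_assign(const std::vector<float> &va,
                          const std::vector<float> &vb,
                          std::vector<float> &vc) {
  assert(va.size() == vb.size() && vb.size() == vc.size());
  for (std::size_t i = 0; i < vc.size(); i++) {
    // Write vc[i] only where the mask is true; keep the old value otherwise.
    if (va[i] >= 10.0f && va[i] < 20.0f) {
      vc[i] = vb[i];
    }
  }
}
```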

Warning using `auto`

Using auto can lead to surprising results. We advise you never to use auto when dealing with expression templates. With auto, the variable gets an obscure type that represents the computation tree of the expression template. This means that you cannot get data out of this variable, for example through a .data member. Likewise, neither the variable nor its type can be used where a template argument expects an input or output type.
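The effect can be reproduced with a toy expression template, independent of NSIMD. The types View and Sum and the function auto_deduces_tree_node are hypothetical and exist only for this illustration, which requires C++11 or later since it uses auto.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Toy input type: has a .data member (illustration only, not NSIMD's type).
struct View {
  const float *data;
  std::size_t size;
};

// Toy computation-tree node: holds its operands and owns no storage.
struct Sum {
  View lhs, rhs;
  float operator[](std::size_t i) const { return lhs.data[i] + rhs.data[i]; }
};

// Adding two views builds a tree node instead of computing a result.
inline Sum operator+(View a, View b) { return Sum{a, b}; }

// Returns true when auto deduced the tree node type rather than View.
inline bool auto_deduces_tree_node() {
  std::vector<float> a(4, 1.0f), b(4, 2.0f);
  View va{a.data(), a.size()}, vb{b.data(), b.size()};
  auto e = va + vb; // e is a Sum, not a View: writing e.data would not compile
  return std::is_same<decltype(e), Sum>::value &&
         !std::is_same<decltype(e), View>::value && e[0] == 3.0f;
}
```

In this toy setting, any function or template that expects a View will reject e, which mirrors the problem described above.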
