NSIMD documentation
Tiny expression templates 1D module documentation

Overview

What are expression templates?

Expression templates are a C++ template metaprogramming technique that lets high-level code express loop fusion. Consider the following example.

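#include <cstddef>
#include <vector>

// Naive element-wise addition: every call allocates and fills a temporary.
std::vector<float> operator+(std::vector<float> const &a,
                             std::vector<float> const &b) {
  std::vector<float> ret(a.size());
  for (size_t i = 0; i < a.size(); i++) {
    ret[i] = a[i] + b[i];
  }
  return ret;
}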

int main() {
  std::vector<float> a, b, c, d, sum;

  ...

  sum = a + b + c + d;

  ...

  return 0;
}

The expression a + b + c + d involves three calls to operator+, each of which reads two vectors and writes one, so at least nine memory passes are necessary. This can be optimized as follows.

int main() {
  std::vector<float> a, b, c, d, sum;

  ...

  for (size_t i = 0; i < a.size(); i++) {
    sum[i] = a[i] + b[i] + c[i] + d[i];
  }

  ...

  return 0;
}

The rewritten loop requires only five memory passes (four reads and one write), which is of course better, but as humans we prefer to write a + b + c + d. Expression templates solve exactly this problem: the programmer writes a + b + c + d and the compiler sees the loop written above.
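
To make the mechanism concrete, here is a minimal sketch of an expression template in plain C++98. It is not the NSIMD implementation: the names sketch::Expr, sketch::Leaf, sketch::Add, sketch::assign and sketch::fused_sum4 are hypothetical and exist only for this illustration. operator+ builds a small tree of nodes instead of a temporary vector, and the single loop in assign evaluates the whole tree element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// CRTP base: restricts operator+ below to expression nodes only.
template <typename E> struct Expr {
  const E &self() const { return static_cast<const E &>(*this); }
};

// Leaf node: refers to existing data and performs no computation.
struct Leaf : Expr<Leaf> {
  const std::vector<float> &v;
  explicit Leaf(const std::vector<float> &v_) : v(v_) {}
  float operator[](std::size_t i) const { return v[i]; }
  std::size_t size() const { return v.size(); }
};

// Interior node: stores its operands and adds them lazily, one element
// at a time, when operator[] is called.
template <typename L, typename R> struct Add : Expr<Add<L, R> > {
  L l;
  R r;
  Add(const L &l_, const R &r_) : l(l_), r(r_) {}
  float operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

// Builds a tree node; no loop runs and no temporary vector is allocated.
template <typename L, typename R>
Add<L, R> operator+(const Expr<L> &l, const Expr<R> &r) {
  return Add<L, R>(l.self(), r.self());
}

// The only loop: evaluates the whole tree for each index i.
template <typename E>
void assign(std::vector<float> &out, const Expr<E> &e) {
  const E &expr = e.self();
  assert(out.size() == expr.size());
  for (std::size_t i = 0; i < out.size(); i++) {
    out[i] = expr[i];
  }
}

// a + b + c + d evaluated in a single fused loop.
inline std::vector<float> fused_sum4(const std::vector<float> &a,
                                     const std::vector<float> &b,
                                     const std::vector<float> &c,
                                     const std::vector<float> &d) {
  std::vector<float> sum(a.size());
  assign(sum, Leaf(a) + Leaf(b) + Leaf(c) + Leaf(d));
  return sum;
}

} // namespace sketch
```

In this sketch the expression Leaf(a) + Leaf(b) + Leaf(c) + Leaf(d) has the type Add<Add<Add<Leaf, Leaf>, Leaf>, Leaf>, so after inlining the compiler can reduce the loop body in assign to a[i] + b[i] + c[i] + d[i].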

Expression templates with NSIMD

This module provides expression templates on top of the NSIMD core. As a consequence, the loops that the compiler derives from the high-level expressions are optimized using SIMD instructions. NVIDIA and AMD GPUs are also supported, through CUDA and ROCm/HIP respectively. The expression template API in NSIMD is C++98 compatible and works with any container: its only requirement is that the data be contiguous.

All inputs to an expression must be declared using tet1d::in while the output must be declared using tet1d::out.

int main() {
  std::vector<float> a, b, c;

  ...

  tet1d::out(&c[0]) = tet1d::in(&a[0], a.size()) + tet1d::in(&b[0], b.size());

  ...

  return 0;
}

Note that it is possible to pass parameters to the expression template engine to specify the number of threads per block for GPUs or the SIMD extension to use...

Moreover, a MATLAB-like syntax is provided: one can select a subrange of a given input. Indices follow the Python convention, so -1 denotes the last element. The constant tet1d::end = -1 allows one to write portable code.

int main() {
  std::vector<float> a, b, c;

  ...

  TET1D_IN(float) va = tet1d::in(&a[0], a.size());
  TET1D_IN(float) vb = tet1d::in(&b[0], b.size());
  tet1d::out(&c[0]) = va(10, tet1d::end - 10) + vb;

  ...

  return 0;
}
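
The index convention can be sketched as a small helper. This is only an illustration of Python-style indexing, not code taken from tet1d: resolve_index and k_end are hypothetical names, and the sketch deliberately says nothing about whether the upper bound of a subrange is inclusive, since that is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for tet1d::end, which the text defines as -1.
const long k_end = -1;

// Hypothetical helper: maps a Python-style index to a zero-based position
// in a sequence of n elements. Negative indices count from the end.
inline std::size_t resolve_index(long i, std::size_t n) {
  long pos = (i >= 0) ? i : static_cast<long>(n) + i;
  assert(pos >= 0 && static_cast<std::size_t>(pos) < n);
  return static_cast<std::size_t>(pos);
}
```

Under this convention, for an input of 100 elements, k_end resolves to position 99 and k_end - 10 resolves to position 89.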

One can also specify which elements of the output must be rewritten with the following syntax.

int main() {
  std::vector<float> a, b, c;

  ...

  TET1D_IN(float) va = tet1d::in(&a[0], a.size());
  TET1D_IN(float) vb = tet1d::in(&b[0], b.size());
  TET1D_OUT(float) vc = tet1d::out(&c[0]);
  vc(va >= 10 && va < 20) = vb;

  ...

  return 0;
}

In the example above, element i of vc is written only if va[i] >= 10 and va[i] < 20. The expression inside the parentheses can be an arbitrary expression template, as long as its underlying type is bool.
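
As a scalar reference for the semantics described above, the masked assignment behaves like the following plain loop. This is a sketch of the meaning only, not of how NSIMD evaluates it (the module uses SIMD or GPU code); masked_copy is a hypothetical name, and the sketch assumes all three containers have the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar equivalent of: vc(va >= 10 && va < 20) = vb;
// Elements of c whose mask is false keep their previous value.
inline void masked_copy(std::vector<float> &c, const std::vector<float> &a,
                        const std::vector<float> &b) {
  assert(a.size() == c.size() && b.size() == c.size());
  for (std::size_t i = 0; i < c.size(); i++) {
    if (a[i] >= 10.0f && a[i] < 20.0f) {
      c[i] = b[i];
    }
  }
}
```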

Warning about using auto

Using auto can lead to surprising results. We advise you never to use auto when dealing with expression templates. With auto, the variable gets an obscure type that represents the computation tree of the expression template rather than its result. This means that you cannot get data out of this variable, for example through a .data member. Likewise, neither the variable nor its type can be used as a template argument where a concrete data type is expected.
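
The pitfall can be reproduced with a toy expression template, independent of NSIMD. Everything in the sketch below (sketch::Vec, sketch::Sum, auto_deduces_node, evaluate) is hypothetical and only mimics the general behaviour described above; it requires C++11 because it uses auto.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace sketch {

// Hypothetical container that owns its data (not a tet1d type).
struct Vec {
  std::vector<float> data;
  Vec(std::size_t n, float value) : data(n, value) {}
};

// Node of the computation tree: refers to its operands, owns no data.
struct Sum {
  const Vec &lhs;
  const Vec &rhs;
  Sum(const Vec &l, const Vec &r) : lhs(l), rhs(r) {}
  float operator[](std::size_t i) const { return lhs.data[i] + rhs.data[i]; }
  // Evaluation happens only on conversion to a concrete container.
  operator Vec() const {
    assert(lhs.data.size() == rhs.data.size());
    Vec out(lhs.data.size(), 0.0f);
    for (std::size_t i = 0; i < out.data.size(); i++) {
      out.data[i] = (*this)[i];
    }
    return out;
  }
};

inline Sum operator+(const Vec &l, const Vec &r) { return Sum(l, r); }

// Returns true when auto deduces the tree node rather than the container.
inline bool auto_deduces_node(const Vec &a, const Vec &b) {
  auto e = a + b; // e is a Sum: it has no .data member
  return std::is_same<decltype(e), Sum>::value;
}

// Naming the container type forces the evaluation.
inline Vec evaluate(const Vec &a, const Vec &b) {
  Vec v = a + b; // converts Sum to Vec; v.data is available
  return v;
}

} // namespace sketch
```

In this toy version, writing the concrete type on the left-hand side is what triggers the computation, whereas auto merely stores the unevaluated node.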