Their names follow the pattern `Sxx`, where

- `S` is `i` for signed integers, `u` for unsigned integers, or `f` for floating-point numbers.
- `xx` is the number of bits used to represent the number.

Full list of scalar types:
f64
f32
f16
i64
i32
i16
i8
u64
u32
u16
u8
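
As a quick illustration of the `Sxx` convention, the sketch below declares stand-in typedefs and checks that the numeric suffix matches the bit width. The `demo` namespace, the `bits` helper and `check_widths` are hypothetical names introduced for this example; NSIMD provides its own definitions, which may differ. `f16` is left out because standard C++ has no portable half-precision type.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

namespace demo {  // hypothetical namespace, not part of NSIMD

// Stand-in typedefs following the Sxx naming convention.
typedef std::int8_t   i8;
typedef std::int16_t  i16;
typedef std::int32_t  i32;
typedef std::int64_t  i64;
typedef std::uint8_t  u8;
typedef std::uint16_t u16;
typedef std::uint32_t u32;
typedef std::uint64_t u64;
typedef float         f32;  // assumes a 32-bit float, true on mainstream targets
typedef double        f64;  // assumes a 64-bit double, true on mainstream targets

// Number of bits used to represent a value of type T (the "xx" in Sxx).
template <typename T> int bits() { return int(sizeof(T) * CHAR_BIT); }

// Checks that each suffix agrees with the actual width of the type.
inline void check_widths() {
  assert(bits<i8>() == 8 && bits<u8>() == 8);
  assert(bits<i16>() == 16 && bits<u16>() == 16);
  assert(bits<i32>() == 32 && bits<u32>() == 32 && bits<f32>() == 32);
  assert(bits<i64>() == 64 && bits<u64>() == 64 && bits<f64>() == 64);
}

}  // namespace demo
```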
In NSIMD, we call a platform an architecture, e.g. Intel, ARM, POWERPC. We call
a SIMD extension a set of low-level functions and types provided by hardware
vendors to access SIMD units. Examples include SSE2, SSE42, AVX, ... When
compiling, the generic SIMD vector types represent a SIMD register of the
target. Examples are `__m128` for Intel SSE, `__m512` for Intel AVX-512, or
`svfloat32_t` for Arm SVE.
Their names follow these patterns:

- C base API: `vSCALAR`, where `SCALAR` is one of the scalar types listed above.
- C advanced API: `nsimd_pack_SCALAR`, where `SCALAR` is one of the scalar types listed above.
- C++ advanced API: `nsimd::pack<SCALAR>`, where `SCALAR` is one of the scalar types listed above.

Full list of SIMD vector types:
Base type | C base API | C advanced API | C++ advanced API |
---|---|---|---|
`f64` | `vf64` | `nsimd_pack_f64` | `nsimd::pack<f64>` |
`f32` | `vf32` | `nsimd_pack_f32` | `nsimd::pack<f32>` |
`f16` | `vf16` | `nsimd_pack_f16` | `nsimd::pack<f16>` |
`i64` | `vi64` | `nsimd_pack_i64` | `nsimd::pack<i64>` |
`i32` | `vi32` | `nsimd_pack_i32` | `nsimd::pack<i32>` |
`i16` | `vi16` | `nsimd_pack_i16` | `nsimd::pack<i16>` |
`i8` | `vi8` | `nsimd_pack_i8` | `nsimd::pack<i8>` |
`u64` | `vu64` | `nsimd_pack_u64` | `nsimd::pack<u64>` |
`u32` | `vu32` | `nsimd_pack_u32` | `nsimd::pack<u32>` |
`u16` | `vu16` | `nsimd_pack_u16` | `nsimd::pack<u16>` |
`u8` | `vu8` | `nsimd_pack_u8` | `nsimd::pack<u8>` |

These come automatically when you include `nsimd/nsimd.h`; you do not need
to include a separate header file to get a given function. Here is a list of
supported platforms and their corresponding SIMD extensions.

- Platform `arm`
  - `neon128`
  - `aarch64`
  - `sve`
  - `sve128`
  - `sve256`
  - `sve512`
  - `sve1024`
  - `sve2048`
- Platform `x86`
  - `sse2`
  - `sse42`
  - `avx`
  - `avx2`
  - `avx512_knl`
  - `avx512_skylake`
- Platform `ppc`
  - `vmx`
  - `vsx`
- Platform `cpu`
  - `cpu`

Each SIMD extension has its own set of SIMD types and functions. Types follow
the pattern `nsimd_SIMDEXT_vSCALAR`, where

- `SIMDEXT` is the SIMD extension.
- `SCALAR` is one of the scalar types listed above.

There are also logical types associated with each SIMD vector type. These types
are used, for example, to represent the result of a comparison of SIMD vectors.
They are usually bit masks. Their names follow the pattern
`nsimd_SIMDEXT_vlSCALAR`, where

- `SIMDEXT` is the SIMD extension.
- `SCALAR` is one of the scalar types listed above.
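
To make the "bit mask" idea concrete, here is a minimal scalar emulation of a 128-bit register with four 32-bit lanes. A comparison sets each lane of the logical to all ones or all zeros, and a lane-wise select then uses that mask with plain bitwise operations. Everything in the `demo` namespace (`vi32`, `vli32`, `lt`, `select_lanes`, `nb_true`) is a hypothetical stand-in written for this sketch, not NSIMD's actual types or implementation.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {  // hypothetical namespace, not part of NSIMD

const int LEN = 4;  // 4 lanes of 32 bits = 128 bits

struct vi32  { std::int32_t  lane[LEN]; };  // stand-in SIMD vector
struct vli32 { std::uint32_t mask[LEN]; };  // stand-in logical: one mask per lane

// Lane-wise a < b: all bits set where true, all bits clear where false.
inline vli32 lt(const vi32& a, const vi32& b) {
  vli32 r;
  for (int i = 0; i < LEN; ++i) r.mask[i] = (a.lane[i] < b.lane[i]) ? 0xFFFFFFFFu : 0u;
  return r;
}

// Lane-wise select: take a where the mask is set, b elsewhere.
inline vi32 select_lanes(const vli32& c, const vi32& a, const vi32& b) {
  vi32 r;
  for (int i = 0; i < LEN; ++i) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a.lane[i]);
    const std::uint32_t ub = static_cast<std::uint32_t>(b.lane[i]);
    // Conversion back to int32_t assumes two's complement (guaranteed since C++20).
    r.lane[i] = static_cast<std::int32_t>((ua & c.mask[i]) | (ub & ~c.mask[i]));
  }
  return r;
}

// Number of lanes whose mask is set.
inline int nb_true(const vli32& c) {
  int n = 0;
  for (int i = 0; i < LEN; ++i) n += (c.mask[i] != 0u) ? 1 : 0;
  return n;
}

}  // namespace demo
```

With these helpers, a lane-wise minimum is `select_lanes(lt(a, b), a, b)`: the comparison produces the logical, and the select consumes it.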
Note 1: Platform `cpu` is a 128-bit SIMD emulation fallback used when no SIMD
extension has been specified or is supported on a given compilation target.

Note 2: as the SIMD extensions of all platforms have distinct names, there is no need to put the name of the platform in each identifier.

Function names follow the pattern `nsimd_FUNCNAME_SIMDEXT_SCALAR`, where

- `FUNCNAME` is the name of a function, e.g. `add` or `sub`.
- `SIMDEXT` is the SIMD extension.
- `SCALAR` is one of the scalar types listed above.

In the base C API, genericity is achieved using macros.

- `vec(SCALAR)` is a type representing a SIMD vector containing `SCALAR` elements. `SCALAR` must be one of the scalar types listed above.
- `vecl(SCALAR)` is a type representing a SIMD vector of logicals for `SCALAR` elements. `SCALAR` must be one of the scalar types listed above.
- `vec_a(SCALAR, SIMDEXT)` is a type representing a SIMD vector containing `SCALAR` elements for the SIMD extension `SIMDEXT`. `SCALAR` must be one of the scalar types listed above and `SIMDEXT` must be a valid SIMD extension.
- `vecl_a(SCALAR, SIMDEXT)` is a type representing a SIMD vector of logicals for `SCALAR` elements for the SIMD extension `SIMDEXT`. `SCALAR` must be one of the scalar types listed above and `SIMDEXT` must be a valid SIMD extension.
- `vFUNCNAME` takes the above types as input to access the operator `FUNCNAME`, e.g. `vadd`, `vsub`.
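
Macro-based genericity of this kind is typically built on preprocessor token pasting. The sketch below shows the mechanism with stand-in names: every `demo_`/`DEMO_` identifier is hypothetical and only mirrors the shape of the patterns above, it is not how `nsimd/nsimd.h` is necessarily written. The extra macro level in `DEMO_CAT4` is there so that `DEMO_SIMD` is expanded to `cpu` before the tokens are glued together.

```cpp
#include <cassert>

// Stand-in per-extension types and functions (hypothetical, 4 lanes each).
struct demo_cpu_vf32 { float v[4]; };
struct demo_cpu_vi32 { int v[4]; };

inline demo_cpu_vf32 demo_add_cpu_f32(demo_cpu_vf32 a, demo_cpu_vf32 b) {
  demo_cpu_vf32 r;
  for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

inline demo_cpu_vi32 demo_add_cpu_i32(demo_cpu_vi32 a, demo_cpu_vi32 b) {
  demo_cpu_vi32 r;
  for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

// Stand-in for "the SIMD extension selected at compile time".
#define DEMO_SIMD cpu

// Two-level concatenation: arguments are macro-expanded before pasting.
#define DEMO_CAT4_(a, b, c, d) a##b##c##d
#define DEMO_CAT4(a, b, c, d) DEMO_CAT4_(a, b, c, d)

// Generic type and operator macros built by pasting names together.
#define demo_vec(T) DEMO_CAT4(demo_, DEMO_SIMD, _v, T)
#define demo_vec_a(T, EXT) DEMO_CAT4(demo_, EXT, _v, T)
#define demo_vadd(a, b, T) DEMO_CAT4(demo_add_, DEMO_SIMD, _, T)(a, b)

// Usage: the scalar type is passed as a bare token, last.
inline float demo_first_lane_sum() {
  demo_vec(f32) a = {{1.0f, 2.0f, 3.0f, 4.0f}};
  demo_vec(f32) b = {{10.0f, 20.0f, 30.0f, 40.0f}};
  demo_vec(f32) c = demo_vadd(a, b, f32);  // expands to demo_add_cpu_f32(a, b)
  assert(c.v[1] == 22.0f);
  return c.v[0];
}
```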
In C++98 and C++03, type traits are available.

- `nsimd::simd_traits<SCALAR, SIMDEXT>::vector` is the SIMD vector type for SIMD extension `SIMDEXT` containing `SCALAR` elements. `SIMDEXT` is one of the SIMD extensions listed above and `SCALAR` is one of the scalar types listed above.
- `nsimd::simd_traits<SCALAR, SIMDEXT>::vectorl` is the SIMD vector of logicals type for SIMD extension `SIMDEXT` containing `SCALAR` elements. `SIMDEXT` is one of the SIMD extensions listed above and `SCALAR` is one of the scalar types listed above.

In C++11 and beyond, type traits are still available but typedefs are also provided.

- `nsimd::vector<SCALAR, SIMDEXT>` is a typedef to `nsimd::simd_traits<SCALAR, SIMDEXT>::vector`.
- `nsimd::vectorl<SCALAR, SIMDEXT>` is a typedef to `nsimd::simd_traits<SCALAR, SIMDEXT>::vectorl`.
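
The relationship between a traits class and its C++11 typedefs can be sketched as follows. This is a stand-alone illustration under assumed names: the `demo` namespace, the `cpu` tag and the `cpu_vf32`/`cpu_vlf32` structs are hypothetical, and only the member names `vector` and `vectorl` are taken from the text above.

```cpp
#include <cassert>
#include <type_traits>  // std::is_same, used only for the checks

namespace demo {  // hypothetical namespace, not part of NSIMD

struct cpu {};  // tag type naming a SIMD extension

struct cpu_vf32  { float v[4]; };     // stand-in vector type
struct cpu_vlf32 { unsigned m[4]; };  // stand-in logical type

// Primary template left undefined: unsupported pairs fail at compile time.
template <typename Scalar, typename SimdExt> struct simd_traits;

// C++98-style traits: one specialization per (scalar, extension) pair.
template <> struct simd_traits<float, cpu> {
  typedef cpu_vf32  vector;
  typedef cpu_vlf32 vectorl;
};

// C++11 alias templates that shorten the traits lookup.
template <typename Scalar, typename SimdExt>
using vector = typename simd_traits<Scalar, SimdExt>::vector;

template <typename Scalar, typename SimdExt>
using vectorl = typename simd_traits<Scalar, SimdExt>::vectorl;

static_assert(std::is_same<vector<float, cpu>, cpu_vf32>::value,
              "the alias and the traits member name the same type");

// Usage: both spellings can be mixed freely.
inline float second_lane() {
  simd_traits<float, cpu>::vector a = {{1.0f, 2.0f, 3.0f, 4.0f}};
  vector<float, cpu> b = a;
  assert(b.v[3] == 4.0f);
  return b.v[1];
}

}  // namespace demo
```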
The C++20 API does not introduce different types for SIMD registers, nor
another way to access the other SIMD types. It only provides concepts in place
of the usual `typename`s. For more information, see concepts.md.

Note that all macros and functions available in plain C are still available in C++.

In the documentation we use the terms "function" and "operator"
interchangeably. For each operator FUNCNAME, a C function (also available in
C++) named `nsimd_FUNCNAME_SIMDEXT_SCALAR` is available for each SCALAR type
unless specified otherwise.

For each FUNCNAME, a C macro (also available in C++) named `vFUNCNAME` is
available and takes a SCALAR type as its last argument.

For each FUNCNAME, a C macro (also available in C++) named `vFUNCNAME_e` is
available and takes a SCALAR type and a SIMDEXT as its last two arguments.

For each FUNCNAME, a C++ function in namespace `nsimd` named `FUNCNAME` is
available. It takes the SCALAR type as its last argument and can optionally
take the SIMDEXT as an additional argument after it.

For example, for the addition of two SIMD vectors `a` and `b`, here are the
possibilities:

c = nsimd_add_avx_f32(a, b); // use AVX
c = nsimd::add(a, b, f32()); // use detected SIMDEXT
c = nsimd::add(a, b, f32(), avx()); // force AVX even if detected SIMDEXT is not AVX
c = vadd(a, b, f32); // use detected SIMDEXT
c = vadd_e(a, b, f32, avx); // force AVX even if detected SIMDEXT is not AVX
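
Passing `f32()` or `avx()` as a trailing argument is a form of tag dispatch: the value is ignored and only its type steers overload resolution. The sketch below illustrates why a scalar tag is useful when one raw register type serves several lane widths. It is a stand-alone emulation under assumed names (`demo`, `reg128`, `load`, `store`, `add_lanes`, `detected_ext` are all hypothetical) and makes no claim about how NSIMD implements its overloads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace demo {  // hypothetical namespace, not part of NSIMD

typedef std::uint8_t  u8;
typedef std::uint32_t u32;

struct cpu {};             // tag type naming a SIMD extension
typedef cpu detected_ext;  // stand-in for the extension chosen at build time

// One raw 128-bit register shared by every integer lane width, so the
// register type alone cannot tell an 8-bit add from a 32-bit add.
struct reg128 { unsigned char bytes[16]; };

template <typename T> reg128 load(const T* p) {
  reg128 r;
  std::memcpy(r.bytes, p, 16);
  return r;
}

template <typename T> void store(T* p, reg128 r) { std::memcpy(p, r.bytes, 16); }

// Adds the two registers lane by lane, with lanes of type T.
template <typename T> reg128 add_lanes(reg128 a, reg128 b) {
  T x[16 / sizeof(T)], y[16 / sizeof(T)];
  std::memcpy(x, a.bytes, 16);
  std::memcpy(y, b.bytes, 16);
  for (std::size_t i = 0; i < 16 / sizeof(T); ++i) x[i] = T(x[i] + y[i]);
  reg128 r;
  std::memcpy(r.bytes, x, 16);
  return r;
}

// The trailing arguments are never read; only their types select an overload.
inline reg128 add(reg128 a, reg128 b, u8, cpu)  { return add_lanes<u8>(a, b); }
inline reg128 add(reg128 a, reg128 b, u32, cpu) { return add_lanes<u32>(a, b); }

// Without an extension tag, forward to the detected extension.
template <typename S> reg128 add(reg128 a, reg128 b, S s) {
  return add(a, b, s, detected_ext());
}

}  // namespace demo
```

Calling `add(a, b, u8())` on registers whose first bytes are 255 and 1 wraps the first lane to 0, while `add(a, b, u32(), cpu())` on 32-bit lanes holding 255 and 1 yields 256: same register type, different operation, selected purely by the tag.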
Here is a list of available FUNCNAME.
int len();
vSCALAR set1(SCALAR a0);
vlSCALAR set1l(int a0);
vSCALAR loadu(SCALAR const* a0);
vSCALAR masko_loadu1(vlSCALAR a0, SCALAR const* a1, vSCALAR a2);
vSCALAR maskz_loadu1(vlSCALAR a0, SCALAR const* a1);
vSCALARx2 load2u(SCALAR const* a0);
vSCALARx3 load3u(SCALAR const* a0);
vSCALARx4 load4u(SCALAR const* a0);
vSCALAR loada(SCALAR const* a0);
vSCALAR masko_loada1(vlSCALAR a0, SCALAR const* a1, vSCALAR a2);
vSCALAR maskz_loada1(vlSCALAR a0, SCALAR const* a1);
vSCALARx2 load2a(SCALAR const* a0);
vSCALARx3 load3a(SCALAR const* a0);
vSCALARx4 load4a(SCALAR const* a0);
vlSCALAR loadlu(SCALAR const* a0);
vlSCALAR loadla(SCALAR const* a0);
void storeu(SCALAR* a0, vSCALAR a1);
void mask_storeu1(vlSCALAR a0, SCALAR* a1, vSCALAR a2);
void store2u(SCALAR* a0, vSCALAR a1, vSCALAR a2);
void store3u(SCALAR* a0, vSCALAR a1, vSCALAR a2, vSCALAR a3);
void store4u(SCALAR* a0, vSCALAR a1, vSCALAR a2, vSCALAR a3, vSCALAR a4);
void storea(SCALAR* a0, vSCALAR a1);
void mask_storea1(vlSCALAR a0, SCALAR* a1, vSCALAR a2);
void store2a(SCALAR* a0, vSCALAR a1, vSCALAR a2);
void store3a(SCALAR* a0, vSCALAR a1, vSCALAR a2, vSCALAR a3);
void store4a(SCALAR* a0, vSCALAR a1, vSCALAR a2, vSCALAR a3, vSCALAR a4);
vSCALAR gather(SCALAR const* a0, viSCALAR a1);
Only available for f64, f32, f16, i16, u16, u32, i32, i64, u64
vSCALAR gather_linear(SCALAR const* a0, int a1);
void scatter(SCALAR* a0, viSCALAR a1, vSCALAR a2);
Only available for f64, f32, f16, i16, u16, u32, i32, i64, u64
void scatter_linear(SCALAR* a0, int a1, vSCALAR a2);
void storelu(SCALAR* a0, vlSCALAR a1);
void storela(SCALAR* a0, vlSCALAR a1);
vSCALAR orb(vSCALAR a0, vSCALAR a1);
vSCALAR andb(vSCALAR a0, vSCALAR a1);
vSCALAR andnotb(vSCALAR a0, vSCALAR a1);
vSCALAR notb(vSCALAR a0);
vSCALAR xorb(vSCALAR a0, vSCALAR a1);
vlSCALAR orl(vlSCALAR a0, vlSCALAR a1);
vlSCALAR andl(vlSCALAR a0, vlSCALAR a1);
vlSCALAR andnotl(vlSCALAR a0, vlSCALAR a1);
vlSCALAR xorl(vlSCALAR a0, vlSCALAR a1);
vlSCALAR notl(vlSCALAR a0);
vSCALAR add(vSCALAR a0, vSCALAR a1);
vSCALAR sub(vSCALAR a0, vSCALAR a1);
SCALAR addv(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR mul(vSCALAR a0, vSCALAR a1);
vSCALAR div(vSCALAR a0, vSCALAR a1);
vSCALAR neg(vSCALAR a0);
vSCALAR min(vSCALAR a0, vSCALAR a1);
vSCALAR max(vSCALAR a0, vSCALAR a1);
vSCALAR shr(vSCALAR a0, int a1);
Only available for i64, i32, i16, i8, u64, u32, u16, u8
vSCALAR shl(vSCALAR a0, int a1);
Only available for i64, i32, i16, i8, u64, u32, u16, u8
vSCALAR shra(vSCALAR a0, int a1);
Only available for i64, i32, i16, i8, u64, u32, u16, u8
vlSCALAR eq(vSCALAR a0, vSCALAR a1);
vlSCALAR ne(vSCALAR a0, vSCALAR a1);
vlSCALAR gt(vSCALAR a0, vSCALAR a1);
vlSCALAR ge(vSCALAR a0, vSCALAR a1);
vlSCALAR lt(vSCALAR a0, vSCALAR a1);
vlSCALAR le(vSCALAR a0, vSCALAR a1);
vSCALAR if_else1(vlSCALAR a0, vSCALAR a1, vSCALAR a2);
vSCALAR abs(vSCALAR a0);
vSCALAR fma(vSCALAR a0, vSCALAR a1, vSCALAR a2);
vSCALAR fnma(vSCALAR a0, vSCALAR a1, vSCALAR a2);
vSCALAR fms(vSCALAR a0, vSCALAR a1, vSCALAR a2);
vSCALAR fnms(vSCALAR a0, vSCALAR a1, vSCALAR a2);
vSCALAR ceil(vSCALAR a0);
vSCALAR floor(vSCALAR a0);
vSCALAR trunc(vSCALAR a0);
vSCALAR round_to_even(vSCALAR a0);
int all(vlSCALAR a0);
int any(vlSCALAR a0);
int nbtrue(vlSCALAR a0);
vSCALAR reinterpret(vSCALAR a0);
vlSCALAR reinterpretl(vlSCALAR a0);
vSCALAR cvt(vSCALAR a0);
vSCALARx2 upcvt(vSCALAR a0);
Only available for i8, u8, i16, u16, f16, i32, u32, f32
vSCALAR downcvt(vSCALAR a0, vSCALAR a1);
Only available for i16, u16, f16, i32, u32, f32, i64, u64, f64
vSCALAR rec(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR rec11(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR rec8(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR sqrt(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR rsqrt11(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR rsqrt8(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR ziplo(vSCALAR a0, vSCALAR a1);
vSCALAR ziphi(vSCALAR a0, vSCALAR a1);
vSCALAR unziplo(vSCALAR a0, vSCALAR a1);
vSCALAR unziphi(vSCALAR a0, vSCALAR a1);
vSCALARx2 zip(vSCALAR a0, vSCALAR a1);
vSCALARx2 unzip(vSCALAR a0, vSCALAR a1);
vSCALAR to_mask(vlSCALAR a0);
vlSCALAR to_logical(vSCALAR a0);
vSCALAR iota();
vlSCALAR mask_for_loop_tail(int a0, int a1);
vSCALAR adds(vSCALAR a0, vSCALAR a1);
vSCALAR subs(vSCALAR a0, vSCALAR a1);
vSCALAR sin_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cos_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR tan_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR asin_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR acos_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR atan_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR atan2_u35(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR log_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cbrt_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR sin_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cos_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR tan_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR asin_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR acos_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR atan_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR atan2_u10(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR log_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cbrt_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR exp_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR pow_u10(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR sinh_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cosh_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR tanh_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR sinh_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cosh_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR tanh_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR asinh_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR acosh_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR atanh_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR exp2_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR exp2_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR exp10_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR exp10_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR expm1_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR log10_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR log2_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR log2_u35(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR log1p_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR sinpi_u05(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR cospi_u05(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR hypot_u05(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR hypot_u35(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR remainder(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR fmod(vSCALAR a0, vSCALAR a1);
Only available for f64, f32, f16
vSCALAR lgamma_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR tgamma_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR erf_u10(vSCALAR a0);
Only available for f64, f32, f16
vSCALAR erfc_u15(vSCALAR a0);
Only available for f64, f32, f16
The C advanced API takes advantage of the C11 `_Generic` keyword to provide
function overloading. Unlike the base API described above, there is no need to
pass the base type or the SIMD extension as arguments: this information is
carried by the types provided by this API.

- `nsimd_pack_SCALAR_SIMDEXT` represents a SIMD vector containing SCALAR elements of SIMD extension SIMDEXT.
- `nsimd_packl_SCALAR_SIMDEXT` represents a SIMD vector of logicals for SCALAR elements of SIMD extension SIMDEXT.

There are versions of the above types without SIMDEXT, for which the targeted SIMD extension is chosen automatically.

- `nsimd_pack_SCALAR` represents a SIMD vector containing SCALAR elements.
- `nsimd_packl_SCALAR` represents a SIMD vector of logicals for SCALAR elements.

Generic types are also available:

- `nsimd_pack(SCALAR)` is a type representing a SIMD vector containing SCALAR elements. SCALAR must be one of the scalar types listed above.
- `nsimd_packl(SCALAR)` is a type representing a SIMD vector of logicals for SCALAR elements. SCALAR must be one of the scalar types listed above.
- `nsimd_pack_a(SCALAR, SIMDEXT)` is a type representing a SIMD vector containing SCALAR elements for the SIMD extension SIMDEXT. SCALAR must be one of the scalar types listed above and SIMDEXT must be a valid SIMD extension.
- `nsimd_packl_a(SCALAR, SIMDEXT)` is a type representing a SIMD vector of logicals for SCALAR elements for the SIMD extension SIMDEXT. SCALAR must be one of the scalar types listed above and SIMDEXT must be a valid SIMD extension.

Finally, operators follow the naming pattern `nsimd_FUNCNAME`, e.g. `nsimd_add`,
`nsimd_sub`.

The C++ advanced API is called advanced not because it requires C++11 or above but because it relies on the particular way ARM implements SVE in its compiler. We do not know whether GCC (and possibly MSVC in the distant future) will use the same approach. In any case, the current implementation allows us to put SVE SIMD vectors inside structs that behave like standard structs. If you want to be sure to write portable code, do not use this API.

Two new types are available.

- `nsimd::pack<SCALAR, N, SIMDEXT>` represents `N` SIMD vectors containing SCALAR elements of SIMD extension SIMDEXT. You can specify only the first template argument. The second defaults to 1 while the third defaults to the detected SIMDEXT.
- `nsimd::packl<SCALAR, N, SIMDEXT>` represents `N` SIMD vectors of logical type containing SCALAR elements of SIMD extension SIMDEXT. You can specify only the first template argument. The second defaults to 1 while the third defaults to the detected SIMDEXT.

Use N > 1 when declaring packs to unroll by a factor of N. This is particularly useful on ARM.
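
The effect of the defaulted template arguments and of `N` can be illustrated with a small stand-alone model. In this sketch a pack simply owns `N` emulated 128-bit registers and each operation loops over all of them, which is what unrolling by `N` amounts to. The `demo` namespace, the `reg` struct, the `cpu` tag and the member names are hypothetical and do not reflect NSIMD's actual class layout.

```cpp
#include <cassert>

namespace demo {  // hypothetical namespace, not part of NSIMD

struct cpu {};  // tag type naming a SIMD extension

// One emulated register; only the 128-bit cpu stand-in is defined here.
template <typename T, typename SimdExt> struct reg;
template <typename T> struct reg<T, cpu> { T lane[16 / sizeof(T)]; };

// N defaults to 1 and the extension defaults to the stand-in cpu tag,
// so pack<float> means pack<float, 1, cpu>.
template <typename T, int N = 1, typename SimdExt = cpu>
struct pack {
  reg<T, SimdExt> r[N];
  static int lanes_per_reg() { return int(16 / sizeof(T)); }
  static int len() { return N * lanes_per_reg(); }
};

// One call processes all N registers.
template <typename T, int N, typename E>
pack<T, N, E> operator+(const pack<T, N, E>& a, const pack<T, N, E>& b) {
  pack<T, N, E> c;
  for (int k = 0; k < N; ++k)
    for (int i = 0; i < pack<T, N, E>::lanes_per_reg(); ++i)
      c.r[k].lane[i] = a.r[k].lane[i] + b.r[k].lane[i];
  return c;
}

// Usage: a pack of two registers covers eight floats per operation.
inline float last_lane_of_unrolled_sum() {
  pack<float, 2> x, y;
  for (int k = 0; k < 2; ++k)
    for (int i = 0; i < 4; ++i) {
      x.r[k].lane[i] = float(k * 4 + i);
      y.r[k].lane[i] = 1.0f;
    }
  pack<float, 2> z = x + y;
  assert(z.r[0].lane[0] == 1.0f);
  return z.r[1].lane[3];
}

}  // namespace demo
```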
Functions that take packs do not take any other argument unless specified
otherwise, e.g. the load family of functions. It is impossible to determine
the kind of pack (unroll and SIMDEXT) from the type of a pointer. Therefore,
in this case, the last argument must be a pack, and a pack of this same type
is returned. Some functions are also available as C++ operators. They follow
the naming `nsimd::FUNCNAME`.
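
The load problem can be seen in a small model: a `const float*` carries no unroll factor, so the caller has to name the pack type. The sketch below supplies it as an explicit template argument, which is one possible way to do so and an assumption of this example. The `demo` namespace and its simplified `pack`, `loadu` and `storeu` are hypothetical stand-ins, not NSIMD's signatures.

```cpp
#include <cassert>

namespace demo {  // hypothetical namespace, not part of NSIMD

// Simplified pack: N emulated 128-bit registers stored contiguously.
template <typename T, int N = 1>
struct pack {
  typedef T value_type;
  T lane[N * (16 / sizeof(T))];
  static int len() { return N * int(16 / sizeof(T)); }
};

// The pointer type cannot reveal N, so Pack is named explicitly by the caller
// and the function returns a pack of exactly that type.
template <typename Pack>
Pack loadu(const typename Pack::value_type* p) {
  Pack r;
  for (int i = 0; i < Pack::len(); ++i) r.lane[i] = p[i];
  return r;
}

// For stores the pack is an argument, so its type is deduced.
template <typename Pack>
void storeu(typename Pack::value_type* p, const Pack& v) {
  for (int i = 0; i < Pack::len(); ++i) p[i] = v.lane[i];
}

// Usage: the same buffer loaded with and without unrolling.
inline int lanes_loaded_with_unroll_2() {
  float buf[8] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};
  pack<float> a = loadu<pack<float> >(buf);           // reads 4 floats
  pack<float, 2> b = loadu<pack<float, 2> >(buf);     // reads 8 floats
  assert(a.lane[3] == 3.0f && b.lane[7] == 7.0f);
  return b.len();
}

}  // namespace demo
```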