SIMD-enabled functions (formerly called elemental functions) are a general language construct to express a data parallel algorithm. A SIMD-enabled function is written as a regular C/C++ function, and the algorithm within describes the operation on one element, using scalar syntax. The function can then be called as a regular C/C++ function to operate on a single element or it can be called in a data parallel context to operate on many elements.
In some cases it is desirable to have a pointer to a SIMD-enabled function, but without special effort the vector nature of the function is lost: a function pointer points to the scalar function, and there is no way to call the short vector variants that exist for it.
To support indirect calls to the vector variants of SIMD-enabled functions, SIMD-enabled function pointers were introduced. A SIMD-enabled function pointer is a special kind of pointer that is incompatible with a regular function pointer: it refers to the entire set of short vector variants as well as to the scalar function. Because this incompatibility carries a risk of misuse, especially in C++ code, support for SIMD-enabled function pointers is disabled by default.
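To see why such a pointer cannot be binary-compatible with a plain function pointer, it may help to picture it as a bundle of entry points rather than a single address. The sketch below is a hypothetical model, not the compiler's actual representation: `SimdFnPtr`, `square_v4`, and the fixed vector length of 4 are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model only: the compiler's real internal layout is not described here.
constexpr std::size_t kVL = 4;  // assumed vector length of the single short vector variant

using ScalarFn = int (*)(int);
using VectorFn = void (*)(const int* in, int* out);  // handles kVL lanes per call

// A "SIMD-enabled function pointer" modeled as a set of entry points:
// the scalar function plus one short vector variant.
struct SimdFnPtr {
    ScalarFn scalar;
    VectorFn vec4;
};

int square(int x) { return x * x; }

void square_v4(const int* in, int* out) {
    for (std::size_t lane = 0; lane < kVL; ++lane) {
        out[lane] = in[lane] * in[lane];
    }
}

// A plain function pointer holds a single address, so it cannot carry the
// extra entry points; this is why the two kinds of pointers must not be mixed.
static_assert(sizeof(SimdFnPtr) > sizeof(ScalarFn),
              "the model carries more than one entry point");
```

A scalar call site would use `p.scalar(x)`, while a vectorized loop would call `p.vec4` on four elements at a time.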
When you write a SIMD-enabled function, the compiler generates the short vector variants of the function that you requested; each can perform the function's operation on multiple arguments in a single invocation. By using the vector instruction set architecture (ISA) of the CPU, a short vector variant may perform several operations in about the time the scalar implementation takes to perform one. When a call to a SIMD-enabled function occurs in a SIMD loop or in another SIMD-enabled function, the compiler replaces the scalar call with the best-fitting short vector variant among those available.
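For reference, a minimal direct-call sketch using standard OpenMP syntax: `scale_add` and `scale_add_arrays` are illustrative names. Whether a vector variant is actually generated and used depends on the compiler and its options (for example, the [q or Q]openmp-simd option with the Intel compiler, or an equivalent option elsewhere); without such support the pragmas are ignored and the scalar code runs.

```cpp
#include <cassert>
#include <cstddef>

// SIMD-enabled function: written in scalar form, describing one element.
// 'scale' is uniform: every lane of a vector call sees the same value.
#pragma omp declare simd uniform(scale)
float scale_add(float x, float y, float scale) {
    return x * scale + y;
}

// In a SIMD loop the compiler may replace the scalar call with a short
// vector variant of scale_add.
void scale_add_arrays(const float* x, const float* y, float* out,
                      std::size_t n, float scale) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = scale_add(x[i], y[i], scale);
    }
}
```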
Indirect calls to SIMD-enabled functions are handled similarly, but because the actual call targets are unknown at an indirect call, the set of available variants must be associated with the function pointer variable rather than with the target function. Consequently, every SIMD-enabled function referenced through a SIMD-enabled function pointer must provide at least the variants declared for the pointer.
To make the compiler treat a function pointer as a SIMD-enabled function pointer, you must add a vector attribute to the pointer declaration.
Linux
Use the __attribute__((vector (clauses))) attribute, as follows:
__attribute__((vector (clauses))) return_type (*function_pointer_name) (parameters)
Alternatively, you can use the OpenMP #pragma omp declare simd, which requires the [q or Q]openmp or [q or Q]openmp-simd compiler option.
Windows
Use the __declspec(vector (clauses)) attribute, as follows:
__declspec(vector (clauses)) return_type (*function_pointer_name) (parameters)
The clauses are described in the previous topic on SIMD-enabled functions.
You can associate several vector attributes with one SIMD-enabled function pointer; together they describe all the variants that every target function called through the pointer must provide. The attributes usually reflect the ways the pointer is used in loops. When it encounters an indirect call, the compiler matches the vector variants declared on the function pointer against the actual kinds of the arguments and chooses the best match. Matching works exactly as it does for direct calls (see the previous topic on SIMD-enabled functions). Consider the following declaration of a SIMD-enabled function pointer and loops that call through it.
// pointer declaration
#pragma omp declare simd // universal but slowest definition matches the use in all three loops
#pragma omp declare simd linear(in1) linear(ref(in2)) uniform(mul) // matches the use in the first loop
#pragma omp declare simd linear(ref(in2)) // matches the use in the second and the third loops
#pragma omp declare simd linear(ref(in2)) linear(mul) // matches the use in the second loop
#pragma omp declare simd linear(val(in2:2)) // matches the use in the third loop
int (*func)(int* in1, int& in2, int mul);
int *a, *b, mul, *c;
int *ndx, nn;
...
// loop examples
for (int i = 0; i < nn; i++) {
c[i] = func(a + i, *(b + i), mul); // in the loop, the first parameter is changed linearly,
// the second reference is changed linearly too
// the third parameter is not changed
}
for (int i = 0; i < nn; i++) {
c[i] = func(&a[ndx[i]], b[i], i + 1); // the value of the first parameter is unpredictable,
// the second reference is changed linearly
// the third parameter is changed linearly
}
#pragma omp simd
for (int i = 0; i < nn; i++) {
int k = i * 2; // during vectorization, private variables are transformed into arrays: k->k_vec[vector_length]
c[i] = func(&a[ndx[i]], k, b[i]); // the value of the first parameter is unpredictable,
// the second reference and value can be considered linear
// the third parameter has unpredictable value
// (of the matching variants, the one declared with linear(val(in2:2)) is chosen)
}
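To illustrate why the universal variant is described as the slowest, the following sketch models in plain C++ what two of the declared variants might receive for the first loop. This is an assumption-laden model, not the actual vector ABI: the body of `func` is not given in the original, and the names `func_scalar`, `func_v4_universal`, `func_v4_linear` and the vector length of 4 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kVL = 4;  // assumed vector length

// Hypothetical body for func (the original example does not define it).
int func_scalar(int* in1, int& in2, int mul) { return *in1 * mul + in2; }

// Model of the universal variant (all parameters vector): each lane receives its
// own pointer, reference address, and mul value, so data is gathered lane by lane.
void func_v4_universal(int* const in1[kVL], int* const in2[kVL],
                       const int mul[kVL], int out[kVL]) {
    for (std::size_t l = 0; l < kVL; ++l) {
        out[l] = *in1[l] * mul[l] + *in2[l];
    }
}

// Model of linear(in1) linear(ref(in2)) uniform(mul): only the lane-0 addresses
// and a single mul are passed; lanes map to consecutive elements, so the loads
// are contiguous.
void func_v4_linear(int* in1, int& in2, int mul, int out[kVL]) {
    int* in2_base = &in2;
    for (std::size_t l = 0; l < kVL; ++l) {
        out[l] = in1[l] * mul + in2_base[l];
    }
}
```

Both models compute the same results; the difference lies in how much per-lane information must be passed and gathered.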
Before the function pointer is used in a call, it must be assigned either the address of a function or the value of another function pointer. As with regular function pointers, SIMD-enabled function pointers must be compatible at assignment and initialization. The compatibility rules are described below.
Pointer assignment compatibility is defined as follows:
SIMD-enabled function pointers and regular function pointers are binary-incompatible and are handled differently. Mixing them can lead to severe, unpredictable results. The compiler checks compatibility wherever the C/C++ language standards allow, but in certain cases it cannot, for example when function pointers are passed to undeclared functions or as variable arguments. Avoid using SIMD-enabled function pointers in these contexts. Additional complications related to the C++ type system are described in the SIMD-enabled Function Pointers and the C++ Type System section below.
A SIMD-enabled function pointer may be assigned to a scalar function pointer with a cast as described in rule 4 above, but a scalar function pointer cannot be assigned to a SIMD-enabled function pointer.
// pointer declarations
#pragma omp declare simd
int (*ptr1)(int*, int);
#pragma omp declare simd
int (*ptr1a)(int*, int);
#pragma omp declare simd
#pragma omp declare simd linear(a)
typedef int (*fptr_t2)(int* a, int b);
typedef int (*fptr_t3)(int*, int);
fptr_t2 ptr2, ptr2a;
fptr_t3 ptr3;
// function declarations
#pragma omp declare simd
int func1(int* x, int b);
#pragma omp declare simd
#pragma omp declare simd linear(x)
int func2(int* x, int b);
#pragma omp declare simd
#pragma omp declare simd linear(x)
int func3(float* x, int b);
//--------------------------------------
// allowed assignments
ptr1 = func1; // same prototype and vector spec
ptr2 = func2; // same prototype and vector spec
ptr1a = ptr1; // same prototype and vector spec
ptr1a = func2; // same prototype; vector specs on function include all vector specs on pointer
ptr3 = func1; // scalar pointer with same prototype - use scalar func1
ptr3 = func2; // scalar pointer with same prototype - use scalar func2
ptr3 = ptr1; // scalar pointer with same prototype - implicit conversion from vector to scalar pointer
ptr3 = ptr2; // scalar pointer with same prototype - implicit conversion from vector to scalar pointer
// disallowed assignments
ptr2 = func1; // vector spec on function does not have all specs on pointer
ptr2 = func3; // prototype mismatch although vector spec matched
ptr1 = func3; // prototype mismatch although vector spec matched
ptr3 = func3; // prototype mismatch
ptr1 = ptr2; // pointers should have the same vector spec
ptr2 = ptr3; // scalar pointer has no vector spec; cannot be assigned to a vector pointer
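The variant-set part of these rules can be expressed as a small compile-time model, sketched below under stated assumptions: each bit of the hypothetical `VariantSet` stands for one declared clause combination, prototype checks and scalar-pointer conversions are omitted, and all names are illustrative. The assertions mirror some of the allowed and disallowed assignments above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: one bit per declared vector variant (clause combination).
using VariantSet = std::uint32_t;
constexpr VariantSet kUniversal  = 1u << 0;  // #pragma omp declare simd
constexpr VariantSet kLinearArg0 = 1u << 1;  // #pragma omp declare simd linear(<first parameter>)

// Function -> SIMD-enabled pointer: the function must provide every variant
// the pointer declares (it may provide more).
constexpr bool can_assign_function(VariantSet ptr, VariantSet fn) {
    return (fn & ptr) == ptr;
}

// SIMD-enabled pointer -> SIMD-enabled pointer: the variant sets must be identical.
constexpr bool can_assign_pointer(VariantSet dst, VariantSet src) {
    return dst == src;
}

// Variant sets taken from the declarations in the example above.
constexpr VariantSet ptr1_set  = kUniversal;
constexpr VariantSet ptr2_set  = kUniversal | kLinearArg0;
constexpr VariantSet func1_set = kUniversal;
constexpr VariantSet func2_set = kUniversal | kLinearArg0;

static_assert(can_assign_function(ptr1_set, func1_set), "ptr1 = func1 is allowed");
static_assert(can_assign_function(ptr1_set, func2_set), "ptr1a = func2 is allowed");
static_assert(!can_assign_function(ptr2_set, func1_set), "ptr2 = func1 is rejected");
static_assert(!can_assign_pointer(ptr1_set, ptr2_set), "ptr1 = ptr2 is rejected");
```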
Unlike a direct function call, which always transfers control to the same target function, the target of an indirect call depends on the run-time value of the function pointer. The targets may differ between iterations of a vectorized loop, or between lanes of a SIMD-enabled function executing the call. When vectorized, such an indirect call may therefore involve several calls to different targets within a single SIMD chunk, as the following example shows:
// pointer typedefs
#pragma omp declare simd
typedef int (*fptr_t1)(int*, int);
// function declarations
#pragma omp declare simd
int func1(int* x, int b);
// uses of vector function pointers
fptr_t1 *fptr_array; // array of vector function pointers
void foo(int N, int *x, int y){
fptr_t1 ptr1 = func1;
#pragma omp simd
for (int i = 0; i < N; i++) {
ptr1(x+i, y); // ptr1 is uniform by OpenMP rule.
fptr_t1 ptr1a = ptr1;
ptr1a(x+i, y); // compiler can prove ptr1a is uniform.
fptr_t1 ptr1b = fptr_array[i];
ptr1b(x+i,y); // ptr1b may or may not be uniform.
}
}
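When the pointer is uniform (as for `ptr1` and `ptr1a`), one vector call covers the whole chunk. For a possibly non-uniform pointer such as `ptr1b`, the sketch below shows one plausible strategy, assumed here for illustration rather than taken from the compiler: repeatedly take the target of the first unhandled lane, mask in every lane with the same target, and make one masked vector call. The masked-variant signature and the names `MaskedVariant`, `add_v`, `mul_v`, and `indirect_call_chunk` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kVL = 4;  // assumed vector length

// Hypothetical masked vector variant: processes only lanes whose mask bit is set.
using MaskedVariant = void (*)(const int* x, int y, int* out, const bool mask[kVL]);

void add_v(const int* x, int y, int* out, const bool mask[kVL]) {
    for (std::size_t l = 0; l < kVL; ++l) {
        if (mask[l]) out[l] = x[l] + y;
    }
}

void mul_v(const int* x, int y, int* out, const bool mask[kVL]) {
    for (std::size_t l = 0; l < kVL; ++l) {
        if (mask[l]) out[l] = x[l] * y;
    }
}

// Executes one SIMD chunk of an indirect call whose target may differ per lane.
// Returns the number of vector calls made (1 when the pointer is uniform).
std::size_t indirect_call_chunk(const MaskedVariant targets[kVL], const int* x,
                                int y, int* out) {
    bool done[kVL] = {};
    std::size_t calls = 0;
    for (std::size_t first = 0; first < kVL; ++first) {
        if (done[first]) continue;
        MaskedVariant target = targets[first];
        bool mask[kVL] = {};
        // Group every remaining lane that shares this target.
        for (std::size_t l = first; l < kVL; ++l) {
            if (!done[l] && targets[l] == target) {
                mask[l] = true;
                done[l] = true;
            }
        }
        target(x, y, out, mask);
        ++calls;
    }
    return calls;
}
```

The cost therefore grows with the number of distinct targets in a chunk, which is one reason a compiler benefits from proving a pointer uniform.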
Use caution with SIMD-enabled function pointers in modern C++: C++ imposes strict requirements on compilation and execution environments, which may not compose well with semantically rich language extensions such as SIMD-enabled function pointers. Vector specifications on SIMD-enabled function pointers are attributes in the C++11 sense, so they are not part of the pointer type, even though they make the pointer binary-incompatible with a pointer of the same type that lacks the attribute. Instead of being bound to the pointer type, vector specifications are bound to the variable or function argument (an instance of the pointer type) itself. For a given function pointer, the type is the same with or without the SIMD-enabled function pointer decoration. This has the following important implications:
// pointer typedefs and pointer declarations
typedef int (*fptr_t)(int*, int);
#pragma omp declare simd
typedef int (*fptr_t1)(int*, int);
#pragma omp declare simd
#pragma omp declare simd linear(a)
typedef int (*fptr_t2)(int* a, int b);
fptr_t ptr;
fptr_t1 ptr1;
fptr_t2 ptr2;
// function prototype that only differs in SIMD-enabled function decoration
// All these will have identical mangled names.
void foo(fptr_t);
void foo(fptr_t1);
void foo(fptr_t2);
// template instantiation
template <typename T>
void bar(T);
…
bar(ptr); // bar<fptr_t>
bar(ptr1); // bar<fptr_t>
bar(ptr2); // bar<fptr_t>
Typically, the invocation of a SIMD-enabled function directly or indirectly provides arrays wherever scalar arguments are specified as formal parameters.
The following invocations will give instruction-level parallelism by having the compiler issue special vector instructions.
#pragma omp declare simd
float (**vf_ptr)(float, float);
//operates on the whole extent of the arrays a, b, c
a[:] = vf_ptr[:] (b[:],c[:]);
// use the full array notation construct to also specify n
// as an extent and s as a stride
a[0:n:s] = vf_ptr[0:n:s] (b[0:n:s],c[0:n:s]);
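The array sections above use Intel Cilk Plus array notation, where a section is written [start:length:stride]. The element-wise meaning of the strided statement can be written as an ordinary loop. The sketch below uses plain function pointers, so it shows the semantics only, not the vectorization; `apply_strided`, `add`, and `sub` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

using BinFn = float (*)(float, float);  // plain pointer: the SIMD decoration is omitted

// Scalar meaning of:  a[0:n:s] = vf_ptr[0:n:s](b[0:n:s], c[0:n:s]);
// The section [0:n:s] selects the n elements 0, s, 2*s, ..., (n-1)*s, and each
// selected element may call a different function through vf_ptr.
void apply_strided(float* a, BinFn const* vf_ptr, const float* b, const float* c,
                   std::size_t n, std::size_t s) {
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t k = i * s;
        a[k] = vf_ptr[k](b[k], c[k]);
    }
}

float add(float x, float y) { return x + y; }
float sub(float x, float y) { return x - y; }
```

Elements that fall between the strided positions are left untouched.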