...
| Option | Description |
|---|---|
| -g | Build application with debug information to allow binary-to-source correlation in the reports. |
| -qopenmp | Enable generation of multi-threaded code if OpenMP directives/pragmas exist. |
| -O2 (or higher) | Request compiler optimization. |
| -vec | Enable vectorization if option O2 or higher is in effect (enabled by default). |
| -simd | Enable SIMD directives/pragmas (enabled by default). |
For details of these options, refer to the man pages or the documentation of the Intel compilers.
Explicit Vectorization
Compiler SIMD directives/pragmas
Users can add compiler SIMD directives/pragmas to the source code to tell the compiler that no dependency exists. The compiler can then vectorize the loop when the modified source code is recompiled. Such SIMD directives/pragmas include:
- #pragma vector always: instruct the compiler to vectorize a loop if it is safe to do so, overriding its efficiency heuristics
- #pragma vector aligned: assert that data within the loop is aligned (for example, on a 16-byte boundary for SSE)
- #pragma ivdep: instruct the compiler to ignore potential data dependencies
- #pragma simd: enforce vectorization of a loop
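The following minimal sketch shows #pragma ivdep on a loop where an offset k is only known at run time. The function name scale_ahead and its parameters are illustrative assumptions. The compiler cannot prove the loop is dependency-free, so the programmer supplies that guarantee. The pragma is only correct if every caller passes k >= 0, because a negative k would create a read-after-write dependency that vectorization would break.

```c
/* Hypothetical example: a[i] = a[i + k] * c, with k known only at run time.
   The caller must ensure a has at least m + k elements and that k >= 0. */
void scale_ahead(float *a, int k, float c, int m)
{
    /* Because k is unknown, the compiler cannot prove the loop is free of
       loop-carried dependencies; vectorizing it would be illegal if k < 0.
       #pragma ivdep (Intel compiler; GCC spells it "#pragma GCC ivdep")
       tells the compiler to ignore that assumed dependency. Compilers that
       do not recognize the pragma simply ignore it. */
#pragma ivdep
    for (int i = 0; i < m; i++)
        a[i] = a[i + k] * c;
}
```

With k >= 0, each iteration reads an element that has not been overwritten yet. The vectorized loop therefore computes the same values as the scalar loop.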
OpenMP directives/pragmas
Users can use the directives/pragmas introduced in OpenMP 4.0 for explicit vectorization:
- #pragma omp simd: enforce vectorization of a loop
- #pragma omp declare simd: instruct the compiler to vectorize a function
- #pragma omp parallel for simd: target the same loop for threading and SIMD, with each thread executing SIMD instructions
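The sketch below applies the two loop-level OpenMP pragmas in plain C. The function names sum_squares and saxpy are hypothetical. Without OpenMP support the compiler may ignore these pragmas, in which case the loops run serially and compute the same results.

```c
/* Sum of squares: with reduction(+:sum) each SIMD lane keeps a private
   partial sum, and the partial sums are combined after the loop. */
float sum_squares(const float *x, int n)
{
    float sum = 0.0f;
#pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* y = a*x + y: the iterations are divided among threads, and each thread
   executes its chunk of iterations with SIMD instructions. */
void saxpy(float a, const float *x, float *y, int n)
{
#pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A SIMD reduction may add floating-point values in a different order than a sequential loop, so the result can differ in the last bits from a strictly sequential sum.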
SIMD-enabled functions
Users can also declare and use SIMD-enabled functions. In the example below, function foo is declared as a SIMD-enabled function (vector function), so the compiler generates a vectorized version of it. The for loop in which it is called is vectorized as well.
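```c
/* Declare foo as a SIMD-enabled (vector) function. This is the Intel
   compiler syntax on Linux; on Windows the syntax is __declspec(vector). */
__attribute__((vector)) float foo(float);

void vfoo(float *restrict a, float *restrict b, int n)
{
    int i;
    /* foo has a vector version, so this loop can be vectorized too. */
    for (i = 0; i < n; i++) {
        a[i] = foo(b[i]);
    }
}

float foo(float x) { ... }
```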
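For comparison, the OpenMP 4.0 #pragma omp declare simd directive listed earlier expresses the same idea portably. This is a sketch, not the original code. The body of foo (2x + 1) is a hypothetical placeholder because the original example elides it. The function is defined before vfoo so that a single directive covers it.

```c
/* OpenMP counterpart of __attribute__((vector)): the compiler may also
   generate SIMD versions of foo that process several x values at once.
   The body below is a hypothetical placeholder. */
#pragma omp declare simd
static float foo(float x)
{
    return 2.0f * x + 1.0f;
}

void vfoo(float *restrict a, const float *restrict b, int n)
{
    /* Calls to foo inside this loop can use its SIMD version. */
#pragma omp simd
    for (int i = 0; i < n; i++)
        a[i] = foo(b[i]);
}
```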
Programming Guidelines for Writing Vectorizable Code
- Use simple loops; avoid variable upper iteration limits and data-dependent loop exit conditions
- Write straight-line code: avoid branches, most function calls, and if constructs
- Use array notation instead of pointers
- Use unit stride (increment 1 for each iteration) in inner loops
- Use aligned data layouts (aligned memory addresses)
- Use structures of arrays (SoA) instead of arrays of structures (AoS)
- Use only assignment statements in the innermost loops
- Avoid data dependencies between loop iterations, such as read-after-write, write-after-read, and write-after-write
- Avoid indirect addressing
- Avoid mixing vectorizable data types (for example, float and double) in the same loop
- Avoid function calls in innermost loops, except calls to math library functions
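To illustrate the unit-stride and structure-of-arrays guidelines, the sketch below updates the x coordinate of a set of points stored both ways. The type names, function names and the fixed capacity N are assumptions made for illustration. In the AoS loop, consecutive x values are three floats apart, so the compiler needs strided or gather loads, or may judge vectorization unprofitable. In the SoA loop they are contiguous.

```c
#define N 1024 /* hypothetical capacity for the SoA layout */

/* Array of structures (AoS): consecutive x values are 3 floats apart. */
struct PointAoS { float x, y, z; };

/* Structure of arrays (SoA): all x values are contiguous in memory. */
struct PointsSoA { float x[N], y[N], z[N]; };

void shift_x_aos(struct PointAoS *p, int n, float dx)
{
    for (int i = 0; i < n; i++)
        p[i].x += dx; /* stride-3 access (in floats) */
}

void shift_x_soa(struct PointsSoA *p, int n, float dx)
{
    for (int i = 0; i < n; i++)
        p->x[i] += dx; /* unit-stride access */
}
```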