
-g               Build application with debug information to allow binary-to-source correlation in the reports.
-qopenmp         Enable generation of multi-threaded code if OpenMP directives/pragmas exist.
-O2 (or higher)  Request compiler optimization.
-vec             Enable vectorization if option O2 or higher is in effect (enabled by default).
-simd            Enable SIMD directives/pragmas (enabled by default).

For details on these options, refer to the man pages or the documentation of the Intel compilers.

 

 

 

 

 

Explicit Vectorization

Compiler SIMD directives/pragmas

Users can add compiler SIMD directives/pragmas to the source code to tell the compiler that a loop carries no data dependencies, so that the compiler can vectorize the loop when the modified source code is recompiled. Such SIMD directives/pragmas include:

 

#pragma vector always: instruct the compiler to vectorize a loop, overriding its efficiency heuristics, if it is safe to do so
#pragma vector aligned: assert that data within the loop is aligned (for example, on a 16B boundary for SSE instructions)
#pragma ivdep: instruct the compiler to ignore assumed (unproven) data dependencies
#pragma simd: enforce vectorization of a loop
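
As an illustration of the pragmas above, here is a minimal sketch of `#pragma ivdep`, assuming a loop whose offset `k` is known only at run time. The function name `scale_shifted` and its parameters are hypothetical. The compiler cannot prove that `a[i + k]` never refers to an element written by an earlier iteration, so without the pragma it may decline to vectorize.

```c
/* Hypothetical helper: a[i] = a[i + k] * c for i in [0, m).
 * The caller must guarantee k >= 0, so that every iteration reads an
 * element at or ahead of the one it writes. With k < 0 the loop would
 * carry a real read-after-write dependency and the pragma would be wrong. */
void scale_shifted(int *a, int k, int c, int m)
{
    /* Intel-style pragma; compilers that do not recognize it ignore it
     * (possibly with a warning) and the loop still runs correctly. */
#pragma ivdep
    for (int i = 0; i < m; i++) {
        a[i] = a[i + k] * c;
    }
}
```

The pragma is a promise made by the programmer, not a check: if the promise is false, the vectorized loop may silently produce wrong results.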

 

OpenMP directives/pragmas

Users can use the directives/pragmas introduced in OpenMP 4.0 for explicit vectorization:

 

 

#pragma omp simd: enforce vectorization of a loop
#pragma omp declare simd: instruct the compiler to vectorize a function
#pragma omp parallel for simd: target same loop for threading and SIMD, with each thread executing SIMD instructions
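
The following sketch shows how two of these directives might be applied. The functions `dot` and `saxpy` are hypothetical examples, not taken from this page. It assumes the code is compiled with OpenMP enabled (for example with the -qopenmp option listed earlier); without it the pragmas are ignored and the loops run serially with the same results.

```c
/* Hypothetical example: dot product. The reduction clause tells the
 * compiler that partial sums may be accumulated per SIMD lane and
 * combined at the end. */
float dot(const float *x, const float *y, int n)
{
    float sum = 0.0f;
#pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Hypothetical example: y = a*x + y. Iterations are divided among
 * threads, and each thread executes its share with SIMD instructions. */
void saxpy(float a, const float *x, float *y, int n)
{
#pragma omp parallel for simd
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Note that a SIMD reduction may add the terms in a different order than the serial loop, so floating-point results can differ in the last bits.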

 

 

 

SIMD enabled functions

Users can also declare and use SIMD-enabled functions. In the example below, function foo is declared as a SIMD-enabled function (vector function), so the compiler generates a vectorized version of it and can also vectorize the for loop in which it is called.

 

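/* Intel compiler vector attribute: marks foo as a SIMD-enabled function. */
__attribute__((vector))
float foo(float);

/* restrict asserts that a and b do not overlap. */
void vfoo(float *restrict a, float *restrict b, int n){
    int i;
    for (i=0; i<n; i++) { a[i] = foo(b[i]); }
}

/* Definition of foo; the body is elided here. */
float foo(float x) { ... }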
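
As a variant of the example above, the same pattern can be sketched with the OpenMP `declare simd` directive listed earlier instead of the Intel-specific attribute. This is a swapped-in technique, not the original example: the names `scale_and_shift` and `apply` are hypothetical, and the function body is a placeholder assumption standing in for the elided body of foo.

```c
/* Hypothetical element function. With OpenMP enabled, declare simd asks
 * the compiler to also generate a vector version that processes several
 * x values per call. */
#pragma omp declare simd
float scale_and_shift(float x)
{
    return 2.0f * x + 1.0f;
}

/* The loop calling the SIMD-enabled function can then be vectorized.
 * restrict asserts that a and b do not overlap. */
void apply(float *restrict a, const float *restrict b, int n)
{
#pragma omp simd
    for (int i = 0; i < n; i++) {
        a[i] = scale_and_shift(b[i]);
    }
}
```

If OpenMP is not enabled at compile time, both pragmas are ignored and the code runs as ordinary scalar C.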

 

Programming Guidelines for Writing Vectorizable Code

  • Use simple loops; avoid loop-variant upper iteration limits and data-dependent loop exit conditions
  • Write straight-line code: avoid branches (such as if constructs) and most function calls
  • Use array notation instead of pointers
  • Use unit stride (increment 1 for each iteration) in inner loops
  • Use aligned data layout (memory addresses)
  • Use structure of arrays instead of arrays of structures
  • Use only assignment statements in the innermost loops
  • Avoid data dependencies between loop iterations, such as read-after-write, write-after-read, and write-after-write
  • Avoid indirect addressing
  • Avoid mixing vectorizable types in the same loop
  • Avoid function calls in the innermost loop, except math library calls
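
Several of these guidelines can be seen together in a small sketch. The types `point_aos` and `points_soa`, the size `N`, and the two functions are hypothetical names chosen for illustration. Both loops are simple, countable, branch-free, and contain only an assignment, but only the structure-of-arrays version accesses memory with unit stride.

```c
#define N 1024  /* hypothetical array size */

/* Array of structures: the x, y, z fields of one point are adjacent, so a
 * loop over x alone steps through memory by sizeof(struct point_aos). */
struct point_aos { float x, y, z; };

/* Structure of arrays: each field is stored contiguously, so a loop over
 * x alone touches consecutive floats (unit stride). */
struct points_soa {
    float x[N];
    float y[N];
    float z[N];
};

/* Non-unit stride: consecutive iterations access x fields 3 floats apart. */
void scale_x_aos(struct point_aos *p, float s, int n)
{
    for (int i = 0; i < n; i++) {
        p[i].x = s * p[i].x;
    }
}

/* Unit stride: consecutive iterations access consecutive floats. */
void scale_x_soa(struct points_soa *p, float s, int n)
{
    for (int i = 0; i < n; i++) {
        p->x[i] = s * p->x[i];
    }
}
```

Both functions compute the same result; the difference is only in memory layout, which affects how efficiently the compiler can load and store vectors.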