...

Example:

module load intel-cc/16.0.3.210

icc -g -qopenmp -O2 example.c -o example

...

-g	Build application with debug information to allow binary-to-source correlation in the reports.
-qopenmp	Enable generation of multi-threaded code if OpenMP directives/pragmas exist.
-O2 (or higher)	Request compiler optimization.
-vec	Enable vectorization when -O2 or higher is in effect (enabled by default).
-simd	Enable SIMD directives/pragmas (enabled by default).

For details of these options, refer to the icc man page or the Intel compiler documentation.
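For illustration, the kernel inside an example.c built by the command above might look like the sketch below. The file contents and the name `saxpy` are hypothetical, and a driver `main()` is omitted. The loop is threaded with OpenMP (so `-qopenmp` matters) and has unit stride with no loop-carried dependency, so it is a candidate for auto-vectorization at `-O2`.

```c
/* Hypothetical kernel for example.c: y[i] = a * x[i] + y[i].
 * The OpenMP pragma splits iterations across threads when built with
 * -qopenmp; unit-stride access and independent iterations make the
 * loop a candidate for auto-vectorization at -O2. */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Without `-qopenmp` the pragma is ignored and the loop runs serially, which still produces the same result.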

2) Submit a PBS job that executes the binary under the Intel Advisor command-line tool advixe-cl to collect vectorization information:

Example (PBS directives omitted):

......
module load intel-advisor/2016.1.40.463413
advixe-cl --collect survey --project-dir ./advi ./example

For a 4-process MPI program, collect survey data into the shared ./advi project directory:

......
module load intel-advisor/2016.1.40.463413
mpirun -n 4 advixe-cl --collect survey --project-dir ./advi ./mpi_example_serial

3) Once the job finishes, launch advixe-gui on a login node to visualize the data collected by advixe-cl:

Example:

module load intel-advisor/2016.1.40.463413

advixe-gui &

Choose the “Open Result” tab, then select the .advixeexp file generated by advixe-cl in the previous step.
The “Summary” pane gives an overview of the Vectorization Advisor report, including the vector instruction sets used and the vectorization gain/efficiency. Below is a screenshot of the summary for a code run on a KNL node:

The Survey Report provides detailed compiler report data and performance data regarding vectorization such as:
- Which loops are vectorized, and where they are in the source code
- Vectorization issues
- Why a loop is not vectorized
- Vector ISA used
- Vectorization efficiency and speedup
- Vector length (number of elements processed per SIMD instruction)
- Vector instructions used

Below is a screenshot of the Survey Report for a code run on a KNL node:

If a loop cannot be vectorized automatically, Intel Advisor reports the reason and gives advice specific to your code on how to fix the issue, such as running a dependency analysis or a memory access pattern analysis on that loop. Users should follow this advice, modify their source code, and give the compiler more hints to improve vectorization, either through compiler options or by adding directives/pragmas to the source code (explicit vectorization).
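As a hedged illustration of what a dependency analysis distinguishes, the two hypothetical functions below differ in the kind of dependency they carry. In `prefix_sum` each iteration reads the value written by the previous one (read-after-write), so the loop must not be vectorized as written. In `add_into` the compiler can only *assume* a dependency, because it cannot prove that `dst` and `src` do not overlap; this is the case where a programmer-supplied hint can safely enable vectorization.

```c
/* Genuine loop-carried (read-after-write) dependency: iteration i uses
 * the value stored by iteration i-1, so vectorizing this loop as
 * written would give wrong results. */
void prefix_sum(float *a, int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] = a[i - 1] + a[i];
}

/* Only an assumed dependency: the compiler cannot prove dst and src do
 * not overlap, so it may decline to vectorize or add a runtime overlap
 * check. If callers guarantee no overlap, a hint such as restrict or
 * #pragma ivdep lets the compiler vectorize it unconditionally. */
void add_into(float *dst, const float *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```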

Explicit Vectorization

Compiler SIMD directives/pragmas

Users can add compiler SIMD directives/pragmas to the source code to tell the compiler that a dependency does not exist, so that the compiler can vectorize the loop when the modified source code is recompiled. Such SIMD directives/pragmas include:

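#pragma vector always: instruct the compiler to vectorize a loop, overriding its efficiency heuristics, if it is safe to do so
#pragma vector aligned: assert that all array references in the loop are aligned (16-byte boundaries for SSE; wider vectors such as AVX-512 on KNL need 64-byte alignment)
#pragma ivdep: instruct the compiler to ignore assumed (unproven) data dependencies
#pragma simd: enforce vectorization of a loop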
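A minimal sketch of two of these Intel-specific pragmas on hypothetical loops follows; the function names are illustrative only. Other compilers typically ignore pragmas they do not recognize (possibly with a warning), so the code still builds and gives the same results there, just without the hints.

```c
/* #pragma ivdep: the programmer promises that a and b never overlap,
 * so the compiler may ignore the dependency it would otherwise assume
 * between the store to a[i] and the load from b[i]. */
void scale(float *a, const float *b, float s, int n)
{
    int i;
#pragma ivdep
    for (i = 0; i < n; i++)
        a[i] = s * b[i];
}

/* #pragma vector always: ask icc to vectorize even though its cost
 * model may judge the stride-2 read of b unprofitable, as long as
 * doing so is safe. */
void gather_stride2(float *a, const float *b, int n)
{
    int i;
#pragma vector always
    for (i = 0; i < n; i++)
        a[i] = b[2 * i];
}
```

Note that `#pragma ivdep` is a promise, not a check: if the arrays do overlap, the vectorized loop may silently produce wrong results.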

OpenMP directives/pragmas

Users can use the new OpenMP 4.0 directives/pragmas for explicit vectorization:

#pragma omp simd: enforce vectorization of a loop
#pragma omp declare simd: instruct the compiler to vectorize a function
#pragma omp parallel for simd: target same loop for threading and SIMD, with each thread executing SIMD instructions
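The sketch below applies `omp simd` and `omp parallel for simd` to two hypothetical kernels. It assumes the code is compiled with `-qopenmp` (icc); without OpenMP support the pragmas are ignored and the loops run serially with the same results.

```c
/* omp simd with a reduction clause: each SIMD lane accumulates its own
 * partial sum, and the partial sums are combined after the loop. */
float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    int i;
#pragma omp simd reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* omp parallel for simd: iterations are divided among threads, and
 * each thread executes its chunk with SIMD instructions. */
void vadd(float *c, const float *a, const float *b, int n)
{
    int i;
#pragma omp parallel for simd
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Because a SIMD reduction reorders the additions, floating-point results can differ slightly in the last bits from a purely serial sum.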

Compiler options and macros

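Users can also use compiler options and macros for explicit vectorization. Note that -D only defines a preprocessor macro, so the macros below take effect only in source code that tests for them: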

-D NOALIAS/-noalias: assert that there is no aliasing of memory references (array addresses or pointers)
-D REDUCTION: apply an omp simd directive with a reduction clause
-D NOFUNCCALL: remove the function and inline the loop
-D ALIGNED/-align: assert that data is aligned on 16B boundary
-fargument-noalias: function arguments cannot alias each other
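A hypothetical sketch of how source code can react to a macro such as NOALIAS: when built with `-D NOALIAS`, the pointer parameters gain the C99 `restrict` qualifier, which promises the compiler that the arrays do not overlap. The `RESTRICT` macro and `axpy` function are illustrative names, not part of any Intel header.

```c
/* Build with -DNOALIAS to mark the arrays as non-overlapping; without
 * it the code is identical except that the compiler must allow for
 * aliasing between x and y. */
#ifdef NOALIAS
#define RESTRICT restrict
#else
#define RESTRICT
#endif

void axpy(int n, float a, const float *RESTRICT x, float *RESTRICT y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}
```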

SIMD enabled functions

Users can also declare and use SIMD-enabled functions. In the example below, the function foo is declared as a SIMD-enabled (vector) function, so the compiler generates a vectorized version of it, and the for loop that calls it can be vectorized as well.

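__attribute__((vector))   /* Intel syntax: declare foo as a SIMD-enabled function */
float foo(float);
void vfoo(float *restrict a, float *restrict b, int n){
    int i;
    /* The loop can call the vector version of foo on several elements at once */
    for (i=0; i<n; i++) { a[i] = foo(b[i]); }
}
__attribute__((vector))   /* the definition carries the same attribute */
float foo(float x) { ... }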
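For comparison, a sketch of the same pattern written with the portable OpenMP 4.0 `declare simd` directive instead of the Intel attribute. The function `poly` and its body are hypothetical stand-ins for foo, chosen only so the example is complete. Compile with `-qopenmp` for the directives to take effect.

```c
/* The compiler generates a vector version of poly() that can evaluate
 * several x values per call, in addition to the normal scalar one. */
#pragma omp declare simd
float poly(float x)
{
    return 3.0f * x * x + 2.0f * x + 1.0f;
}

/* The loop calling poly() is vectorized and uses the vector version. */
void vpoly(float *restrict a, const float *restrict b, int n)
{
    int i;
#pragma omp simd
    for (i = 0; i < n; i++)
        a[i] = poly(b[i]);
}
```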

Programming Guidelines for Writing Vectorizable Code

Use simple loops; avoid loop-variant upper iteration limits and data-dependent loop exit conditions
Write straight-line code: avoid branches (if constructs) and most function calls
Use array notation instead of pointers
Use unit stride (increment of 1 per iteration) in inner loops
Use an aligned data layout (aligned memory addresses)
Use structures of arrays instead of arrays of structures
Use only assignment statements in the innermost loops
Avoid data dependencies between loop iterations, such as read-after-write, write-after-read and write-after-write
Avoid indirect addressing
Avoid mixing vectorizable types in the same loop
Avoid function calls in the innermost loop, except math library calls
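To illustrate the structure-of-arrays guideline, the hypothetical sketch below updates one coordinate of a set of points in both layouts. With an array of structures, consecutive x values are 3 floats apart, so the loop reads memory with stride 3; with a structure of arrays, the x values are contiguous and the loop has the unit stride recommended above. `NPTS`, `struct point` and `struct points` are illustrative names.

```c
#define NPTS 1024

/* Array of structures: x, y and z of one point are adjacent in memory. */
struct point { float x, y, z; };

/* Structure of arrays: each coordinate is stored contiguously. */
struct points {
    float x[NPTS];
    float y[NPTS];
    float z[NPTS];
};

/* Stride-3 access to x: harder to vectorize efficiently. */
void shift_x_aos(struct point *p, int n, float dx)
{
    int i;
    for (i = 0; i < n; i++)
        p[i].x += dx;
}

/* Unit-stride access to x (n must not exceed NPTS). */
void shift_x_soa(struct points *p, int n, float dx)
{
    int i;
    for (i = 0; i < n; i++)
        p->x[i] += dx;
}
```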
