
In this page

  • FERMI
    • Version 2.9 (BG/Q optimized)
    • Version 2.9 with plumed 1.3

Version 2.9 (BG/Q optimized)

Input (from http://www.ks.uiuc.edu/Research/namd/utilities/):

STMV (virus) benchmark: 1,066,628 atoms, periodic boundary conditions, PME (config files available as 27.5M .tar.gz or .zip archives).

This version of the code, optimized by IBM, adopts a mixed MPI-thread approach to parallel computation. The number of MPI processes per node is selected with the --ranks-per-node option of the runjob launcher, while the number of threads per MPI process is set with the +ppn flag of namd2.

Having fixed the number of nodes with bg_size, we tested different combinations of the --ranks-per-node and +ppn options; Graphic 1 reports the best performance obtained for the STMV benchmark.
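
For reference, a minimal sketch of the corresponding LoadLeveler script (the keywords follow the standard BG/Q setup; job name, wall-clock limit, and paths are placeholders to adapt to your environment):

#!/bin/bash
# @ job_name = namd_stmv
# @ job_type = bluegene
# @ output = $(job_name).$(jobid).out
# @ error = $(job_name).$(jobid).err
# @ wall_clock_limit = 1:00:00
# @ bg_size = 128
# @ queue

# 4 MPI ranks per node on 128 nodes = 512 ranks, 16 threads each (64 threads per node)
runjob --np 512 --ranks-per-node 4 : $NAMD_HOME/namd2 stmv_ori.namd +ppn16 > stmv.out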

 

Graphic 1: NAMD performance (simulated ns per day) vs. the number of cores

 

Scalability is good up to the highest number of nodes tested.

 

In all cases except bg_size = 1024 and 2048, the best combination is ranks-per-node = 4 with +ppn16, or ranks-per-node = 8 with +ppn8 (both making use of the maximum of 64 threads per node).

For example, Graphic 2 reports the results for calculations with bg_size = 128 and the maximum number of threads (8192 in total), corresponding to the following running options:

runjob --np 2048 --ranks-per-node 16 :  $NAMD_HOME/namd2 stmv_ori.namd +ppn4 > stmv.out

runjob --np 1024 --ranks-per-node  8 :  $NAMD_HOME/namd2 stmv_ori.namd +ppn8 > stmv.out

runjob --np  512 --ranks-per-node  4 :  $NAMD_HOME/namd2 stmv_ori.namd +ppn16 > stmv.out

runjob --np  256 --ranks-per-node  2 :  $NAMD_HOME/namd2 stmv_ori.namd +ppn32 > stmv.out

runjob --np  128 --ranks-per-node  1 :  $NAMD_HOME/namd2 stmv_ori.namd +ppn64 > stmv.out
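
All of these combinations keep ranks-per-node × ppn = 64 threads per node. A shell loop sketch that reproduces the scan (output file names are illustrative):

for RPN in 1 2 4 8 16; do
  PPN=$(( 64 / RPN ))    # keep RPN * PPN = 64 threads per node
  NP=$(( 128 * RPN ))    # total MPI ranks for bg_size = 128
  runjob --np $NP --ranks-per-node $RPN : $NAMD_HOME/namd2 stmv_ori.namd +ppn$PPN > stmv_rpn${RPN}.out
done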

 

Note that the namd output can be misleading in how it labels processes and threads. For example, when running with these options:

runjob --np  512 --ranks-per-node  4 :  $NAMD_HOME/namd2 stmv_ori.namd +ppn16 > stmv.out

you can find this line in the namd output:

 Info: Running on 8192 processors, 512 nodes, 128 physical nodes.

where "nodes" actually counts the MPI processes and "processors" the total number of threads (512 ranks × 16 threads = 8192), while "physical nodes" reports the real number of nodes (512 / 4 = 128).
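
To recover the performance figure plotted in the graphics (ns per day), the benchmark lines of the namd output can be parsed; a sketch assuming the usual "Info: Benchmark time: ... s/step ... days/ns" format, which may vary between NAMD versions:

# convert the measured days/ns into ns/day
grep "Benchmark time:" stmv.out | awk '{ for (i = 1; i < NF; i++) if ($(i+1) == "days/ns") printf "%.2f ns/day\n", 1/$i }'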

 

Graphic 2: NAMD performance (simulated ns per day) vs. the number of ranks-per-node (fixed bg_size = 128 and fixed total number of threads)

 

The dependence on the number of threads per process at fixed bg_size and ranks-per-node is shown in Graphic 3. The plot clearly shows that the best performance is obtained with the maximum number of threads per node (64 = 16 threads per MPI process × 4 processes).
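
The scan of Graphic 3 corresponds to fixing --np and --ranks-per-node while varying only the +ppn value, e.g. (output file names are illustrative):

for PPN in 1 2 4 8 16; do
  # bg_size = 128, 4 ranks per node = 512 ranks; only the threads per rank change
  runjob --np 512 --ranks-per-node 4 : $NAMD_HOME/namd2 stmv_ori.namd +ppn$PPN > stmv_ppn${PPN}.out
done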

 

Graphic 3: NAMD performance (simulated ns per day) vs. the number of threads per MPI process (fixed bg_size = 128, fixed ranks-per-node = 4)

 

 

This is no longer true when the number of nodes increases. For example, Graphic 4 shows the same scan for bg_size = 1024 and 2048.

 

Graphic 4: NAMD performance (simulated ns per day) vs. the number of threads per MPI process (fixed bg_size = 1024 and 2048, fixed ranks-per-node = 4)


This graph shows that with bg_size = 1024 the best performance is obtained with two threads per core (i.e. +ppn8, giving 4 × 8 = 32 threads on the 16 cores of each node), while with bg_size = 2048 one thread per core (+ppn4) is already enough.

 

Note that even though the options --ranks-per-node = 4 and +ppn16 were the best over a wide range of cores for this benchmark, this may not hold for your system, since MD performance depends strongly on system size and topology.
