MARCONI
Version 2.12 (A1 partition)
Input: (http://www.ks.uiuc.edu/Research/namd/utilities/)
STMV (virus) benchmark (1,066,628 atoms, periodic, PME)
Graphic 1: the NAMD performance (simulation time in ns/day) is reported vs. the increasing number of cores
Version 2.9 (BG/Q optimized)
Input: (http://www.ks.uiuc.edu/Research/namd/utilities/)
STMV (virus) benchmark (1,066,628 atoms, periodic, PME)
Benchmark archive: config directory (27.5M .tar.gz, 27.5M .zip)
This version of the code, optimized by IBM, adopts a mixed MPI/OpenMP approach to the parallel computation. The number of MPI processes per node is selected with the --ranks-per-node option of runjob, while the number of OpenMP threads per MPI process is set with the +ppn flag of namd2.
Having chosen the number of nodes with bg_size, we tested different combinations of the --ranks-per-node and +ppn options; Graphic 1 reports the best performance obtained for the STMV benchmark.
Graphic 1: the NAMD performance (simulation time in ns/day) is reported vs. the increasing number of cores
Scalability is good up to the highest number of nodes tested.
In all cases except bg_size = 1024 and 2048, the best combination is ranks-per-node = 4 with +ppn16, or ranks-per-node = 8 with +ppn8 (thus using the maximum of 64 threads per node).
For example, Graphic 2 shows the results for the runs with bg_size = 128 and the maximum number of threads (8192 in total), corresponding to these running options:
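The viable option pairs can be enumerated quickly; a minimal sketch, assuming only the figure stated above (64 hardware threads per BG/Q node):

```shell
#!/bin/sh
# List the (ranks-per-node, +ppn) pairs that keep all 64 hardware
# threads of a BG/Q node busy: ranks-per-node * ppn = 64.
for rpn in 1 2 4 8 16 32 64; do
  ppn=$((64 / rpn))
  echo "--ranks-per-node $rpn  +ppn$ppn"
done
```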
runjob --np 2048 --ranks-per-node 16 : $NAMD_HOME/namd2 stmv_ori.namd +ppn4 > stmv.out
runjob --np 1024 --ranks-per-node 8 : $NAMD_HOME/namd2 stmv_ori.namd +ppn8 > stmv.out
runjob --np 512 --ranks-per-node 4 : $NAMD_HOME/namd2 stmv_ori.namd +ppn16 > stmv.out
runjob --np 256 --ranks-per-node 2 : $NAMD_HOME/namd2 stmv_ori.namd +ppn32 > stmv.out
runjob --np 128 --ranks-per-node 1 : $NAMD_HOME/namd2 stmv_ori.namd +ppn64 > stmv.out
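On a BG/Q machine these runjob invocations are normally embedded in a LoadLeveler job script; a minimal sketch (the wall-clock limit and file names are placeholders, not taken from this page):

```
#!/bin/bash
# @ job_type = bluegene
# @ bg_size = 128
# @ wall_clock_limit = 01:00:00
# @ output = stmv.$(jobid).out
# @ error = stmv.$(jobid).err
# @ queue
runjob --np 512 --ranks-per-node 4 : $NAMD_HOME/namd2 stmv_ori.namd +ppn16 > stmv.out
```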
Note that the namd output can be misleading in its use of the terms process and thread. For example, when running with these options:
runjob --np 512 --ranks-per-node 4 : $NAMD_HOME/namd2 stmv_ori.namd +ppn16 > stmv.out
you can find this line in the namd output:
Info: Running on 8192 processors, 512 nodes, 128 physical nodes.
where "nodes" actually counts the MPI processes and "processors" the total number of threads.
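If needed, the actual decomposition can be recovered from that log line; a small shell sketch (the line format is copied from the output shown above):

```shell
#!/bin/sh
# Decode NAMD's "Info: Running on ..." line on BG/Q: "processors" is the
# total number of worker threads, "nodes" the MPI processes, and
# "physical nodes" the real compute nodes.
line="Info: Running on 8192 processors, 512 nodes, 128 physical nodes."
threads=$(echo "$line" | awk '{print $4}')   # total worker threads
ranks=$(echo "$line" | awk '{print $6}')     # MPI processes ("nodes")
nodes=$(echo "$line" | awk '{print $8}')     # physical BG/Q nodes
echo "$((threads / ranks)) threads per MPI process on $nodes nodes"
```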
Graphic 2: the NAMD performance (simulation time in ns/day) is reported vs. the increasing number of ranks-per-node (fixed bg_size=128 and no. of total threads)
The dependence on the number of threads per MPI process, at fixed bg_size and ranks-per-node, is shown in Graphic 3. The plot clearly shows that the best performance is obtained using the maximum number of threads (64 per node = 16 threads per MPI process * 4 processes).
Graphic 3: the NAMD performance (simulation time in ns/day) is reported vs. the increasing number of threads per MPI process (fixed bg_size=128, fixed ranks-per-nodes=4)
When increasing the number of nodes, this is no longer true. For example, Graphic 4 shows the same data for bg_size = 1024 and 2048.
Graphic 4: the NAMD performance (simulation time in ns/day) is reported vs. the increasing number of threads per MPI process (fixed bg_size=1024 and 2048, fixed ranks-per-node=4)
This graph shows that with bg_size = 1024 the best performance is obtained with two threads per core, while with bg_size = 2048 one thread per core is already sufficient.
Note that even though the combination --ranks-per-node=4 and +ppn16 was the best over a wide range of cores for this benchmark, this may not hold for your system, since the performance of MD calculations depends strongly on system size and topology.
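The thread-per-core figures follow from the BG/Q node layout (16 compute cores with 4 hardware threads each); a quick check, assuming ranks-per-node = 4 as in Graphic 4:

```shell
#!/bin/sh
# Threads per physical core for ranks-per-node = 4 on a BG/Q node
# with 16 compute cores: rpn * ppn / cores.
rpn=4; cores=16
for ppn in 4 8 16; do
  echo "+ppn$ppn -> $(( rpn * ppn / cores )) thread(s) per core"
done
```

So at fixed ranks-per-node = 4, +ppn8 corresponds to two threads per core and +ppn4 to one thread per core, which are the settings Graphic 4 singles out for bg_size = 1024 and 2048.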