Version 2.12 (BDW A1 partition) - work in progress
Input: (http://www.ks.uiuc.edu/Research/namd/utilities/)
STMV (virus) benchmark (1,066,628 atoms, periodic, PME)
Graphic 1: NAMD performance on the STMV benchmark (simulated time in ns/day) versus the number of cores.
Input: (http://www.ks.uiuc.edu/Research/namd/utilities/)
APOA1 (apolipoprotein A1) benchmark (92,224 atoms, periodic, PME)
Graphic 1: NAMD performance on the APOA1 benchmark (simulated time in ns/day) versus the number of cores.
MARCONI (KNL - A2 partition)
Problem
namd2 +p 136 apoa1/apoa1.namd +pemap 0-135
cores | days/ns | ns/day |
8 | 1.53863 | 0.649929 |
16 | 0.798383 | 1.25253 |
32 | 0.41594 | 2.40419 |
68 | 0.33502 | 2.9849 |
136 | 6.17328 | 0.16198 |
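NB: the SMP build of NAMD 2.12 was downloaded directly from the NAMD website.
Note that scaling degrades beyond 32 cores (going from 32 to 68 cores raises performance only from 2.40 to 2.98 ns/day) and that performance collapses when hyperthreading is used (136 threads on 68 physical cores: 0.16 ns/day).
For this reason Intel was given access to Marconi; they obtained the same results.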
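The loss of scaling can be quantified from the table. The following is a minimal sketch, not part of the original benchmark scripts: it converts the measured days/ns into ns/day and computes speedup and parallel efficiency relative to the 8-core run. The function name `scaling` and the choice of the 8-core run as baseline are assumptions made for illustration.

```python
# Timings copied from the table above: cores (or threads) -> days/ns.
days_per_ns = {8: 1.53863, 16: 0.798383, 32: 0.41594, 68: 0.33502, 136: 6.17328}


def scaling(timings):
    """Return {cores: (ns_per_day, speedup, efficiency)}.

    Speedup and efficiency are relative to the smallest run in `timings`
    (hypothetical helper, baseline choice is an assumption).
    """
    base = min(timings)
    base_rate = 1.0 / timings[base]
    rows = {}
    for cores in sorted(timings):
        rate = 1.0 / timings[cores]           # ns/day is the inverse of days/ns
        speedup = rate / base_rate            # how much faster than the baseline
        efficiency = speedup * base / cores   # 1.0 would be ideal linear scaling
        rows[cores] = (rate, speedup, efficiency)
    return rows


if __name__ == "__main__":
    for cores, (rate, speedup, eff) in scaling(days_per_ns).items():
        # The 136 entry counts hardware threads, not physical cores.
        print(f"{cores:4d}  {rate:8.4f} ns/day  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

With these numbers the efficiency stays above roughly 92% up to 32 cores, falls to about 54% on 68 cores, and the 136-thread run is slower than the 8-core baseline.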
Reason for poor performance
The origin of the poor single node performance on KNL was eventually found to be the gettimeofday() system call when called in parallel:
cluster | Marconi | Marconi | Endeavour | Endeavour |
cores | 54 | 64 | 54 | 64 |
Total time | 28.59 | 43.64 | 23.92 | 20.75 |
__vdso_gettimeofday | 5.25 | 21.29 | 0.77 | 1.44 |
performance, ns/day, HB | 3.78 | 3.38 | 3.76 | 4.47 |
(Endeavour is the KNL cluster based at Intel.)
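To make the comparison easier to read, the share of the total time spent in `__vdso_gettimeofday` can be worked out from the profile. This is a small sketch under two assumptions that the table does not state explicitly: both rows use the same time unit, and the time in `__vdso_gettimeofday` is included in the total. The names `profile` and `timer_share` are illustrative only.

```python
# Profile values copied from the table above (units as reported there).
profile = {
    ("Marconi", 54): {"total": 28.59, "gettimeofday": 5.25},
    ("Marconi", 64): {"total": 43.64, "gettimeofday": 21.29},
    ("Endeavour", 54): {"total": 23.92, "gettimeofday": 0.77},
    ("Endeavour", 64): {"total": 20.75, "gettimeofday": 1.44},
}


def timer_share(entry):
    """Fraction of the total time spent in __vdso_gettimeofday (assumed included in total)."""
    return entry["gettimeofday"] / entry["total"]


def time_outside_timer(entry):
    """Total time minus the time spent in __vdso_gettimeofday."""
    return entry["total"] - entry["gettimeofday"]


if __name__ == "__main__":
    for (cluster, cores), entry in profile.items():
        print(f"{cluster:10s} {cores:3d} cores: "
              f"{timer_share(entry):6.1%} in gettimeofday, "
              f"{time_outside_timer(entry):6.2f} elsewhere")
```

Under these assumptions the call accounts for roughly 18% (54 cores) and 49% (64 cores) of the total on Marconi, against about 3% and 7% on Endeavour, while the time spent outside the call at 54 cores is nearly identical on the two systems.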
This call is heavily used at the beginning of NAMD in the Dynamic Load Balancing (DLB) phase and leads to a slowdown at least in the first few hundred steps.
The difference between the Marconi and Endeavour versions of gettimeofday() lies in the value of the current_clocksource setting of Linux.
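On Linux this setting is exposed through sysfs, so it can be checked on any node before running a benchmark. The sketch below reads the active and the available clocksources from the standard sysfs directory; the helper name `read_clocksource` is hypothetical, and the example makes no assumption about which value either cluster used.

```python
from pathlib import Path

# Standard Linux sysfs directory holding the clocksource settings.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(directory=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources.

    Returns (None, []) when the files are not exposed, for example on a
    non-Linux system. Hypothetical helper written for illustration.
    """
    directory = Path(directory)
    try:
        current = (directory / "current_clocksource").read_text().strip()
    except OSError:
        return None, []
    try:
        available = (directory / "available_clocksource").read_text().split()
    except OSError:
        available = []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clocksource:   ", current)
    print("available clocksources:", " ".join(available))
```

Running the script on a compute node of each system and comparing the reported current clocksource is a quick way to see whether two machines differ in this respect.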