MPI and Solaris on Coolthreads

October 4, 2010

Coolthreads. Interesting boxes. What follows is a log of my experiments with MPI (Message Passing Interface) as a way to exploit the large number of hardware threads on Sun's T-series machines.

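With the workload below coordinated by MPICH2, throughput scales almost linearly from 1 to 16 MPI processes (1,149 c/s with one process, 16,815 c/s with 16, about 91% of perfect linear scaling). It then climbs only slowly to about 20,000 c/s at 32 processes, after which there is no meaningful improvement: every run from 33 to 64 processes lands between 19,788 and 20,596 c/s.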

Could this ceiling be caused by memory bandwidth limits, or perhaps by cache contention?

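For each process count x from 1 to 64 (0 < x <= 64), the benchmark was run as:

x=1; while [ "$x" -le 64 ]; do mpirun -np "$x" ./john --test --format=MD5; x=$((x + 1)); done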
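To collect results in the same "CPUs: N  Raw: ..." layout as the table below, the benchmark command can be wrapped in a small function that tags each run with its process count. This is an illustrative sketch for bash or ksh93, not necessarily how these numbers were gathered: sweep and the MPIRUN override are hypothetical names, and it assumes the MPI build of john prints a single aggregated line beginning with "Raw:" per run. It counts with a while loop rather than seq, which may not be present on Solaris.

```shell
# Hypothetical helper: run the john MD5 benchmark under MPI for 1..MAX
# processes and print one "CPUs: N  Raw: ..." line per run.
# Assumes each run prints exactly one line starting with "Raw:".
sweep() {
  max=${1:-64}                       # highest process count to test
  x=1
  while [ "$x" -le "$max" ]; do
    # MPIRUN (assumption) allows a different launcher to be substituted
    rate=$(${MPIRUN:-mpirun} -np "$x" ./john --test --format=MD5 | grep '^Raw:')
    printf 'CPUs:\t%s\t%s\n' "$x" "$rate"
    x=$((x + 1))
  done
}
```

For example, sweep 64 > results.txt runs the full sweep and keeps one line per process count.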

CPUs:   1       Raw:    1149 c/s real, 1149 c/s virtual
CPUs:   2       Raw:    2298 c/s real, 2298 c/s virtual
CPUs:   3       Raw:    3447 c/s real, 3447 c/s virtual
CPUs:   4       Raw:    4596 c/s real, 4596 c/s virtual
CPUs:   5       Raw:    5739 c/s real, 5733 c/s virtual
CPUs:   6       Raw:    6882 c/s real, 6894 c/s virtual
CPUs:   7       Raw:    8043 c/s real, 8043 c/s virtual
CPUs:   8       Raw:    9192 c/s real, 9186 c/s virtual
CPUs:   9       Raw:    10303 c/s real, 10297 c/s virtual
CPUs:   10      Raw:    11438 c/s real, 11402 c/s virtual
CPUs:   11      Raw:    11941 c/s real, 11921 c/s virtual
CPUs:   12      Raw:    13606 c/s real, 13598 c/s virtual
CPUs:   13      Raw:    14741 c/s real, 14784 c/s virtual
CPUs:   14      Raw:    15822 c/s real, 15840 c/s virtual
CPUs:   15      Raw:    16298 c/s real, 16298 c/s virtual
CPUs:   16      Raw:    16815 c/s real, 16817 c/s virtual
CPUs:   17      Raw:    17094 c/s real, 17105 c/s virtual
CPUs:   18      Raw:    16467 c/s real, 16514 c/s virtual
CPUs:   19      Raw:    18438 c/s real, 18438 c/s virtual
CPUs:   20      Raw:    18449 c/s real, 18464 c/s virtual
CPUs:   21      Raw:    18516 c/s real, 18507 c/s virtual
CPUs:   22      Raw:    18638 c/s real, 18619 c/s virtual
CPUs:   23      Raw:    18433 c/s real, 18447 c/s virtual
CPUs:   24      Raw:    18998 c/s real, 19016 c/s virtual
CPUs:   25      Raw:    18857 c/s real, 18859 c/s virtual
CPUs:   26      Raw:    19126 c/s real, 19126 c/s virtual
CPUs:   27      Raw:    19377 c/s real, 19327 c/s virtual
CPUs:   28      Raw:    19603 c/s real, 19566 c/s virtual
CPUs:   29      Raw:    19745 c/s real, 19730 c/s virtual
CPUs:   30      Raw:    19876 c/s real, 19783 c/s virtual
CPUs:   31      Raw:    20100 c/s real, 20023 c/s virtual
CPUs:   32      Raw:    20049 c/s real, 20048 c/s virtual
CPUs:   33      Raw:    20123 c/s real, 20154 c/s virtual
CPUs:   34      Raw:    19971 c/s real, 19995 c/s virtual
CPUs:   35      Raw:    20120 c/s real, 20148 c/s virtual
CPUs:   36      Raw:    20010 c/s real, 19988 c/s virtual
CPUs:   37      Raw:    20038 c/s real, 20049 c/s virtual
CPUs:   38      Raw:    19955 c/s real, 19955 c/s virtual
CPUs:   39      Raw:    19871 c/s real, 19795 c/s virtual
CPUs:   40      Raw:    19849 c/s real, 19852 c/s virtual
CPUs:   41      Raw:    19788 c/s real, 19809 c/s virtual
CPUs:   42      Raw:    19841 c/s real, 19828 c/s virtual
CPUs:   43      Raw:    20012 c/s real, 19974 c/s virtual
CPUs:   44      Raw:    19947 c/s real, 19952 c/s virtual
CPUs:   45      Raw:    20055 c/s real, 20019 c/s virtual
CPUs:   46      Raw:    19994 c/s real, 19973 c/s virtual
CPUs:   47      Raw:    20094 c/s real, 20091 c/s virtual
CPUs:   48      Raw:    20190 c/s real, 20148 c/s virtual
CPUs:   49      Raw:    20340 c/s real, 20360 c/s virtual
CPUs:   50      Raw:    20271 c/s real, 20224 c/s virtual
CPUs:   51      Raw:    20456 c/s real, 20455 c/s virtual
CPUs:   52      Raw:    20596 c/s real, 20592 c/s virtual
CPUs:   53      Raw:    20411 c/s real, 20385 c/s virtual
CPUs:   54      Raw:    20503 c/s real, 20491 c/s virtual
CPUs:   55      Raw:    20372 c/s real, 20355 c/s virtual
CPUs:   56      Raw:    20274 c/s real, 20299 c/s virtual
CPUs:   57      Raw:    20347 c/s real, 20278 c/s virtual
CPUs:   58      Raw:    20343 c/s real, 20297 c/s virtual
CPUs:   59      Raw:    20299 c/s real, 20324 c/s virtual
CPUs:   60      Raw:    20360 c/s real, 20313 c/s virtual
CPUs:   61      Raw:    20395 c/s real, 20313 c/s virtual
CPUs:   62      Raw:    20423 c/s real, 20353 c/s virtual
CPUs:   63      Raw:    20364 c/s real, 20393 c/s virtual
CPUs:   64      Raw:    20352 c/s real, 20431 c/s virtual
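To put numbers on "near linear", speedup and parallel efficiency can be computed directly from these lines: speedup is the rate at N processes divided by the single-process rate, and efficiency is speedup divided by N. A minimal awk sketch, wrapped in a hypothetical shell function called scaling, assuming input in exactly the layout above with the 1-process run first:

```shell
# Sketch: compute speedup and parallel efficiency from lines such as
#   CPUs:   16      Raw:    16815 c/s real, 16817 c/s virtual
# The first matching line is the baseline; its rate divided by its process
# count defines perfect linear scaling. Output: N rate speedup efficiency.
scaling() {
  awk '
    $1 == "CPUs:" && $3 == "Raw:" {
      n = $2; rate = $4                  # process count and real c/s
      if (base == "") base = rate / n    # per-process rate of the first line
      speedup = rate / base
      printf "%d %d %.2f %.1f%%\n", n, rate, speedup, 100 * speedup / n
    }
  '
}
```

Run it as scaling < results.txt (file name hypothetical). Applied to the table above, it gives an efficiency of about 91.5% at 16 processes, 54.5% at 32 and 27.7% at 64.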

Notes:

  • MPICH2 was configured with the following options:

./configure --enable-fast --enable-timer-type=gethrtime --enable-cache --enable-cxx --disable-f77 --disable-f90 --enable-threads --with-device=ch3:sock --prefix=/usr/local/mpich2

  • Results above are for 32-bit builds of the libraries and executables; recompiling as 64-bit gave no performance benefit.
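The configure line only prepares the build. A sketch of the remaining steps, assuming the standard make targets and the mpd process manager that MPICH2 used by default before 1.3: build and install, put /usr/local/mpich2/bin on PATH, and give mpd a secret word in ~/.mpd.conf, a file mpd expects to be readable only by its owner. setup_mpd is a hypothetical helper, not part of MPICH2.

```shell
# Hypothetical helper: create ~/.mpd.conf with the given secret word and
# restrict it to the owner, as mpd expects.
setup_mpd() {
  conf="$HOME/.mpd.conf"
  printf 'secretword=%s\n' "$1" > "$conf"
  chmod 600 "$conf"
}

# Typical sequence after the configure step (run from the MPICH2 source tree):
#   make && make install
#   PATH=/usr/local/mpich2/bin:$PATH; export PATH
#   setup_mpd 'change-me'
#   mpd &        # start a single-host mpd ring
#   mpdtrace     # should list this host
```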

Room for improvement:

  • Analyse MPI performance; read the paper ‘Review of Performance Analysis Tools for MPI Parallel Programs’ by Jack Dongarra et al.
  • Use the Oracle Message Passing Toolkit (formerly Sun HPC ClusterTools) instead of MPICH2, as it is an implementation optimised for Solaris and includes DTrace hooks.
  • Use the new Hydra process manager, the default from MPICH2 1.3 onwards, instead of mpd.
  • Experiment with mpiexec CPU and cache binding options to reduce contention for hardware resources.
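One way to try the binding idea without relying on mpiexec's own options is Solaris pbind, which binds a running process to a virtual CPU. The sketch below swaps in pbind for the mpiexec binding options mentioned above. spread_cpus is a hypothetical helper that picks CPU ids so that processes take one hardware thread on every core before any core gets a second one. The example assumes 8 cores with 8 threads each and contiguous per-core CPU numbering, which should be checked with psrinfo -pv on the actual machine. Binding after launch is racy for a short --test run, so a longer-running job suits it better.

```shell
# Hypothetical helper: print NPROCS virtual CPU ids, one per line, so that
# process i lands on core (i mod CORES). Assumes core c owns the contiguous
# ids c*TPC .. c*TPC+TPC-1; verify with psrinfo -pv before relying on it.
spread_cpus() {
  nprocs=$1 cores=$2 tpc=$3          # processes, cores, threads per core
  i=0
  while [ "$i" -lt "$nprocs" ]; do
    echo $(( (i % cores) * tpc + i / cores ))
    i=$((i + 1))
  done
}

# Example (assumed 8 cores x 8 threads, 16 processes named "john"):
#   set -- $(spread_cpus 16 8 8)
#   for pid in $(pgrep -x john); do pbind -b "$1" "$pid"; shift; done
```

Comparing a bound run against the unbound numbers above at the same process count would show whether placement accounts for part of the plateau.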