
MPI and Solaris on Coolthreads

Coolthreads. Interesting boxes. What follows is a log of my experiments with MPI (Message Passing Interface) as a way to exploit the large number of hardware threads present on Sun's T-series machines.

With the workload below coordinated by MPICH2, we see near-linear performance increases from 1 to 16 threads (16 processes deliver about 14.6 times the single-process rate), then a much shallower increase as we approach 32 threads (about 17.4 times), at which point there is no further improvement: 64 threads manage only about 17.7 times.

Could this performance ceiling be caused by memory bandwidth limits, or perhaps by cache contention?

For each process count x from 1 to 64: mpirun -np $x ./john --test --format=MD5
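The sweep can be scripted roughly as follows. This is a minimal sketch, not the script used for the results: the function name sweep is hypothetical, and it assumes (as the table suggests) that each john run prints its result on a line beginning with "Raw:", which the script tags with the process count.

```shell
#!/bin/sh
# Hypothetical helper: run the john MD5 benchmark under mpirun for
# 1..MAX processes (default 64) and tag each "Raw:" result line with
# the process count. Assumes john prints its result on a line that
# begins with "Raw:".
# Uses backticks and expr rather than $(( )) so it should also run
# under older Bourne-style shells such as Solaris 10's /bin/sh.
sweep() {
    max=${1:-64}
    x=1
    while [ "$x" -le "$max" ]; do
        # sed -n with the p flag prints only the rewritten "Raw:" line(s)
        mpirun -np "$x" ./john --test --format=MD5 |
            sed -n "s/^Raw:/CPUs: $x  Raw:/p"
        x=`expr "$x" + 1`
    done
}
```

For example, sweep 64 | tee results.txt reproduces roughly the layout of the table below while keeping a copy for later analysis.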

CPUs:   1       Raw:    1149 c/s real, 1149 c/s virtual
CPUs:   2       Raw:    2298 c/s real, 2298 c/s virtual
CPUs:   3       Raw:    3447 c/s real, 3447 c/s virtual
CPUs:   4       Raw:    4596 c/s real, 4596 c/s virtual
CPUs:   5       Raw:    5739 c/s real, 5733 c/s virtual
CPUs:   6       Raw:    6882 c/s real, 6894 c/s virtual
CPUs:   7       Raw:    8043 c/s real, 8043 c/s virtual
CPUs:   8       Raw:    9192 c/s real, 9186 c/s virtual
CPUs:   9       Raw:    10303 c/s real, 10297 c/s virtual
CPUs:   10      Raw:    11438 c/s real, 11402 c/s virtual
CPUs:   11      Raw:    11941 c/s real, 11921 c/s virtual
CPUs:   12      Raw:    13606 c/s real, 13598 c/s virtual
CPUs:   13      Raw:    14741 c/s real, 14784 c/s virtual
CPUs:   14      Raw:    15822 c/s real, 15840 c/s virtual
CPUs:   15      Raw:    16298 c/s real, 16298 c/s virtual
CPUs:   16      Raw:    16815 c/s real, 16817 c/s virtual
CPUs:   17      Raw:    17094 c/s real, 17105 c/s virtual
CPUs:   18      Raw:    16467 c/s real, 16514 c/s virtual
CPUs:   19      Raw:    18438 c/s real, 18438 c/s virtual
CPUs:   20      Raw:    18449 c/s real, 18464 c/s virtual
CPUs:   21      Raw:    18516 c/s real, 18507 c/s virtual
CPUs:   22      Raw:    18638 c/s real, 18619 c/s virtual
CPUs:   23      Raw:    18433 c/s real, 18447 c/s virtual
CPUs:   24      Raw:    18998 c/s real, 19016 c/s virtual
CPUs:   25      Raw:    18857 c/s real, 18859 c/s virtual
CPUs:   26      Raw:    19126 c/s real, 19126 c/s virtual
CPUs:   27      Raw:    19377 c/s real, 19327 c/s virtual
CPUs:   28      Raw:    19603 c/s real, 19566 c/s virtual
CPUs:   29      Raw:    19745 c/s real, 19730 c/s virtual
CPUs:   30      Raw:    19876 c/s real, 19783 c/s virtual
CPUs:   31      Raw:    20100 c/s real, 20023 c/s virtual
CPUs:   32      Raw:    20049 c/s real, 20048 c/s virtual
CPUs:   33      Raw:    20123 c/s real, 20154 c/s virtual
CPUs:   34      Raw:    19971 c/s real, 19995 c/s virtual
CPUs:   35      Raw:    20120 c/s real, 20148 c/s virtual
CPUs:   36      Raw:    20010 c/s real, 19988 c/s virtual
CPUs:   37      Raw:    20038 c/s real, 20049 c/s virtual
CPUs:   38      Raw:    19955 c/s real, 19955 c/s virtual
CPUs:   39      Raw:    19871 c/s real, 19795 c/s virtual
CPUs:   40      Raw:    19849 c/s real, 19852 c/s virtual
CPUs:   41      Raw:    19788 c/s real, 19809 c/s virtual
CPUs:   42      Raw:    19841 c/s real, 19828 c/s virtual
CPUs:   43      Raw:    20012 c/s real, 19974 c/s virtual
CPUs:   44      Raw:    19947 c/s real, 19952 c/s virtual
CPUs:   45      Raw:    20055 c/s real, 20019 c/s virtual
CPUs:   46      Raw:    19994 c/s real, 19973 c/s virtual
CPUs:   47      Raw:    20094 c/s real, 20091 c/s virtual
CPUs:   48      Raw:    20190 c/s real, 20148 c/s virtual
CPUs:   49      Raw:    20340 c/s real, 20360 c/s virtual
CPUs:   50      Raw:    20271 c/s real, 20224 c/s virtual
CPUs:   51      Raw:    20456 c/s real, 20455 c/s virtual
CPUs:   52      Raw:    20596 c/s real, 20592 c/s virtual
CPUs:   53      Raw:    20411 c/s real, 20385 c/s virtual
CPUs:   54      Raw:    20503 c/s real, 20491 c/s virtual
CPUs:   55      Raw:    20372 c/s real, 20355 c/s virtual
CPUs:   56      Raw:    20274 c/s real, 20299 c/s virtual
CPUs:   57      Raw:    20347 c/s real, 20278 c/s virtual
CPUs:   58      Raw:    20343 c/s real, 20297 c/s virtual
CPUs:   59      Raw:    20299 c/s real, 20324 c/s virtual
CPUs:   60      Raw:    20360 c/s real, 20313 c/s virtual
CPUs:   61      Raw:    20395 c/s real, 20313 c/s virtual
CPUs:   62      Raw:    20423 c/s real, 20353 c/s virtual
CPUs:   63      Raw:    20364 c/s real, 20393 c/s virtual
CPUs:   64      Raw:    20352 c/s real, 20431 c/s virtual
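To put numbers on "near-linear", speedup (rate at N processes divided by the single-process rate) and parallel efficiency (speedup divided by N) can be computed straight from these lines. A minimal sketch using awk, assuming the table has been saved to a text file (results.txt and the function name scaling are hypothetical):

```shell
#!/bin/sh
# Compute speedup and parallel efficiency from lines of the form
#   CPUs:   N       Raw:    R c/s real, V c/s virtual
# Prints: N, real rate R, speedup (R / rate at N=1), efficiency (speedup / N).
# Reads the named files, or standard input if none are given.
scaling() {
    awk '
        $1 == "CPUs:" && $3 == "Raw:" {
            n = $2; rate = $4
            if (n == 1) base = rate      # single-process baseline
            if (base == 0) next          # skip rows until the baseline is seen
            speedup = rate / base
            printf "%d %d %.2f %.2f\n", n, rate, speedup, speedup / n
        }
    ' "$@"
}
```

Running scaling results.txt on the figures above shows the efficiency column falling from 1.00 at one process to about 0.91 at 16, 0.55 at 32 and 0.28 at 64.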

Notes:

  • MPICH2 was configured with the following options:

./configure --enable-fast --enable-timer-type=gethrtime --enable-cache --enable-cxx --disable-f77 --disable-f90 --enable-threads --with-device=ch3:sock --prefix=/usr/local/mpich2

  • The results above are for 32-bit builds of the libraries and executables; recompiling as 64-bit gave no performance benefit.

Room for improvement:

  • Analyse MPI performance; read the paper ‘Review of Performance Analysis Tools for MPI Parallel Programs’, co-authored by Jack Dongarra.
  • Use the Oracle Message Passing Toolkit (formerly Sun HPC ClusterTools) instead of MPICH2, as it's an implementation optimised for Solaris and includes DTrace hooks.
  • Use the new Hydra process manager, which is the default from the MPICH2 1.3 releases onwards, instead of mpd.
  • Experiment with mpiexec CPU and cache binding options to reduce contention for hardware resources.
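The switch to Hydra could be tried with a rebuild along these lines. This is a sketch, assuming a source build with the same options as the configure line under Notes; --with-pm is MPICH2's configure option for choosing the process manager, it is only needed on releases where Hydra is not already the default, and the process count of 16 is just an example.

```shell
# Rebuild MPICH2 with the same options as before, but explicitly select
# the Hydra process manager (already the default from MPICH2 1.3 onwards).
./configure --enable-fast --enable-timer-type=gethrtime --enable-cache \
    --enable-cxx --disable-f77 --disable-f90 --enable-threads \
    --with-device=ch3:sock --with-pm=hydra --prefix=/usr/local/mpich2
make && make install

# Hydra's mpiexec launches local processes itself, so no mpd daemon
# has to be started before running the benchmark.
/usr/local/mpich2/bin/mpiexec -n 16 ./john --test --format=MD5
```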