Benchmarking: HPL (optimised) with Intel MKL and Intel MPI

From Define Wiki
Revision as of 15:04, 6 January 2014 by David (talk | contribs) (Created page with "== Getting the software == * Intel provide prebuilt binaries for use with Intel MPI and MKL. * The latest version can be downloaded from: == Parameter optimisation == Main ...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Getting the software

  • Intel provide prebuilt binaries for use with Intel MPI and MKL.
  • The latest version can be downloaded from:

Parameter optimisation

Main parameters you need to consider while running HPL:

  • Problem size (N): Your problem size should be the largest to fit in the memory to get best performance. For e.g.: If you have 10 nodes with 1 GB RAM, total memory is 10GB. i.e. nearly 1342 M double precision elements. Square root of that number is 36635. You need to leave some memory for Operating System and other things. As a rule of thumb, 80% of the total memory will be a starting point for problem size (So, in this case, say, 33000). If the problem size is too large, it is swapped out, and the performance will degrade.
  • Block Size (NB): HPL uses the block size NB for the data distribution as well as for the computational granularity. A very small NB will limit computational performance because no data reuse will occur, and also the number of messages will also increase. "Good" block sizes are almost always in the [32 .. 256] interval and it depends on Cache size. These block size are found to be good, 80-216 for IA32; 128-192 for IA64 3M cache; 400 for 4M cache for IA64 and 130 for Woodcrests.
  • Process Grid Ratio (PXQ): This depends on physical interconnection network. P and Q should be approximately equal, with Q slightly larger than P. For e.g. for a 480 processor cluster, 20X24 will be a good ratio.

Tips: You can also try changing the node-order in the machine file for check the performance improvement. Choose all the above parameters by trial and error to get the best performance.

You can also use a simple PHP web tool to enter you system specs and it will suggest for you optimal input parameters for your HPL file before running the benchmark on the cluster. The tool can be accessed via the URL below under sourceforge:

http://hpl‐calculator.sourceforge.net