Benchmarking: HPL (High Performance Linpack)


HPL on Rocks 5.3

Build with OFED 1.5.1 on CentOS 5.4 with GotoBLAS2-1.13

  • Copy GotoBLAS2-1.13 and lapack-3.1.1.tar.gz from PDD:HPC Benchmarks/Applications/gotoblas
  • Copy hpl.tar.gz from PDD:HPC Benchmarks/Applications/hpl

Build GotoBLAS2

Note: make sure you have internet access, as quickbuild.64bit downloads lapack-3.1.1.tgz from http://www.netlib.org/lapack/lapack-3.1.1.tgz and compiles it as part of the standard build.

If no internet access is available, make sure lapack-3.1.1.tgz is already present in the directory where you compile GotoBLAS2.
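Either way, a quick pre-flight check can save a failed build. A minimal sketch, run from the directory where GotoBLAS2 will be compiled (on an offline host, replace the wget with a copy from the PDD share):

# make sure lapack-3.1.1.tgz is in place before running quickbuild.64bit
[ -f lapack-3.1.1.tgz ] || wget http://www.netlib.org/lapack/lapack-3.1.1.tgz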

Build GotoBLAS2 (this should result in a libgoto2.a that you can link HPL against)

tar zxvf GotoBLAS-2.x.x.tar.gz
cd GotoBLAS2
./quickbuild.64bit
or
./quickbuild.64bit NUM_THREADS=1

I had to specify the target architecture for GotoBLAS to build without error (Note: the default Fortran compiler GotoBLAS looks for is g77). See 02QuickInstall.txt for more details.

make CC=gcc FC=gfortran USE_THREAD=1 TARGET=NEHALEM NUM_THREADS=8 libs
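A quick sanity check that the build produced a usable library; this is just a sketch that confirms the static archive exists and exports the usual BLAS entry points such as dgemm:

# the archive should appear in the GotoBLAS2 build directory
ls -lh libgoto2.a
# spot-check for the double-precision matrix-multiply symbol
nm -g libgoto2.a | grep -i dgemm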

Build HPL with OpenMPI and GotoBLAS

tar zxvf hpl.tgz
cd hpl
cp setup/Make.Linux_PII_CBLAS Make.Linux

     Make.Linux:
     ARCH         = Linux
     ..
     TOPdir       = $(HOME)/scratch/hpl
     ..
     MPdir        = /opt/ofed/1.5.1/mpi/gcc/openmpi-1.4.1
     MPinc        = -I$(MPdir)/include
     MPlib        = $(MPdir)/lib64/libmpi.so
     ..
     LAdir        = /home/viglen/scratch/GotoBLAS2
     LAinc        =
     LAlib        = $(LAdir)/libgoto2.a -lpthread -lm
     ..
     HPL_OPTS     =
     ..
     LINKER       = mpif90

make arch=Linux
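Once the build finishes, HPL drops the binary into bin/<ARCH>/, i.e. bin/Linux/ here. A minimal single-node smoke test might look like the following; the mpirun path is taken from MPdir above, the rank count is an assumption, and P x Q in HPL.dat must match the number of ranks started:

cd bin/Linux
# 8 ranks as an example; if GotoBLAS was built threaded, avoid oversubscribing
# cores by running fewer ranks or limiting BLAS threads (e.g. GOTO_NUM_THREADS=1)
/opt/ofed/1.5.1/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 8 ./xhpl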

Build using MKL

  • As above, but change the following in the Make.Linux file (the build step is shown after this block)
ARCH         = linux_64_mkl
..
MPdir        = /opt/mpich/intel/
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
..
LAdir        = /opt/intel/ict/3.0/cmkl/9.0/lib/em64t
LAinc        =
LAlib        = -L$(LAdir) -lguide -lmkl_blacs -lmkl_em64t -lmkl_scalapack -lpthread
### Update for the MKL bundled with Intel Compiler 11.1
LAlib        = $(LAdir)/libmkl_intel_ilp64.a $(LAdir)/libmkl_intel_thread.a $(LAdir)/libmkl_core.a /opt/intel/Compiler/11.1/073/lib/intel64/libiomp5.a -lpthread -lm

..
HPL_OPTS     = -DHPL_CALL_CBLAS
..
CC           = icc
LINKER       = icc  # could also be ifort
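The build command stays tied to the makefile's name rather than the ARCH variable inside it, so with the file still called Make.Linux the invocation is unchanged. A sketch, assuming the Intel environment script under the Compiler 11.1 install referenced above:

# load the Intel compiler and MKL paths (adjust to your install)
source /opt/intel/Compiler/11.1/073/bin/iccvars.sh intel64
# make arch= must match the Make.<arch> file name; rename the file to
# Make.linux_64_mkl if you want the arch string to match ARCH above
make arch=Linux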

Build using Platform MPI and MKL

MPdir        = /opt/platform_mpi/
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/linux_amd64/libmpi.so

LAdir        = /opt/intel/Compiler/11.1/073/mkl/lib/em64t/
LAinc        = 
LAlib        = $(LAdir)/libmkl_core.so $(LAdir)/libmkl_sequential.so $(LAdir)/libmkl_intel_lp64.so -lm

HPL_OPTS     =

CC           = mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -O3 -xP -ip -fno-alias
# -xP deprecated in newer releases

LINKER       = mpicc
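A hedged launch sketch using Platform MPI's own mpirun; the rank count is a placeholder and host selection/appfile options are left out:

# run from the directory containing the xhpl binary for this build
/opt/platform_mpi/bin/mpirun -np 16 ./xhpl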

Build using Atlas

  • As above but change the following in the Make.Linux file
LAdir        = /usr/local/atlas/lib
LAinc        =
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a

Interlagos build using Open64, OpenMPI and ACML

  • OpenMPI built using the Open64 compilers
# open mpi
MPdir        = /home/dpower/sw/mpi/open64/mc-ompi/default
MPinc        = -I$(MPdir)/include
MPlib        = -L$(MPdir)/lib -lmpi
# acml 
LAdir        = /opt/amd/acml/latest/open64_64_fma4
LAinc        = -I$(LAdir)/include
LAlib        = -L$(LAdir)/lib -lacml
# Last few bits
HPL_OPTS     =
CC           = mpicc
CCFLAGS      = $(HPL_DEFS) -O3 -march=bdver1 -DHPCC_FFT_235 -DHPCC_MEMALLCTR -DRA_SANDIA_OPT2 -DLONG_IS_64BITS
LINKER       = mpif90
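If ACML ends up linked dynamically (-lacml will pick up libacml.so when present), the runtime linker also needs to find it, along with the Open64-built OpenMPI libraries. A sketch using the paths above:

# make the ACML and OpenMPI shared libraries visible at run time
export LD_LIBRARY_PATH=/opt/amd/acml/latest/open64_64_fma4/lib:/home/dpower/sw/mpi/open64/mc-ompi/default/lib:$LD_LIBRARY_PATH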

Calculate N for HPL.dat

  • In this example, we target 80% memory usage (the 0.8 factor); NUM_NODES is the number of nodes and MEM_PER_NODE the memory per node in GB, as in the script further down
N=`echo "sqrt ( ${NUM_NODES} * ${MEM_PER_NODE} * 0.8 * 100000000) / 1" | bc`
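A worked example with the same figures the script below uses in its usage message (32 nodes with 8 GB each); the trailing / 1 just truncates bc's output to a whole number. For reference, with 1 GB taken as 10^9 bytes and 8 bytes per double-precision element, an exact 80% fill would need a constant of 1.25 × 10^8 alongside the 0.8 factor, so the 10^8 used here is slightly conservative and leaves some headroom for the OS and MPI buffers.

# 32 nodes x 8 GB at the 0.8 factor
echo "sqrt ( 32 * 8 * 0.8 * 100000000) / 1" | bc
# prints 143108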


Useful Online Tools

http://hpl-calculator.sourceforge.net/hpl-calculations.php

Script to calculate N

In the HPL.dat file, N is the order of the problem matrix. The matrix holds N x N double-precision elements at 8 bytes each and should fill roughly 80% of the total memory (across all nodes used for the run), i.e. N ≈ sqrt(0.8 × total_memory_in_bytes / 8). Here's a script to calculate this:

#!/bin/bash
 
if [ $# -ne 2 ]
then
        echo "";
        echo "Usage: $0 [Number of nodes] [Memory per node (Gb)]" >&2;
        echo "Example: $0 32 8";
        exit 1
fi
 
NUM_NODES=$1;
MEM_PER_NODE=$2;
 
echo -e "---------------";
echo -e "[\E[32mNodes\E[39m]: ${NUM_NODES} ";
echo -e "[\E[32mMemory\E[39m]: ${MEM_PER_NODE}Gb";
 
N=`echo "sqrt ( ${NUM_NODES} * ${MEM_PER_NODE} * 0.8 * 100000000) / 1" | bc`
 
echo -e "---------------";
echo -e "[\E[32mN\E[39m]: ${N}";
echo -e "---------------";