Seattle: Building and running HPL
Prerequisites
The default O/S image provided by AMD is minimal and includes very few packages. The following should be installed:
yum groupinstall 'Development Tools'
yum install tar bzip2 bc gcc-c++
Install OpenMPI
Either install openmpi from the repos, build it manually, or choose an alternative MPI.
yum install openmpi openmpi-devel
Set up user environment
# add to the end of your ~/.bashrc
export PATH=$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib
Download & build ATLAS
Download ATLAS from: http://sourceforge.net/projects/math-atlas/files/Stable/3.10.1/atlas3.10.1.tar.bz2/download
Extract the contents of the tarball:
cd /root/scratch
wget http://downloads.sourceforge.net/project/math-atlas/Stable/3.10.1/atlas3.10.1.tar.bz2
tar jxvf atlas3.10.1.tar.bz2
Build ATLAS. N.B. This will take a long time: on our sample Seattle system it took more than 5 hours. It is worth running the build in a screen session or with nohup.
cd /root/scratch
mkdir atlas3.10.1_build
cd atlas3.10.1_build/
/root/scratch/ATLAS/configure --prefix=/opt/atlas/3.10.1
make
make check
Download & Build Linpack
Latest versions are available from: http://www.netlib.org/benchmark/hpl/
Download and decompress HPL:
cd /root/scratch
wget http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz
tar zxvf hpl-2.1.tar.gz
Create the Makefile:
cd hpl-2.1
cp setup/Make.Linux_PII_CBLAS Make.Linux_aarch64
Edit the following items (approximate line numbers shown):
---
64 ARCH = Linux_aarch64
---
70 TOPdir = $(HOME)/scratch/hpl-2.1
---
84 MPdir = /usr/lib64/openmpi
85 MPinc = -I/usr/include/openmpi-aarch64
86 MPlib = $(MPdir)/lib/libmpi.so
---
95 LAdir = /root/scratch/atlas3.10.1_build/lib
---
176 LINKER = /usr/lib64/openmpi/bin/mpif90
Build:
cd /root/scratch/hpl-2.1
make arch=Linux_aarch64
Test run using the default HPL.dat file (which is for 4 cores) to ensure the build works. The performance won't be great, but HPL should run and complete within a few seconds:
cd bin/Linux_aarch64
mpirun -np 4 ./xhpl
Edit the HPL.dat to optimise
- Problem size (N): For best performance, the problem size should be the largest that fits in memory. Our sample system had a total of 16GB. There are 125M double-precision elements per 1GB of memory, so 16GB of RAM holds 2 billion double-precision elements. The square root of that number is 44721. Some memory has to be left for the operating system and other processes. As a rule of thumb, about 80% of that value is a good starting point, which in this case is 35777. N / (P * Q) needs to be an integer, so 35328 (which is also a multiple of the block size, 128) is a reasonable number. N.B. If the problem size is too large, the system will start swapping and performance will degrade.
- Block size (NB): HPL uses the block size NB for the data distribution as well as for the computational granularity. A very small NB limits computational performance because almost no data reuse occurs, and it also increases the number of messages. "Good" block sizes are almost always in the [32 .. 256] interval, and the best value depends on cache size. The following block sizes have been found to work well: 80-216 for IA32; 128-192 for IA64 with 3M cache; 400 for IA64 with 4M cache; and 130 for Woodcrest.
- Process grid ratio (P x Q): This depends on the physical interconnection network. P and Q should be approximately equal, with Q slightly larger than P. P * Q should equal the number of available cores. Our sample system had a 6-core CPU, so P=2 and Q=3.
These numbers can also be generated automatically by a number of online tools, for example: http://www.advancedclustering.com/act-kb/tune-hpl-dat-file/
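The sizing arithmetic from the list above can also be sketched in shell. This is only a rough sketch under a few assumptions, not part of HPL: the helper names hpl_problem_size and hpl_grid are hypothetical, memory is given in decimal GB (matching the 125M-elements-per-GB figure), the 80% factor is applied to N as in the worked numbers above, and N is rounded down to a multiple of NB * P * Q so that it divides evenly by both the block size and the process grid.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of HPL.

# hpl_problem_size MEM_GB NB P Q [FRACTION]
# Prints a candidate N: FRACTION (default 0.8) of sqrt(MEM_GB * 1e9 / 8),
# rounded down to a multiple of NB * P * Q.
hpl_problem_size() {
    awk -v mem="$1" -v nb="$2" -v p="$3" -v q="$4" -v frac="${5:-0.8}" 'BEGIN {
        elements = mem * 1e9 / 8      # 8 bytes per double-precision element
        n = frac * sqrt(elements)     # scale down the largest square matrix
        step = nb * p * q             # assumption: keep N a multiple of this
        printf "%d\n", int(n / step) * step
    }'
}

# hpl_grid CORES
# Prints "P Q" with P * Q = CORES, P <= Q, and P as close to Q as possible.
hpl_grid() {
    awk -v c="$1" 'BEGIN {
        p = 1
        for (i = 1; i * i <= c; i++)
            if (c % i == 0) p = i     # largest divisor not above sqrt(CORES)
        printf "%d %d\n", p, c / p
    }'
}

hpl_problem_size 16 128 2 3   # sample system: prints 35328
hpl_grid 6                    # sample system: prints 2 3
```

For the sample system this reproduces the values used in the HPL.dat file: N=35328 with P=2 and Q=3. Treat the result as a starting point and adjust N downwards if the system starts to swap.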
Example of HPL.dat file from initial testing:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
35328 Ns
1 # of NBs
128 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
3 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
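Since P * Q should equal the number of processes, the -np 4 used for the earlier test run no longer matches once HPL.dat has been edited to a 2 x 3 grid. One way to keep the two in step is to read the grid back from the file. The following is a minimal sketch: hpl_np is a hypothetical helper, not part of HPL, and it assumes the file lists exactly one process grid, with the value first and the label (Ps or Qs) as the second field, as in the example above.

```shell
#!/bin/sh
# Sketch only: hpl_np is a hypothetical helper, not part of HPL.
# Assumes exactly one process grid, with lines of the form "<value> Ps"
# and "<value> Qs" as in the example HPL.dat.

# hpl_np FILE
# Prints P * Q read from FILE; fails if either value is missing.
hpl_np() {
    awk '$2 == "Ps" { p = $1 }
         $2 == "Qs" { q = $1 }
         END { if (p > 0 && q > 0) print p * q; else exit 1 }' "$1"
}

# Usage (run from the directory that holds xhpl and HPL.dat):
#   mpirun -np "$(hpl_np HPL.dat)" ./xhpl
```

With the example file above, hpl_np HPL.dat prints 6, so the run uses all six cores of the sample system.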