Benchmarking: HPL with MPICH2 and Vanilla Raspbian (Raspberry Pi)
Installing dependencies
HPL has a few software dependencies that have to be satisfied before it can be installed. They are:
- gfortran - fortran program compiler
- MPICH2 - an implementation of MPI
- mpich2-dev - development tools
- BLAS - Basic Linear Algebra Subprograms
To install other dependencies and packages, use the following command:
sudo apt-get install mpich2 libatlas-base-dev libmpich2-dev gfortranDownload HPL and set it up
Download the HPL package from [[1]].
wget http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gzThe next thing to do is extract the tar file and create a makefile based on the given template. Open the terminal and change the directory to where the downloaded HPL tar file is stored. Execute the following set of commands one after another.
tar xf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cd ..
cp setup/Make.UNKNOWN Make.piThe last command copies the contents of Make.UNKNOWN to Make.rpi . We do this is because, the make file contains all the configuration details of the system ( The raspberry pi) and also the details various libraries such as mpich2, atlas/blas packages, home directory, etc. In the next step, we make changes to the Make.rpi file.
Adjust the Make.pi file
This is an important step. Changes shown below vary according to your system. Here I show it with respect to my system. Please note that the following changes have parameters shown which are spread throughout the Make.rpi file. So I suggest you to find each parameter and replace or add the changes and only then continue to the next parameter.
ARCH = rpi
TOPdir = $(HOME)/hpl-2.1
MPdir = /usr/local/mpich2
MPinc = -I $(MPdir)/include
MPlib = $(MPdir)/lib/libmpich.a
LAdir = /usr/lib/atlas-base/
LAlib = $(LAdir)/libf77blas.a $(LAdir)/libatlas.aCompiling the HPL
Once the Make file is ready, we can start with the compilation of the HPL. The ".xhpl" file will be present in the "bin/rpi" folder within the HPL folder. Run the following command:
make arch=rpiCreating the HPL input file
The following is an example of the "HPL.dat" file. This is the input file for HPL when it is run. The values provided in this file is used to generate and compute the problem. You can use this file directly to run tests for a single node. Create a file within the "bin/rpi" folder and name it "HPL.dat". copy the contents below into that file.
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
5040 Ns
1 # of NBs
128 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
2 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)Running HPL on single node
Once the HPL.dat file is ready, we can run the HPL. The HPL.dat file above is for a single node or processor.
cd bin/rpi
mpirun.mpich2 -np 4 ./xhpl
The output looks something similar to what is shown below:
================================================================================
HPLinpack 2.1 -- High-Performance Linpack benchmark -- October 26, 2012
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 5040
NB : 128
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0