Benchmarking: HPL on a GPU using CUDA
The source and the build instructions (PDF) are located on PDD: HPC Benchmarking/Applications/hpl-cuda
PDD Link: <file>\\srv-vfs2\PDD_DATA\Product Development\High Performance Computing\HPC Benchmarking\Applications\hpl-cuda</file>
Build Source
- Built using:
- Platform MPI (/opt/platform_mpi)
- Intel MKL (/shared/intel/composer-2011, 12.0 compilers)
- CUDA 4.0 (/usr/local/cuda)
- Extract the tar.gz archive, cd into the directory and edit the Make.CUDA file
# TOPDir around line 103
ifndef TOPdir
TOPdir = /home/david/benchmarking/hpl-2.0_FERMI_v13
endif
# MPI section (Platform MPI is used here rather than Open MPI)
MPdir = /opt/platform_mpi/
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/linux_amd64/libmpi.so
# MKL LAdir/inc/lib
LAdir = /shared/intel/composerxe-2011/mkl/lib/intel64/
LAinc =
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5
# next two lines for Intel Compilers:
CC = mpicc
CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp
# the rest of the file should be fine as unpacked
Build the binaries
- Build using make
make
# which will end up producing
[david@vhpchead hpl-2.0_FERMI_v13]$ find bin/
bin/
bin/CUDA
bin/CUDA/xhpl
bin/CUDA/HPL.dat
bin/CUDA/HPL.dat_example
bin/CUDA/run_linpack
bin/CUDA/output_example
bin/CUDA/._HPL.dat
bin/CUDA/._run_linpack
Edit run_linpack script
- In bin/CUDA/run_linpack, check that the following are set:
#!/bin/bash
#location of HPL
HPL_DIR=/home/david/benchmarking/hpl-2.0_FERMI_v13
# Number of CPU cores ( per GPU used = per MPI process )
CPU_CORES_PER_GPU=4
# FOR MKL
export MKL_NUM_THREADS=$CPU_CORES_PER_GPU
# FOR GOTO
export GOTO_NUM_THREADS=$CPU_CORES_PER_GPU
# FOR OMP
export OMP_NUM_THREADS=$CPU_CORES_PER_GPU
export MKL_DYNAMIC=FALSE
# hint: for 2050 or 2070 card
# try 350/(350 + MKL_NUM_THREADS*4*cpu frequency in GHz)
export CUDA_DGEMM_SPLIT=0.80
# hint: try CUDA_DGEMM_SPLIT - 0.10
export CUDA_DTRSM_SPLIT=0.70
export LD_LIBRARY_PATH=$HPL_DIR/src/cuda:$LD_LIBRARY_PATH
$HPL_DIR/bin/CUDA/xhpl
Run on a Single GPU
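Before a run, the split hint quoted in run_linpack can be evaluated for the target system. The following is a minimal sketch of that arithmetic only: the helper names dgemm_split_hint and dtrsm_split_hint are hypothetical (they are not part of the HPL package), and the 2.4 GHz clock is an assumed nominal value for the E5620. The CPU term (threads x 4 x GHz) reads like a rough estimate of CPU Gflops set against about 350 Gflops for the GPU, but the script does not say so explicitly.

```shell
#!/bin/bash
# Sketch of the hint in run_linpack:
#   CUDA_DGEMM_SPLIT ~ 350 / (350 + MKL_NUM_THREADS * 4 * cpu_GHz)
# Helper names are hypothetical; they do not ship with hpl-2.0_FERMI.

dgemm_split_hint() {
    # $1 = MKL threads per GPU, $2 = CPU clock in GHz
    LC_ALL=C awk -v threads="$1" -v ghz="$2" \
        'BEGIN { printf "%.2f\n", 350 / (350 + threads * 4 * ghz) }'
}

dtrsm_split_hint() {
    # $1 = DGEMM split; the script suggests trying DGEMM split - 0.10
    LC_ALL=C awk -v dgemm="$1" \
        'BEGIN { printf "%.2f\n", dgemm - 0.10 }'
}

# Assumption: 2.4 GHz nominal clock, 4 threads per GPU as in run_linpack
dgemm=$(dgemm_split_hint 4 2.4)
echo "CUDA_DGEMM_SPLIT=$dgemm"
echo "CUDA_DTRSM_SPLIT=$(dtrsm_split_hint "$dgemm")"
```

With 4 threads at 2.4 GHz the hint comes out at 0.90, and with 8 threads at 0.82. These are starting points for tuning rather than final values; the script ships with 0.80 and 0.70.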
Results
- From an E5620 system with 2x M2075
# CPU_CORES_PER_GPU=8
# CUDA_DGEMM_SPLIT=0.80
# CUDA_DTRSM_SPLIT=0.70
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2      108032  1024     1     2            1170.08              7.184e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0041656 ...... PASSED
================================================================================
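The reported Gflops figure can be cross-checked from N and the run time. The sketch below assumes the usual HPL operation count of 2/3 N^3 + 3/2 N^2 floating-point operations; hpl_gflops is a hypothetical helper name, not something provided by the HPL package.

```shell
#!/bin/bash
# Sketch: recompute HPL Gflops from problem size and wall time.
# Assumes the standard HPL operation count: 2/3*N^3 + 3/2*N^2.
# hpl_gflops is a hypothetical helper, not part of HPL.

hpl_gflops() {
    # $1 = N (problem size), $2 = time in seconds
    LC_ALL=C awk -v n="$1" -v t="$2" \
        'BEGIN { printf "%.1f\n", (2.0 / 3.0 * n * n * n + 1.5 * n * n) / t / 1e9 }'
}

# Values taken from the result line above
hpl_gflops 108032 1170.08   # prints 718.4
```

The recomputed value agrees with the 7.184e+02 Gflops in the output, which is a quick way to confirm that N and Time were copied correctly when recording results.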