Difference between revisions of "Benchmarking: Stream (Memory Bandwidth)"

Revision as of 21:55, 21 July 2013

STREAM: The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels.

Note: Ensure power-saving features are disabled so the CPUs run at maximum clock speed; otherwise frequency scaling causes fluctuations in performance:
- /etc/init.d/cpuspeed stop
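On systems without the cpuspeed service, you can check the CPU frequency governor directly. The sketch below assumes the Linux cpufreq sysfs interface (/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor) is available. The function name check_governors is hypothetical. It prints every CPU whose governor is not "performance" and returns non-zero if it finds any.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report CPUs whose cpufreq governor is not "performance".
# Assumes the Linux cpufreq sysfs layout; the root directory can be overridden
# with the first argument (useful for testing against a copy).
# Returns 0 if all governors are "performance", 1 if any are not,
# 2 if no governor files were found at all.
check_governors() {
  local root="${1:-/sys/devices/system/cpu}"
  local f gov status=0 found=0
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue   # glob did not match, or no cpufreq support
    found=1
    gov=$(cat "$f")
    if [ "$gov" != "performance" ]; then
      echo "$f: $gov"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no cpufreq governors found under $root" >&2
    return 2
  fi
  return "$status"
}
```

For example, run `check_governors || echo "frequency scaling still active"` before a benchmark run. Changing the governor itself usually needs root, for example with `cpupower frequency-set -g performance`.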

Get the source

Main STREAM website: http://www.cs.virginia.edu/stream/
Pull the latest copy of STREAM from:

  # (v 5.10 at the time of edit)
  wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c

Compile

- Either the Intel compiler (icc) or GCC can be used to build STREAM.
- Build for multi-threaded runs: -fopenmp (gcc) or -openmp (icc).
- For large array sizes, include -mcmodel=medium.
- We observed the best performance with Intel ICC.
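"Large" is relative to the machine's caches. The STREAM source comments ask for each array to be at least four times the size of the available cache memory. STREAM uses three arrays of 8-byte doubles. The sketch below turns that rule into arithmetic. It also flags when the three static arrays exceed roughly 2 GiB. The sketch assumes that this is the point where the default x86-64 code model stops linking and -mcmodel=medium becomes necessary. Both function names are hypothetical. You supply the total last-level cache size in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum STREAM_ARRAY_SIZE (in elements) for a given
# total last-level cache size in bytes, using the "4x the cache" rule and
# 8-byte doubles.
min_stream_array_size() {
  local llc_bytes=$1
  echo $(( 4 * llc_bytes / 8 ))
}

# Hypothetical helper: STREAM allocates three static arrays (a, b, c) of
# STREAM_ARRAY_SIZE doubles. Assumption: above ~2 GiB of static data the
# default x86-64 code model no longer suffices, so print "yes" when
# -mcmodel=medium is likely needed.
needs_mcmodel_medium() {
  local n=$1
  local bytes=$(( 3 * n * 8 ))
  if [ "$bytes" -gt $(( 2 * 1024 * 1024 * 1024 )) ]; then
    echo yes
  else
    echo no
  fi
}
```

For example, a hypothetical two-socket machine with 20 MiB of L3 per socket has 41943040 bytes of cache. `min_stream_array_size 41943040` gives 20971520 elements, and `needs_mcmodel_medium 20971520` prints `no`.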

Intel

  icc -O3 -static -openmp stream.c -o stream_icc

GCC

In our limited tests, GCC typically gave the worst performance (better optimisation flags are probably required, but they are not well documented for STREAM).

  gcc -O3 -fopenmp stream.c -o stream_gcc
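Neither the Intel nor the GCC command above sets the array size, so STREAM uses its built-in default. The sketch below prints, but does not run, a gcc or icc command that passes -DSTREAM_ARRAY_SIZE (the same macro used in the Open64 build on this page). It appends -mcmodel=medium once the three arrays exceed roughly 2 GiB. The helper name and that threshold are assumptions of this sketch. Review the printed line before running it, especially the icc link flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a STREAM build command for gcc or icc.
# Usage: stream_build_cmd <gcc|icc> <STREAM_ARRAY_SIZE>
# Base flags mirror the commands on this page.
stream_build_cmd() {
  local cc=$1 n=$2 omp
  case "$cc" in
    gcc) omp="-fopenmp" ;;
    icc) omp="-static -openmp" ;;
    *) echo "unsupported compiler: $cc" >&2; return 1 ;;
  esac
  local cmd="$cc -O3 $omp -DSTREAM_ARRAY_SIZE=$n"
  # Three arrays of 8-byte doubles; assume ~2 GiB is the small-model limit.
  if [ $(( 3 * n * 8 )) -gt $(( 2 * 1024 * 1024 * 1024 )) ]; then
    cmd="$cmd -mcmodel=medium"
  fi
  echo "$cmd stream.c -o stream_$cc"
}
```

For example, `stream_build_cmd gcc 20000000` prints `gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=20000000 stream.c -o stream_gcc`, which can then be pasted into the shell.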

Open64

Below are optimisations for the AMD 6300 architecture:

  opencc -march=bdver1 -mp -Ofast -LNO:simd=2 -WOPT:sib=on  \
     -LNO:prefetch=2:pf2=0 -CG:use_prefetchnta=on -LNO:prefetch_ahead=4 -DSTREAM_ARRAY_SIZE=30000000 \
     -DNTIMES=30 -DOFFSET=1840 stream.c -o stream_occ
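As a worked example of how much memory this Open64 build uses: STREAM allocates three static arrays (a, b, c), each of STREAM_ARRAY_SIZE + OFFSET elements. The calculation below assumes the default element type, an 8-byte double.

```latex
\begin{aligned}
\text{footprint} &= 3 \times (\texttt{STREAM\_ARRAY\_SIZE} + \texttt{OFFSET}) \times 8\,\text{B} \\
&= 3 \times (30\,000\,000 + 1\,840) \times 8\,\text{B} \\
&= 720\,044\,160\,\text{B} \approx 686.7\,\text{MiB}
\end{aligned}
```

The result is well below $2^{31} = 2\,147\,483\,648$ bytes, so this build presumably does not need -mcmodel=medium. The OFFSET contributes only 44,160 bytes; its purpose is to shift the arrays' relative alignment, not to add capacity.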

Run

Vary the number of threads by setting OMP_NUM_THREADS, e.g.: export OMP_NUM_THREADS=32
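To compare thread counts systematically, the runs can be scripted. The sketch below runs a STREAM binary once per thread count and prints the Triad best rate for each. It assumes STREAM's usual output line of the form "Triad: <Best Rate MB/s> <Avg time> ...". The name stream_sweep is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a STREAM binary for each thread count given and
# print "<threads> <Triad best rate in MB/s>" per line.
# Usage: stream_sweep <binary> <thread count>...
stream_sweep() {
  local bin=$1; shift
  local t rate
  for t in "$@"; do
    # OMP_NUM_THREADS is set only for this one invocation.
    rate=$(OMP_NUM_THREADS=$t "$bin" | awk '/^Triad:/ {print $2}')
    echo "$t $rate"
  done
}
```

For example, run `export KMP_AFFINITY=compact` and then `stream_sweep ./stream_icc 1 2 4 8 16 32`. Other affinity variables remain in effect for every run in the sweep.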

Intel

  export OMP_NUM_THREADS=16
  export KMP_AFFINITY=compact
  ./stream_icc

GCC

  export GOMP_CPU_AFFINITY="0 1 2 ..."
  ./stream_gcc
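Rather than typing the GOMP_CPU_AFFINITY list by hand, you can generate it. The sketch below is minimal and assumes CPUs are numbered contiguously from 0 (check with lscpu). The name gomp_affinity_list is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "0 1 ... N-1" for use as GOMP_CPU_AFFINITY.
# N defaults to the number of available CPUs reported by nproc.
gomp_affinity_list() {
  local n=${1:-$(nproc)} i list=""
  for (( i = 0; i < n; i++ )); do
    list+="${list:+ }$i"   # add a separating space after the first entry
  done
  echo "$list"
}
```

For example, run `export GOMP_CPU_AFFINITY="$(gomp_affinity_list 16)"` before `./stream_gcc`. libgomp also accepts ranges in this variable, so "0-15" describes the same list.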

Open64

Below is the recommended best configuration for the AMD 6300 architecture.
Peak memory bandwidth is achieved when STREAM runs on three cores of each NUMA node. For example, with the settings below, the same system achieves STREAM results 5% better than when using all cores.

  # assuming a 32-core system
  export O64_OMP_AFFINITY="TRUE"
  export O64_OMP_AFFINITY_MAP="2,4,6,10,12,14,18,20,22,26,28,30"
  export OMP_NUM_THREADS=12
  ./stream_occ
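The O64_OMP_AFFINITY_MAP in the example above appears to assume four NUMA nodes of eight cores each, picking cores 2, 4 and 6 within each node. The sketch below generates such a map for other layouts. It assumes each node k owns cores k*per_node through k*per_node+per_node-1; you can confirm the real layout with `numactl --hardware`. The name numa_affinity_map is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a comma-separated core list that places the
# same per-node core offsets on every NUMA node.
# Usage: numa_affinity_map <nodes> <cores per node> <offset>...
# Assumes cores are numbered contiguously within each node.
numa_affinity_map() {
  local nodes=$1 per_node=$2
  shift 2
  local offsets=("$@") node off map=""
  for (( node = 0; node < nodes; node++ )); do
    for off in "${offsets[@]}"; do
      map+="${map:+,}$(( node * per_node + off ))"
    done
  done
  echo "$map"
}
```

`numa_affinity_map 4 8 2 4 6` reproduces the map used above. Set OMP_NUM_THREADS to nodes × offsets (here 4 × 3 = 12) to match.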

Results
