Benchmarking: Stream (Memory Bandwidth)

From Define Wiki

Revision as of 21:34, 21 July 2013

STREAM: The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels.
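
As a rough illustration of how a kernel timing becomes a bandwidth figure, the shell sketch below reproduces the arithmetic STREAM applies. It multiplies the bytes moved per element by the array size and divides by the best time. Per element, Copy and Scale move 16 bytes and Add and Triad move 24 bytes, assuming 8-byte doubles, and MB means 10^6 bytes. The function names are hypothetical helpers and are not part of stream.c.

```shell
#!/usr/bin/env bash
# Sketch: the arithmetic behind STREAM's "Best Rate MB/s" (hypothetical helpers).
# Bytes moved per array element, assuming 8-byte doubles:
#   Copy  c = a          -> 2 words = 16 bytes
#   Scale b = q*c        -> 2 words = 16 bytes
#   Add   c = a + b      -> 3 words = 24 bytes
#   Triad a = b + q*c    -> 3 words = 24 bytes
# MB here means 10^6 bytes.

stream_bytes_per_elem() {
  case "$1" in
    copy|scale) echo 16 ;;
    add|triad)  echo 24 ;;
    *) echo "unknown kernel: $1" >&2; return 1 ;;
  esac
}

# stream_mbps KERNEL ARRAY_SIZE SECONDS -> bandwidth in MB/s, one decimal place
stream_mbps() {
  local bpe
  bpe=$(stream_bytes_per_elem "$1") || return 1
  awk -v b="$bpe" -v n="$2" -v t="$3" 'BEGIN { printf "%.1f\n", 1e-6 * b * n / t }'
}
```

For example, a Triad pass over 30,000,000 elements that takes 0.05 s gives `stream_mbps triad 30000000 0.05`, which prints 14400.0.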

  • Note: Disable power-saving features so the CPUs stay at maximum clock speed; frequency scaling causes fluctuations in performance. For example:
    • /etc/init.d/cpuspeed stop
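
One way to confirm that the CPUs are actually pinned at maximum clock is to inspect each CPU's cpufreq governor. The "performance" governor keeps a core at its highest allowed frequency. This sketch assumes the standard Linux cpufreq sysfs layout, and the base path can be overridden so the check can run against a fake tree. `check_governors` is a hypothetical helper. On machines with no cpufreq entries it prints nothing and returns success.

```shell
#!/usr/bin/env bash
# Sketch: list CPUs whose cpufreq governor is not "performance".
# Assumes the Linux sysfs layout:
#   /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
# Returns 1 if any CPU uses another governor, 0 otherwise.
check_governors() {
  local base=${1:-/sys/devices/system/cpu}
  local f gov bad=0
  for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue            # no cpufreq entry (or glob did not match)
    gov=$(cat "$f")
    if [ "$gov" != "performance" ]; then
      # Print e.g. "cpu1: ondemand"
      echo "$(basename "$(dirname "$(dirname "$f")")"): $gov"
      bad=1
    fi
  done
  return "$bad"
}
```

Run `check_governors` with no arguments on the benchmark host. Any line it prints names a CPU that may still be frequency scaling.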

Get the source

  # (v 5.10 at the time of edit)
  wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
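
To check which STREAM version was actually downloaded, a small helper can extract the revision number from the source. This assumes stream.c carries an RCS-style `$Revision: X.Y $` keyword in its version banner. If the file uses a different format, the function prints nothing. `stream_version` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the revision number found in stream.c (hypothetical helper).
# Assumption: the source contains an RCS keyword such as "$Revision: 5.10 $".
# Requiring a digit after "Revision: " skips lines like "Revision: $Id ...".
stream_version() {
  local src=${1:-stream.c}
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  grep -o 'Revision: [0-9][0-9.]*' "$src" | head -n 1 | awk '{ print $2 }'
}
```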

Compile

  • Either the Intel compiler (icc) or GCC can be used to build the benchmark
  • Build with OpenMP enabled for multi-threaded runs: -fopenmp (gcc) or -openmp (icc)
  • For large array sizes, add -mcmodel=medium (needed once the static arrays exceed roughly 2 GB on x86-64)
  • Best performance was observed with Intel ICC
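
The compile notes above can be captured in a helper. It picks the OpenMP flag for each compiler and adds -mcmodel=medium once the three static double arrays (3 x N x 8 bytes) reach about 2 GiB, which is the data limit of the default x86-64 "small" code model. This is a sketch. `stream_cflags` is hypothetical, and -O3 is an assumed optimisation level rather than one taken from this page.

```shell
#!/usr/bin/env bash
# Sketch: choose compile flags for stream.c (hypothetical helper).
#   gcc -> -fopenmp, icc -> -openmp (as listed in the notes above)
#   -mcmodel=medium once 3 arrays x N x 8 bytes >= 2 GiB
# -O3 is an assumption, not a flag taken from the original page.
stream_cflags() {
  local cc=$1 n=$2
  if [ -z "$n" ]; then
    echo "usage: stream_cflags gcc|icc ARRAY_SIZE" >&2
    return 1
  fi
  local flags="-O3"
  case "$cc" in
    gcc) flags="$flags -fopenmp" ;;
    icc) flags="$flags -openmp" ;;
    *) echo "unsupported compiler: $cc" >&2; return 1 ;;
  esac
  # Total static data for arrays a, b, c of doubles
  if (( 3 * n * 8 >= 2147483648 )); then
    flags="$flags -mcmodel=medium"
  fi
  echo "$flags -DSTREAM_ARRAY_SIZE=$n"
}
```

Example usage: `gcc $(stream_cflags gcc 30000000) stream.c -o stream_gcc`. At 30,000,000 elements the arrays total 720 MB, so -mcmodel=medium is not added. At 100,000,000 elements (2.4 GB) it is.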

Intel
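
No Intel command line was recorded here. The following is a plausible starting point. It is an assumption and not necessarily the settings behind the ICC result noted above. -O3 and -xHost are common icc optimisation flags, and newer Intel compilers spell the OpenMP flag -qopenmp instead of -openmp. With DRY_RUN=1, the hypothetical `build_stream_icc` helper prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: build stream.c with icc (hypothetical helper).
# Assumed flags: -O3 (optimisation), -xHost (target the build machine's ISA).
# DRY_RUN=1 prints the command instead of executing it.
build_stream_icc() {
  local n=$1
  if [ -z "$n" ]; then
    echo "usage: build_stream_icc ARRAY_SIZE" >&2
    return 1
  fi
  local cmd=(icc -O3 -xHost -openmp "-DSTREAM_ARRAY_SIZE=$n" stream.c -o stream_icc)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```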

GCC
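
Likewise, no GCC command line was recorded. A comparable sketch follows, assuming -O3 and -march=native (which tunes for the build machine) as the optimisation flags. Neither flag comes from the original page. `build_stream_gcc` is hypothetical, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: build stream.c with gcc (hypothetical helper).
# Assumed flags: -O3, -march=native; -fopenmp enables multi-threaded runs.
# DRY_RUN=1 prints the command instead of executing it.
build_stream_gcc() {
  local n=$1
  if [ -z "$n" ]; then
    echo "usage: build_stream_gcc ARRAY_SIZE" >&2
    return 1
  fi
  local cmd=(gcc -O3 -march=native -fopenmp "-DSTREAM_ARRAY_SIZE=$n" stream.c -o stream_gcc)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```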

Open64

  • Best suited to optimising for AMD architectures (the example below targets Bulldozer with -march=bdver1)
  /shared/apps/open64-5.0/bin/opencc -march=bdver1 -mp -Ofast -LNO:simd=2 -WOPT:sib=on \
      -LNO:prefetch=2:pf2=0 -CG:use_prefetchnta=on -LNO:prefetch_ahead=4 -DSTREAM_ARRAY_SIZE=30000000 \
      -DNTIMES=30 -DOFFSET=1840 stream.c -o stream_occ
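
Whatever compiler is used, STREAM_ARRAY_SIZE should be large enough that the arrays do not fit in cache. STREAM's guidance is that each array be at least four times the size of the available cache. The sketch below converts a total last-level cache size into a minimum element count and reports the three-array footprint in MiB. You supply the cache size, summed over all sockets. With STREAM_ARRAY_SIZE=30000000, as in the Open64 line, each array is 240 MB, which covers up to about 60 MB of total cache. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: size STREAM_ARRAY_SIZE from cache size (hypothetical helpers).
# Rule applied: each array >= 4 x total cache; 8-byte doubles assumed.

# stream_min_array_size CACHE_BYTES -> minimum elements per array (rounded up)
stream_min_array_size() {
  local cache_bytes=$1
  echo $(( (4 * cache_bytes + 7) / 8 ))
}

# stream_footprint_mib ARRAY_SIZE -> memory for arrays a, b, c in MiB
stream_footprint_mib() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", 3 * n * 8 / 1048576 }'
}
```

For instance, a machine with 16 MiB of total cache needs at least `stream_min_array_size 16777216`, which is 8388608 elements. The Open64 setting of 30,000,000 elements uses `stream_footprint_mib 30000000`, which is 686.6 MiB.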

Run

  • Vary the number of threads by setting OMP_NUM_THREADS, e.g. export OMP_NUM_THREADS=32
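
To vary the thread count systematically instead of using one export at a time, a loop can run the binary at several values of OMP_NUM_THREADS and extract the Triad rate. This sketch assumes the usual STREAM output format, in which the second field of the `Triad:` line is the best rate in MB/s. `sweep_threads` and the thread list are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run a STREAM binary at several thread counts (hypothetical helper).
# Assumes output lines like:
#   Triad:   <best MB/s>  <avg time>  <min time>  <max time>
# Prints "<threads> <triad best MB/s>" per run.
sweep_threads() {
  local bin=$1 t rate
  shift
  for t in "$@"; do
    rate=$(OMP_NUM_THREADS=$t "$bin" | awk '$1 == "Triad:" { print $2 }')
    echo "$t $rate"
  done
}
```

Example usage: `sweep_threads ./stream_occ 1 2 4 8 16 32`.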


Results