Benchmarking: Stream (Memory Bandwidth)
Revision as of 21:34, 21 July 2013
STREAM: The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels.
- Note: Disable power-saving features before running. The CPUs need to stay at maximum clock speed to prevent fluctuations in performance:
- /etc/init.d/cpuspeed stop
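To confirm that frequency scaling is actually off, the active governor can be inspected. This is a minimal sketch that assumes a Linux system exposing the cpufreq sysfs interface; the helper name show_governors is made up for illustration, and on systems without cpufreq it only prints a notice.

```shell
# Hypothetical helper: print the cpufreq governor of every CPU, if exposed.
show_governors() {
    found=0
    for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip when the glob did not match or the file is unreadable.
        [ -r "$f" ] || continue
        found=1
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    [ "$found" -eq 1 ] || echo "cpufreq governor not exposed on this system"
    return 0
}

show_governors
```

A governor of "performance" on every CPU suggests the clocks are pinned high; anything else is worth checking before trusting the numbers.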
Get the source
- Main STREAM website: http://www.cs.virginia.edu/stream/
- Pull the latest copy of STREAM from:
# (v 5.10 at the time of edit)
wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
Compile
- Either the Intel compiler or GCC can be used to build
- Build with OpenMP enabled for multi-threaded runs: -fopenmp (GCC) or -openmp (ICC)
- For large array sizes, include -mcmodel=medium
- Best performance was observed with Intel ICC
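To judge whether an array size counts as "large", it helps to work out the memory footprint. The sketch below assumes the stream.c defaults of three arrays of 8-byte doubles; the function name stream_footprint_mib is invented for this example.

```shell
# Hypothetical helper: total size in MiB of the three STREAM arrays
# for a given STREAM_ARRAY_SIZE, assuming 8-byte (double) elements.
stream_footprint_mib() {
    echo $(( $1 * 8 * 3 / 1048576 ))
}

# Array size used in the Open64 example on this page.
stream_footprint_mib 30000000   # prints 686
```

As a rough guide, -mcmodel=medium becomes necessary on x86-64 once the statically allocated arrays approach 2 GB in total, and the comments in stream.c ask for each array to be at least four times the size of the last-level cache.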
Intel
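A possible ICC build line, offered as a sketch rather than the command originally used here: the -O3 level and the stream_icc output name are assumptions, and the array size and iteration count are copied from the Open64 example on this page. Newer Intel compilers spell the OpenMP flag -qopenmp instead of -openmp.

```shell
# Assumed flags; tune the optimisation level and sizes for the target machine.
ICC_FLAGS="-O3 -openmp -mcmodel=medium -DSTREAM_ARRAY_SIZE=30000000 -DNTIMES=30"

if command -v icc >/dev/null 2>&1 && [ -f stream.c ]; then
    # ICC_FLAGS is left unquoted on purpose so it splits into separate flags.
    icc $ICC_FLAGS stream.c -o stream_icc
else
    echo "icc or stream.c not available; nothing built" >&2
fi
```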
GCC
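A possible GCC build line, again only a sketch: the -O3 level and the stream_gcc output name are assumptions, while -fopenmp and -mcmodel=medium follow the notes in the Compile section and the sizes are copied from the Open64 example on this page.

```shell
# Assumed flags; tune the optimisation level and sizes for the target machine.
GCC_FLAGS="-O3 -fopenmp -mcmodel=medium -DSTREAM_ARRAY_SIZE=30000000 -DNTIMES=30"

if command -v gcc >/dev/null 2>&1 && [ -f stream.c ]; then
    # GCC_FLAGS is left unquoted on purpose so it splits into separate flags.
    gcc $GCC_FLAGS stream.c -o stream_gcc
else
    echo "gcc or stream.c not available; nothing built" >&2
fi
```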
Open64
- Best choice when optimising for AMD architectures
/shared/apps/open64-5.0/bin/opencc -march=bdver1 -mp -Ofast -LNO:simd=2 -WOPT:sib=on \
-LNO:prefetch=2:pf2=0 -CG:use_prefetchnta=on -LNO:prefetch_ahead=4 -DSTREAM_ARRAY_SIZE=30000000 \
-DNTIMES=30 -DOFFSET=1840 stream.c -o stream_occ
Run
- Set the number of threads with OMP_NUM_THREADS, for example: export OMP_NUM_THREADS=32
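A thread-count sweep can be scripted along these lines. This is a sketch under assumptions: the binary path ./stream_gcc and the helper name sweep_threads are placeholders, and the awk filter relies on the "Triad:" line that STREAM prints in its results table, with the best rate in MB/s as the second field.

```shell
# Assumed binary name; point this at whichever build is being tested.
STREAM_BIN=./stream_gcc

# Hypothetical helper: run STREAM once per thread count given as an argument
# and print the best Triad bandwidth for each.
sweep_threads() {
    for t in "$@"; do
        export OMP_NUM_THREADS=$t
        printf '%s threads: ' "$t"
        "$STREAM_BIN" | awk '/^Triad:/ {print $2 " MB/s"; found=1}
                             END {if (!found) print "no Triad line found"}'
    done
}

if [ -x "$STREAM_BIN" ]; then
    sweep_threads 1 2 4 8 16 32
fi
```

Bandwidth usually stops improving once the memory controllers are saturated, so the sweep shows how many threads are actually needed to reach peak.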