Results:HPL GPU
Results for HPL on GPU
- CentOS 7.1 - Kernel 3.10.0-229.20.1.el7.x86_64
- CUDA-7.5
- OpenMPI 1.8.5
- Intel Compiler and MKL 2013_3.174
Some observations from runs with 128GB RAM:
- A larger N value allows for a larger DGEMM_SPLIT value.
- Performance suffers significantly when large N values (100k+) are combined with a low core count per GPU (fewer than 3).
- In some instances oversubscribing cores per GPU will offer a performance boost.
- The N values for the 128GB runs were the maximum obtainable for each particular GPU configuration.
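As a rough illustration of how N relates to system memory, the sketch below uses the common HPL rule of thumb that the N x N double-precision matrix should fill a fixed fraction of RAM, with N rounded down to a multiple of NB. The function name `hpl_problem_size`, the 0.8 fraction, and the reading of "128GB" as 128 GiB are assumptions for illustration, not values taken from these runs.

```python
import math


def hpl_problem_size(mem_gib: float, nb: int, mem_fraction: float = 0.8) -> int:
    """Estimate the largest HPL N that is a multiple of nb.

    Assumes the N x N matrix of 8-byte doubles may occupy at most
    mem_fraction of the installed memory (a rule of thumb, not a
    value measured on these systems).
    """
    usable_bytes = mem_gib * 1024**3 * mem_fraction
    # Number of 8-byte matrix elements that fit, then the side length N.
    n_max = math.isqrt(int(usable_bytes // 8))
    # Round down so that N divides evenly into NB-sized blocks.
    return (n_max // nb) * nb


if __name__ == "__main__":
    for mem in (64, 128):
        print(mem, "GiB ->", hpl_problem_size(mem, nb=896))
```

The maximum N actually obtainable also depends on the GPU count and on how much host memory the GPU-enabled binary needs for its own buffers, so an estimate like this is only a starting point for tuning.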
HPL Results (Nvidia Compiled Binary) from Nvidia Tesla K Series Cards on Single Dual-Socket Systems

| GPU | Qty | System | CPU | Freq | Cores | Memory (GB) | N | NB | Result (TFlop/s) | DGEMM Split |
|---|---|---|---|---|---|---|---|---|---|---|
| K80m | 1 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 51072 | 896 | 1.753 | 0.85 |
| K80m | 2 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 79744 | 896 | 3.663 | 0.85 |
| K80m | 3 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 79744 | 896 | 5.129 | 0.85 |
| K80m | 4 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 79744 | 896 | 6.154 | 0.85 |
| K80m | 1 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 51968 | 896 | 1.721 | 0.90 |
| K80m | 2 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 116480 | 896 | 3.945 | 0.90 |
| K80m | 3 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 91398 | 896 | 4.855 | 0.85 |
| K80m | 4 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 102144 | 896 | 6.557 | 0.90 |
| K80m | 1 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
| K80m | 2 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
| K80m | 3 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
| K80m | 4 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
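To make the multi-GPU scaling in the results easier to read, the sketch below computes scaling efficiency for the 64GB E5-2650v3 runs, defined here as the measured result divided by (Qty x the single-card result). The TFlop/s figures are copied from the table; the helper name `scaling_efficiency` and this definition of efficiency are illustrative assumptions, not part of the original runs.

```python
# Measured HPL results in TFlop/s for the 64GB E5-2650v3 runs, keyed by
# the number of K80m cards (values copied from the results table).
RESULTS_64GB = {1: 1.753, 2: 3.663, 3: 5.129, 4: 6.154}


def scaling_efficiency(results: dict[int, float]) -> dict[int, float]:
    """Return result / (qty * single-card result) for each card count."""
    single = results[1]
    return {qty: tflops / (qty * single) for qty, tflops in results.items()}


if __name__ == "__main__":
    for qty, eff in scaling_efficiency(RESULTS_64GB).items():
        print(f"{qty} card(s): {eff:.1%} of linear scaling")
```

The two-card figure comes out slightly above 100%. The single-card run used a smaller N (51072 versus 79744), which may account for this, so the baseline is not a like-for-like comparison.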