Results:HPL GPU

From Define Wiki

Revision as of 20:59, 24 November 2015

Results for HPL on GPU

  • CentOS 7.1 - Kernel 3.10.0-229.20.1.el7.x86_64
  • CUDA-7.5
  • OpenMPI 1.8.5
  • Intel Compiler and MKL 2013_3.174
Some observations from runs with 128GB RAM:
  • A larger N value allows a larger DGEMM_SPLIT value.
  • Performance suffers significantly with large N values (100k+) when the core count per GPU is low (fewer than 3).
  • In some instances, oversubscribing cores per GPU can give a performance boost.
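Because HPL factors a dense N x N matrix of 8-byte doubles, host memory is what bounds N. The Python sketch below shows the usual sizing arithmetic. It assumes that the "64GB"/"128GB" figures mean GiB and uses a fill fraction of 0.8, which is a common rule of thumb rather than a setting recorded for these runs. The helper names (`hpl_matrix_bytes`, `memory_fraction`, `suggest_n`) are hypothetical and only illustrate the calculation.

```python
import math

# Assumption: the memory sizes in the results table are GiB (2**30 bytes).
GIB = 1024 ** 3


def hpl_matrix_bytes(n: int) -> int:
    """Dominant storage for HPL: an n x n matrix of 8-byte doubles."""
    return 8 * n * n


def memory_fraction(n: int, mem_gib: float) -> float:
    """Fraction of host RAM occupied by the matrix for problem size n."""
    return hpl_matrix_bytes(n) / (mem_gib * GIB)


def suggest_n(mem_gib: float, nb: int = 896, fraction: float = 0.8) -> int:
    """Largest multiple of nb whose matrix fits in `fraction` of host RAM.

    The 0.8 default is a rule-of-thumb assumption, not a value from these runs.
    """
    n_max = math.isqrt(int(fraction * mem_gib * GIB) // 8)
    return (n_max // nb) * nb


if __name__ == "__main__":
    # N values taken from the results table (64GB and 128GB systems).
    for n, mem in [(79744, 64), (116480, 128), (102144, 128)]:
        print(f"N={n:>6} on {mem} GiB: {memory_fraction(n, mem):.1%} of RAM")
    print("suggested N for 128 GiB:", suggest_n(128))
```

Under these assumptions, `suggest_n(128)` returns 116480, the N used for the 2-GPU 128GB run, while the 64GB runs used N = 79744, which fills about 74% of RAM.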
HPL Results (Nvidia Compiled Binary) from Nvidia Tesla K Series Cards on Single Dual Socket Systems
GPU  | Qty | System     | CPU       | Freq   | Cores | Memory (GB) | N      | NB  | Result (TFlop/s) | DGEMM Split
-----+-----+------------+-----------+--------+-------+-------------+--------+-----+------------------+------------
K80m | 1   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 64          | 51072  | 896 | 1.753            | 0.85
K80m | 2   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 64          | 79744  | 896 | 3.663            | 0.85
K80m | 3   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 64          | 79744  | 896 | 5.129            | 0.85
K80m | 4   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 64          | 79744  | 896 | 6.154            | 0.85
K80m | 1   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 128         | 51968  | 896 | 1.721            | 0.90
K80m | 2   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 128         | 116480 | 896 | 3.945            | 0.90
K80m | 3   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 128         | 91398  | 896 | 4.855            | 0.85
K80m | 4   | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10    | 128         | 102144 | 896 | 6.491            | 0.90
K80m | 1   | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16    | 128         |        | 896 |                  |
K80m | 2   | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16    | 128         |        | 896 |                  |
K80m | 3   | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16    | 128         |        | 896 |                  |
K80m | 4   | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16    | 128         |        | 896 |                  |
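The table reports only the achieved rate. As a rough cross-check, the Python sketch below derives two figures for the 128GB E5-2650v3 rows. The first is the wall-clock time each result implies, computed with the operation count HPL uses for its reported rate (2/3 N^3 + 3/2 N^2). The second is the scaling relative to the 1-GPU run. The runtimes are derived estimates, not measured values. The efficiency figures compare runs at different N, so treat them as indicative only. The function names are illustrative, not part of HPL.

```python
# Rows copied from the 128GB E5-2650v3 results: (GPU qty, N, TFlop/s).
ROWS_128GB = [
    (1, 51968, 1.721),
    (2, 116480, 3.945),
    (3, 91398, 4.855),
    (4, 102144, 6.491),
]


def hpl_flops(n: int) -> float:
    """Operation count HPL divides by wall time to get its rate: 2/3 N^3 + 3/2 N^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def implied_runtime_s(n: int, tflops: float) -> float:
    """Runtime implied by a reported rate (an estimate, not a measured value)."""
    return hpl_flops(n) / (tflops * 1e12)


def scaling_efficiency(qty: int, tflops: float, base_tflops: float) -> float:
    """Speedup over the 1-GPU result divided by the GPU count."""
    return tflops / (qty * base_tflops)


if __name__ == "__main__":
    base = ROWS_128GB[0][2]
    for qty, n, tf in ROWS_128GB:
        print(
            f"{qty} GPU(s), N={n}: ~{implied_runtime_s(n, tf):.0f} s, "
            f"scaling efficiency {scaling_efficiency(qty, tf, base):.0%}"
        )
```

For example, the 4-GPU result comes to about 94% of linear scaling from the 1-GPU figure (6.491 / (4 x 1.721)). The 2-GPU row appears superlinear only because it ran at a much larger N.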