Difference between revisions of "Results:HPL GPU"
Latest revision as of 23:01, 24 November 2015
Results for HPL on GPU
- CentOS 7.1 - Kernel 3.10.0-229.20.1.el7.x86_64
- CUDA-7.5
- OpenMPI 1.8.5
- Intel Compiler and MKL 2013_3.174

Some observations from runs with 128GB RAM:
- A larger N value allows for a larger DGEMM_SPLIT value.
- Performance suffers significantly when using large N values (100k+) with low core counts per GPU (<3).
- In some instances, oversubscribing cores per GPU offers a performance boost.
- The N values for the 128GB runs were the maximum obtainable for that particular GPU configuration.
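
As a rough illustration of how a maximum N relates to host memory, the sketch below sizes N so that the N x N double-precision matrix fills a fraction of RAM, then rounds down to a multiple of NB. The function name `estimate_hpl_n` and the 80% memory fraction are assumptions (a common rule of thumb, not something stated on this page); the runs here may have been limited by other factors.

```python
import math

def estimate_hpl_n(mem_gib, nb=896, fraction=0.80):
    """Estimate an HPL problem size N for a given amount of host memory.

    mem_gib  -- host memory in GiB
    nb       -- HPL block size; N is rounded down to a multiple of it
    fraction -- share of memory given to the matrix (assumed rule of thumb)
    """
    mem_bytes = mem_gib * 1024**3
    # The matrix holds N*N doubles at 8 bytes each.
    elements = int(mem_bytes * fraction) // 8
    n_max = math.isqrt(elements)
    # Round down so that N is an exact multiple of NB.
    return (n_max // nb) * nb

if __name__ == "__main__":
    for mem in (64, 128):
        print(f"{mem} GiB -> N = {estimate_hpl_n(mem)}")
```

Under these assumptions, 128 GiB gives N = 116480, which happens to equal the N used for the two-GPU 128GB run; the other rows use smaller values, so treat the estimate as an upper bound to tune down from.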
HPL Results (Nvidia Compiled Binary) from Nvidia Tesla K Series Cards on Single Dual-Socket Systems

| GPU | Qty | System | CPU | Freq | Cores | Memory (GB) | N | NB | Result (TFlop/s) | DGEMM Split |
|---|---|---|---|---|---|---|---|---|---|---|
| K80m | 1 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 51072 | 896 | 1.753 | 0.85 |
| K80m | 2 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 79744 | 896 | 3.663 | 0.85 |
| K80m | 3 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 79744 | 896 | 5.129 | 0.85 |
| K80m | 4 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 64 | 79744 | 896 | 6.154 | 0.85 |
| K80m | 1 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 51968 | 896 | 1.721 | 0.90 |
| K80m | 2 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 116480 | 896 | 3.945 | 0.90 |
| K80m | 3 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 91398 | 896 | 4.855 | 0.85 |
| K80m | 4 | 1028QG-TRT | E5-2650v3 | 2.5GHz | 10 | 128 | 102144 | 896 | 6.557 | 0.90 |
| K80m | 1 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
| K80m | 2 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
| K80m | 3 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
| K80m | 4 | 1028QG-TRT | E5-2698v3 | 2.3GHz | 16 | 128 | | 896 | | |
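
To compare how well the E5-2650v3 results scale with card count, the sketch below computes a simple scaling efficiency: the measured result divided by the card count times the single-card result. The helper name `scaling_efficiency` is hypothetical, and the metric is only indicative because N differs between runs (which is also why a value slightly above 1.0 can appear).

```python
# Results in TFlop/s, keyed by K80m quantity, copied from the table above.
results_64gb = {1: 1.753, 2: 3.663, 3: 5.129, 4: 6.154}
results_128gb = {1: 1.721, 2: 3.945, 3: 4.855, 4: 6.557}

def scaling_efficiency(results):
    """Return result / (qty * single-card result) for each card count."""
    base = results[1]
    return {qty: tflops / (qty * base) for qty, tflops in results.items()}

if __name__ == "__main__":
    for label, results in (("64GB", results_64gb), ("128GB", results_128gb)):
        for qty, eff in scaling_efficiency(results).items():
            print(f"{label}, {qty} x K80m: {eff:.3f}")
```

On these figures the four-card runs come out at roughly 0.88 (64GB) and 0.95 (128GB) of ideal linear scaling.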