Linux: Checking the Infiniband Fabric
- Assuming Platform Cluster Manager 3.2 is installed; otherwise, make sure the latest version of OFED is installed, which will include OpenMPI.
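A quick way to confirm that the IB stack and OpenMPI are actually present before going any further (a hedged sketch; the ofed_info utility and exact package contents depend on which OFED build your distribution uses):
# Show the installed OFED release, if the Mellanox OFED tools are present
ofed_info -s
# List the HCAs seen by the verbs layer (libibverbs-utils)
ibv_devinfo | grep -E 'hca_id|port:|state'
# Check the relevant kernel modules are loaded
lsmod | grep -E 'mlx4_ib|ib_core'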
Check the IB links
Use the ibstatus command to show the current state of the IB link:
[david@compute000 imb]$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0030:48ff:ffff:e57d
base lid: 0x0
sm lid: 0x0
state: 2: INIT
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
link_layer: InfiniBand
In this instance we can see that the state is only INIT. This typically means that the IB link is having trouble finding the subnet manager (note the sm lid of 0x0 above).
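To confirm whether any subnet manager is reachable from this host, the infiniband-diags tools can query the fabric directly (a sketch, assuming the diags package is installed and the device is mlx4_0 as above):
# Query the master subnet manager; this fails if no SM is running anywhere on the fabric
sminfo
# ibstat gives a summary similar to ibstatus, including the SM lid for the port
ibstat mlx4_0 1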
Warning from mpirun?
This will result in warnings when running MPI performance tests (check the output from the OpenMPI mpirun for clues):
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them). This
is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.
Check the fabric performance
OpenMPI will fall back to using Ethernet; you can tell by the high latency and low bandwidth:
[david@compute000 imb]$ module load openmpi-x86_64
[david@compute000 imb]$ which mpirun
/usr/lib64/openmpi/bin/mpirun
[david@compute000 imb]$ pwd
/home/david/benchmarks/imb
[david@compute000 imb]$ cat hosts
compute000
compute001
[david@compute000 imb]$ /usr/lib64/openmpi/bin/mpirun -np 2 -hostfile ./hosts /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
# lots of warning cut out
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 47.79 0.00 # <-- This is high Ethernet latency; typical 1Gb eth0 latency can be as low as 25 usec
1 1000 44.85 0.02
2 1000 45.24 0.04
4 1000 45.87 0.08
8 1000 44.51 0.17
16 1000 43.21 0.35
32 1000 43.76 0.70
64 1000 43.92 1.39
128 1000 43.48 2.81
256 1000 48.91 4.99
512 1000 52.95 9.22
1024 1000 96.30 10.14
2048 1000 403.23 4.84
4096 1000 262.84 14.86
8192 1000 279.54 27.95
16384 1000 333.65 46.83
32768 1000 686.98 45.49
65536 640 1364.94 45.79
131072 320 1668.31 74.93
262144 160 2683.26 93.17
524288 80 5044.39 99.12
1048576 40 9498.91 105.28
2097152 20 18256.90 109.55
4194304 10 36169.60 110.59 # <-- Typical 1Gb Ethernet bandwidth
Make sure the subnet manager is running
Sometimes the subnet manager will be running on the switch; other times it will need to be started manually on one of the hosts on the IB fabric. OFED provides a utility to run a subnet manager on a host (from the opensm package).
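It can also be worth checking that the opensmd service is installed and set to start on boot, so the fabric comes back after a reboot (a sketch for a SysV-init, RHEL-family system where the opensm package ships an opensmd init script):
# Is the subnet manager daemon currently running on this host?
service opensmd status
# Make sure it starts automatically at boot
chkconfig opensmd on
chkconfig --list opensmd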
/etc/init.d/opensmd restart
# checking the ibstatus output, we have an ACTIVE link!
[david@compute000 imb]$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0030:48ff:ffff:e57d
base lid: 0x1
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
link_layer: InfiniBand
OK, now we are looking much better; test performance again:
QDR MPI Performance Figures
Benchmark results for IMB (the Intel MPI Benchmarks) over QDR 40 Gb InfiniBand. These figures are from FDR cards running at QDR speed because of the switch:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 1.29 0.00
1 1000 1.15 0.83
2 1000 1.16 1.65
4 1000 1.16 3.30
8 1000 1.17 6.50
16 1000 1.19 12.79
32 1000 1.22 24.97
64 1000 1.25 48.93
128 1000 1.85 66.03
256 1000 1.96 124.60
512 1000 2.15 227.37
1024 1000 2.50 390.62
2048 1000 2.90 673.74
4096 1000 3.68 1061.62
8192 1000 5.36 1457.39
16384 1000 7.81 1999.63
32768 1000 12.21 2560.43
65536 640 20.84 2999.41
131072 320 38.02 3288.14
262144 160 75.01 3332.78
524288 80 146.31 3417.37
1048576 40 289.19 3457.94
2097152 20 574.40 3481.87
4194304 10 1144.80 3494.05
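If you are ever unsure whether a run really went over InfiniBand, you can tell OpenMPI to exclude the TCP transport so the job fails loudly instead of quietly falling back to Ethernet (a sketch for the OpenMPI 1.x openib BTL used above; component names differ in newer releases):
# List the BTL components this OpenMPI build actually provides
ompi_info | grep btl
# Only allow the InfiniBand, shared-memory and self transports for the PingPong test
/usr/lib64/openmpi/bin/mpirun --mca btl openib,self,sm -np 2 -hostfile ./hosts /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong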
FDR MPI Performance Figures using fibre cable
These results were taken with the two systems attached directly (back to back), as the switch didn't work with the fibre cable.
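Before reading too much into the figures it is worth confirming that the back-to-back link actually negotiated FDR; an active 4X FDR port reports 56 Gb/sec rather than the 40 Gb/sec seen earlier (a sketch, assuming the same mlx4 device and that opensm is running on one of the two hosts):
# The rate line should now read: 56 Gb/sec (4X FDR)
ibstatus
# iblinkinfo (infiniband-diags) shows the negotiated width and speed of every link it can discover
iblinkinfo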
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.94 0.00
1 1000 1.01 0.95
2 1000 1.01 1.89
4 1000 0.97 3.92
8 1000 0.98 7.79
16 1000 0.98 15.50
32 1000 1.01 30.26
64 1000 1.05 58.07
128 1000 1.38 88.77
256 1000 1.50 162.33
512 1000 1.67 291.95
1024 1000 2.00 487.07
2048 1000 2.69 726.21
4096 1000 3.33 1172.88
8192 1000 4.73 1651.86
16384 1000 6.18 2527.32
32768 1000 8.92 3502.77
65536 640 15.49 4035.07
131072 320 27.39 4564.36
262144 160 45.02 5552.52
524288 80 86.40 5787.04
1048576 40 169.00 5917.16
2097152 20 334.23 5983.96
4194304 10 665.51 6010.43
FDR MPI Performance Figures
Results of IB running at full FDR speed:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 1.14 0.00
1 1000 1.20 0.79
2 1000 1.14 1.67
4 1000 1.14 3.36
8 1000 1.16 6.56
16 1000 1.17 13.08
32 1000 1.19 25.65
64 1000 1.23 49.70
128 1000 1.55 78.63
256 1000 1.68 145.28
512 1000 1.85 263.73
1024 1000 2.18 447.45
2048 1000 2.84 686.61
4096 1000 3.48 1123.92
8192 1000 4.90 1595.05
16384 1000 6.72 2323.44
32768 1000 9.46 3303.35
65536 640 16.01 3902.81
131072 320 27.94 4473.51
262144 160 45.63 5479.44
524288 80 87.02 5745.92
1048576 40 169.46 5901.03
2097152 20 335.03 5969.69
4194304 10 668.45 5983.96
Going too fast
To check that the benchmark really ran between multiple nodes, look at the latency and bandwidth achieved.
Latencies lower than about 1 usec, or bandwidths well above what the link rate allows (for example, much more than roughly 4000 MB/sec on QDR), would suggest another issue, as these are better than would be expected for the InfiniBand hardware; it is likely that both processes are running on the same node and communicating over shared memory instead of the fabric.
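One way to rule this out is to force one rank onto each host, so the PingPong pair has to cross the fabric (a sketch for the OpenMPI build used above; the -npernode option may be spelt differently in other MPI stacks):
# Place exactly one rank on each of the two hosts in ./hosts, so traffic must leave the node
/usr/lib64/openmpi/bin/mpirun -np 2 -npernode 1 -hostfile ./hosts /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong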