Install and configure Intel Omni-Path (OPA) Fabric
- Software can be downloaded from here: https://downloadcenter.intel.com/download/26064/Intel-Omni-Path-Fabric-Software-Including-Intel-Omni-Path-Host-Fabric-Interface-Driver- (as of August 2016)
- Docs etc: http://www.intel.com/content/www/us/en/support/network-and-i-o/fabric-products/000016242.html
- Two packages:
- Basic: just the compute node drivers and software (no subnet manager)
- IFS: Includes the fabric manager / subnet manager
Install the OPA Fabric Software
Check that kernel-devel on your build host matches the running kernel, or pull it from:
wget http://mirror.centos.org/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-327.22.2.el7.x86_64.rpm
# for the Lustre kernel (from ee-3.0.0.0)
wget http://mirror.centos.org/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-327.13.1.el7.x86_64.rpm
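A quick sanity check that the headers match the running kernel before building the drivers; a minimal sketch:
# kernel-devel must match the kernel you will load the hfi1 driver under
KVER=$(uname -r)
rpm -q "kernel-devel-$KVER" || echo "kernel-devel for $KVER is missing - install or download it as above"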
Install the IFS software
# tested on CentOS 7
yum install expect sysfsutils kernel-devel libibmad libibumad rdma libibverbs bc
yum install pciutils tcsh atlas sysfsutils infinipath-psm
tar zxvf IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9.tgz
cd IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9/
./INSTALL \
-i opa_stack -i opa_stack_dev -i intel_hfi \
-i delta_ipoib -i ibacm -i fastfabric \
-i mvapich2_gcc_hfi -i mvapich2_intel_hfi \
-i openmpi_gcc_hfi -i openmpi_intel_hfi \
-i opafm -i oftools -D opafm
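Before rebooting it can be worth confirming what the installer actually laid down; a minimal sketch (package names vary between OPA releases):
rpm -qa | grep -Ei 'opa|hfi' | sort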
# once installed it is recommended that you reboot - NOTE: on OpenHPC nodes, make sure re-install is not set before rebooting
systemctl disable srpd
reboot
Verify the Fabric/Adaptor
Check that the hfi1 kernel module is loaded
[root@node001 ~]# lsmod | grep hfi
hfi1 655730 1
ib_mad 47817 4 hfi1,ib_cm,ib_sa,ib_umad
ib_core 98787 14 hfi1,rdma_cm,ib_cm,ib_sa,iw_cm,xprtrdma,ib_mad,ib_ucm,ib_iser,ib_umad,ib_uverbs,ib_ipoib,ib_isert
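The same check can be run across all of the compute nodes with a simple ssh loop; a sketch, assuming passwordless ssh and the node001..node016 naming used for the hostfile later on this page:
for n in $(seq -f 'node0%02.0f' 1 16); do
  ssh "$n" 'lsmod | grep -q hfi1 && echo "$(hostname): hfi1 loaded" || echo "$(hostname): hfi1 NOT loaded"'
done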
[root@node001 ~]# modinfo hfi1
filename: /lib/modules/3.10.0-327.22.2.el7.x86_64/updates/hfi1.ko
version: 0.11-162
description: Intel Omni-Path Architecture driver
license: Dual BSD/GPL
rhelversion: 7.2
srcversion: A9F55090C67176C5B9120E1
alias: pci:v00008086d000024F1sv*sd*bc*sc*i*
alias: pci:v00008086d000024F0sv*sd*bc*sc*i*
depends: ib_core,ib_mad
vermagic: 3.10.0-327.22.2.el7.x86_64 SMP mod_unload modversions
Make sure the subnet manager is running
systemctl status opafm
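If the service is not running on the fabric manager node (IFS install), enable and start it before re-checking the port state; for example:
systemctl enable opafm
systemctl start opafm
systemctl status opafm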
Example output if the fabric manager is not running
[root@node001 IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9]# opainfo
hfi1_0:1 PortGUID:0x00117501017bfb57
PortState: Init (LinkUp)
LinkSpeed Act: 25Gb En: 25Gb
LinkWidth Act: 4 En: 4
LinkWidthDnGrd ActTx: 4 Rx: 4 En: 1,2,3,4
LCRC Act: 14-bit En: 14-bit,16-bit,48-bit Mgmt: True
QSFP: PassiveCu, 2m Hitachi Metals P/N IQSFP26C-20 Rev 02
Xmit Data: 0 MB Pkts: 0
Recv Data: 0 MB Pkts: 0
Link Quality: 5 (Excellent)
With the subnet manager running, the link goes from Init to Active
[root@node001 IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9]# opainfo
hfi1_0:1 PortGID:0xfe80000000000000:00117501017bfb57
PortState: Active
LinkSpeed Act: 25Gb En: 25Gb
LinkWidth Act: 4 En: 4
LinkWidthDnGrd ActTx: 4 Rx: 4 En: 3,4
LCRC Act: 14-bit En: 14-bit,16-bit,48-bit Mgmt: True
LID: 0x00000001-0x00000001 SM LID: 0x00000001 SL: 0
QSFP: PassiveCu, 2m Hitachi Metals P/N IQSFP26C-20 Rev 02
Xmit Data: 1 MB Pkts: 4355
Recv Data: 1 MB Pkts: 4472
Link Quality: 5 (Excellent)
Check the fabric details
[root@node001 ~]# opafabricinfo
Fabric 0:0 Information:
SM: node001 hfi1_0 Guid: 0x00117501017bfb57 State: Master
Number of HFIs: 51
Number of Switches: 2
Number of Links: 67
Number of HFI Links: 51 (Internal: 0 External: 51)
Number of ISLs: 16 (Internal: 0 External: 16)
Number of Degraded Links: 1 (HFI Links: 0 ISLs: 1)
Number of Omitted Links: 0 (HFI Links: 0 ISLs: 0)
-------------------------------------------------------------------------------
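The degraded ISL reported above can be chased down with the FastFabric opareport tool; a sketch, and the exact report names may differ between OPA releases:
# links running below their expected speed/width
opareport -o slowlinks
# links exceeding the configured error thresholds
opareport -o errors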
Performance Tests
Quick steps to verify; this assumes passwordless ssh access between the hosts
# Note: set the CPU frequency governor to performance on all nodes
# cpupower frequency-set --governor performance
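To apply the governor change on every node in one pass, an ssh loop works; a sketch assuming passwordless ssh and the node001..node016 naming used for the hostfile below:
for n in $(seq -f 'node0%02.0f' 1 16); do ssh "$n" cpupower frequency-set --governor performance; done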
source /usr/mpi/gcc/mvapich2-*-hfi/bin/mpivars.sh
cd /usr/mpi/gcc/mvapich2-*-hfi/tests/osu_benchmarks-*
# verify latency
mpirun -hosts node001,node002 ./osu_latency
# verify bandwidth
mpirun -hosts node001,node002 ./osu_bw
# deviation
cd /usr/mpi/gcc/mvapich2-*-hfi/tests/intel
seq -f 'node0%02.0f' 1 16 > /tmp/mpi_hosts
mpirun -hostfile /tmp/mpi_hosts ./deviation
Latency OSU
[root@node001 osu_benchmarks-3.1.1]# mpirun -hosts node001,node002 ./osu_latency
# OSU MPI Latency Test v3.1.1
# Size Latency (us)
0 1.02
1 1.00
2 0.99
4 0.97
8 0.97
16 1.09
32 1.09
64 1.09
128 1.10
256 1.14
512 1.22
1024 1.33
2048 1.57
4096 1.99
8192 3.12
16384 5.78
32768 7.73
65536 14.27
131072 20.81
262144 31.25
524288 51.42
1048576 94.65
2097152 178.55
4194304 347.68
Bandwidth OSU
[root@node001 osu_benchmarks-3.1.1]# mpirun -hosts node001,node002 ./osu_bw
# OSU MPI Bandwidth Test v3.1.1
# Size Bandwidth (MB/s)
1 3.03
2 6.36
4 12.66
8 26.04
16 47.82
32 95.72
64 191.98
128 379.83
256 715.83
512 1393.06
1024 2484.21
2048 4106.95
4096 6075.32
8192 7772.87
16384 8021.97
32768 10206.65
65536 11830.66
131072 12121.88
262144 12240.60
524288 12324.96
1048576 12366.55
2097152 12378.66
4194304 12382.88
Intel Deviation Tests
[root@node001 intel]# mpirun -hosts node001,node002,node003,node005 ./deviation
Trial runs of 4 hosts are being performed to find
the best host since no baseline host was specified.
Baseline host is node001.plymouth.net (0)
Running Sequential MPI Latency Tests - Pairs 3 Testing 3
Running Sequential MPI Bandwidth Tests - Pairs 3 Testing 3
Sequential MPI Performance Test Results
Latency Summary:
Min: 0.98 usec, Max: 1.10 usec, Avg: 1.05 usec
Range: +12.3% of Min, Worst: +4.4% of Avg
Cfg: Tolerance: +50% of Avg, Delta: 0.80 usec, Threshold: 1.85 usec
Message Size: 0, Loops: 4000
Bandwidth Summary:
Min: 12318.9 MB/s, Max: 12375.5 MB/s, Avg: 12341.2 MB/s
Range: -0.5% of Max, Worst: -0.2% of Avg
Cfg: Tolerance: -20% of Avg, Delta: 150.0 MB/s, Threshold: 9873.0 MB/s
Message Size: 2097152, Loops: 30 BiDir: no
Latency: PASSED
Bandwidth: PASSED