Install and configure Intel Omni-Path (OPA) Fabric


Install the OPA Fabric Software

# Tested on an OpenHPC compute node; other dependencies may be required on vanilla CentOS nodes
yum install expect sysfsutils kernel-devel libibmad libibumad 
yum install pciutils tcsh atlas sysfsutils infinipath-psm 
tar zxvf IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9.tgz 
cd IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9/
./INSTALL \
          -i opa_stack -i opa_stack_dev -i intel_hfi \
          -i delta_ipoib -i ibacm -i fastfabric \
          -i mvapich2_gcc_hfi -i mvapich2_intel_hfi \
          -i openmpi_gcc_hfi  -i openmpi_intel_hfi \
          -i opafm -i oftools -D opafm

# Once installed it is recommended that you reboot. NOTE: for OpenHPC nodes, make sure re-install is not set before rebooting.
systemctl disable srpd
reboot
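
After the reboot, it is worth confirming that the hfi1 driver loaded and that the fabric manager service is enabled. A quick sketch, assuming the install above completed on this node:

lsmod | grep hfi1
systemctl is-enabled opafm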

Verify the Fabric/Adaptor

Make sure the subnet manager is running

systemctl status opafm

Example output if the fabric manager is not running; the port stays in Init (LinkUp)

[root@node001 IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9]# opainfo 
hfi1_0:1                           PortGUID:0x00117501017bfb57
   PortState:     Init (LinkUp)
   LinkSpeed      Act: 25Gb         En: 25Gb        
   LinkWidth      Act: 4            En: 4           
   LinkWidthDnGrd ActTx: 4  Rx: 4   En: 1,2,3,4     
   LCRC           Act: 14-bit       En: 14-bit,16-bit,48-bit       Mgmt: True 
   QSFP: PassiveCu, 2m   Hitachi Metals    P/N IQSFP26C-20       Rev 02
   Xmit Data:                  0 MB Pkts:                    0
   Recv Data:                  0 MB Pkts:                    0
   Link Quality: 5 (Excellent)
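
If opafm is not running, enable and start it on the node that should act as fabric manager (this assumes a host-based subnet manager as installed above, rather than an embedded switch SM):

systemctl enable opafm
systemctl start opafm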

With the subnet manager running, the link goes from Init to Active

[root@node001 IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9]# opainfo 
hfi1_0:1                           PortGID:0xfe80000000000000:00117501017bfb57
   PortState:     Active
   LinkSpeed      Act: 25Gb         En: 25Gb        
   LinkWidth      Act: 4            En: 4           
   LinkWidthDnGrd ActTx: 4  Rx: 4   En: 3,4         
   LCRC           Act: 14-bit       En: 14-bit,16-bit,48-bit       Mgmt: True 
   LID: 0x00000001-0x00000001       SM LID: 0x00000001 SL: 0 
   QSFP: PassiveCu, 2m   Hitachi Metals    P/N IQSFP26C-20       Rev 02
   Xmit Data:                  1 MB Pkts:                 4355
   Recv Data:                  1 MB Pkts:                 4472
   Link Quality: 5 (Excellent)
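
As an additional sanity check, the adapter should also be visible through the verbs layer. A minimal sketch, assuming the ibv_devinfo utility from libibverbs is installed alongside the OPA stack; hfi1_0 should report PORT_ACTIVE once the fabric manager is up:

ibv_devinfo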

Check the fabric details

[root@node001 ~]# opafabricinfo 
Fabric 0:0 Information:
SM: node001 hfi1_0 Guid: 0x00117501017bfb57 State: Master
Number of HFIs: 51
Number of Switches: 2
Number of Links: 67
Number of HFI Links: 51             (Internal: 0   External: 51)
Number of ISLs: 16                  (Internal: 0   External: 16)
Number of Degraded Links: 1         (HFI Links: 0   ISLs: 1)
Number of Omitted Links: 0          (HFI Links: 0   ISLs: 0)
-------------------------------------------------------------------------------
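
The output above reports one degraded ISL. To identify links running below the expected speed or width, the FastFabric opareport tool can be used (a sketch; available report types may vary between IFS releases):

opareport -o slowlinks
opareport -o errors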

Performance Tests

Quick steps to verify; assumes passwordless SSH access between the hosts

# Note: set the CPU frequency governor to performance on all nodes first (see the loop sketch after this block)
# cpupower frequency-set --governor performance
source /usr/mpi/gcc/mvapich2-*-hfi/bin/mpivars.sh
cd /usr/mpi/gcc/mvapich2-*-hfi/tests/osu_benchmarks-*
# verify latency 
mpirun -hosts node001,node002 ./osu_latency
# verify bandwidth 
mpirun -hosts node001,node002 ./osu_bw
# deviation 
cd /usr/mpi/gcc/mvapich2-*-hfi/tests/intel
seq -f 'node0%02.0f' 1 16 > /tmp/mpi_hosts
mpirun -hostfile /tmp/mpi_hosts ./deviation
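
A minimal sketch for setting the performance governor across all compute nodes, assuming passwordless root SSH and the node001..node016 naming used above (pdsh or other cluster tooling works equally well):

for h in $(seq -f 'node0%02.0f' 1 16); do
    ssh $h 'cpupower frequency-set --governor performance'
done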

Latency OSU

[root@node001 osu_benchmarks-3.1.1]# mpirun -hosts node001,node002 ./osu_latency
# OSU MPI Latency Test v3.1.1
# Size            Latency (us)
0                         1.02
1                         1.00
2                         0.99
4                         0.97
8                         0.97
16                        1.09
32                        1.09
64                        1.09
128                       1.10
256                       1.14
512                       1.22
1024                      1.33
2048                      1.57
4096                      1.99
8192                      3.12
16384                     5.78
32768                     7.73
65536                    14.27
131072                   20.81
262144                   31.25
524288                   51.42
1048576                  94.65
2097152                 178.55
4194304                 347.68

Bandwidth OSU

[root@node001 osu_benchmarks-3.1.1]# mpirun -hosts node001,node002 ./osu_bw 
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         3.03
2                         6.36
4                        12.66
8                        26.04
16                       47.82
32                       95.72
64                      191.98
128                     379.83
256                     715.83
512                    1393.06
1024                   2484.21
2048                   4106.95
4096                   6075.32
8192                   7772.87
16384                  8021.97
32768                 10206.65
65536                 11830.66
131072                12121.88
262144                12240.60
524288                12324.96
1048576               12366.55
2097152               12378.66
4194304               12382.88

Intel Deviation Tests

[root@node001 intel]# mpirun -hosts node001,node002,node003,node005 ./deviation 

Trial runs of 4 hosts are being performed to find
the best host since no baseline host was specified.

Baseline host is node001.plymouth.net (0)

Running Sequential MPI Latency Tests - Pairs 3   Testing     3
Running Sequential MPI Bandwidth Tests - Pairs 3   Testing     3

Sequential MPI Performance Test Results
  Latency Summary:
    Min: 0.98 usec, Max: 1.10 usec, Avg: 1.05 usec
    Range: +12.3% of Min, Worst: +4.4% of Avg
    Cfg: Tolerance: +50% of Avg, Delta: 0.80 usec, Threshold: 1.85 usec
         Message Size: 0, Loops: 4000

  Bandwidth Summary:
    Min: 12318.9 MB/s, Max: 12375.5 MB/s, Avg: 12341.2 MB/s
    Range: -0.5% of Max, Worst: -0.2% of Avg
    Cfg: Tolerance: -20% of Avg, Delta: 150.0 MB/s, Threshold: 9873.0 MB/s
         Message Size: 2097152, Loops: 30 BiDir: no

Latency: PASSED
Bandwidth: PASSED