Difference between revisions of "Mellanox:training"
Revision as of 16:08, 7 April 2014
Check the card is detected
[root@nodeA ~]# lspci | grep -i Mellanox
06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
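The detection check can be scripted, for example as part of a node health check. This is a minimal sketch that assumes the `lspci` output format shown above; the function name `count_mellanox_devices` is made up for illustration.

```shell
# Count Mellanox devices in lspci output read from stdin.
# count_mellanox_devices is a hypothetical helper name, not part of any Mellanox tool.
count_mellanox_devices() {
    # grep -c prints the number of matching lines; -i ignores case.
    # grep exits non-zero when the count is 0, so "|| true" keeps the
    # function usable in scripts that run under "set -e".
    grep -ci 'mellanox' || true
}

# Usage on a live node:
#   lspci | count_mellanox_devices
```

A result of 0 means the card is not visible on the PCI bus, so there is no point continuing with the driver install.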
Mellanox OFED
Don't change the kernel. MLNX_OFED is built for the default version of the kernel.
If the kernel changes, MLNX_OFED must be rebuilt for the running kernel.
Installation
There are a number of options available. To see them all run:
./mlnxofedinstall --l
yum install tcl tk libnl-devel gcc-gfortran
./mlnxofedinstall
It will try to update the firmware at the end of the install:
Device #1:
----------
Device: 0000:06:00.0
Part Number:
Description:
PSID: MT_1060110019
Versions: Current Available
FW 2.10.0000 N/A
Status: No matching image found
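The PSID in this output is the value to match when looking for a firmware image for the card. A small sketch for pulling it out of saved installer output, assuming the `PSID:` line format shown above; `extract_psid` is a hypothetical helper name.

```shell
# Print the PSID from installer output read on stdin.
# extract_psid is a hypothetical helper name.
extract_psid() {
    # awk splits on whitespace, so leading indentation does not matter.
    awk '$1 == "PSID:" { print $2 }'
}

# Usage, assuming the installer output was saved to a file named install.log:
#   extract_psid < install.log
```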
Restart the driver
Either reboot the node or run:
/etc/init.d/openibd restart
Check the state
[root@nodeB MLNX_OFED_LINUX-2.1-1.0.6-rhel6.5-x86_64]# ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0030:48ff:ffff:e535
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 4: PortConfigurationTraining
rate: 10 Gb/sec (4X)
link_layer: InfiniBand
Start the Subnet manager
The subnet manager must be running somewhere: the switch, a node or a service.
Start it on the switch:
IB SM Management
Base SM
SM enable
apply
The state of the connection will become active and a LID will be assigned.
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.10.0
Hardware version: 0
Node GUID: 0x003048ffffffe534
System image GUID: 0x003048ffffffe537
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 2
LMC: 0
SM lid: 1
Capability mask: 0x02514868
Port GUID: 0x003048ffffffe535
Link layer: InfiniBand
Each link will have a separate GUID. These are basically the equivalent of a MAC address. They should be unique to every device unless someone has been messing around.
Communication is based on the LID.
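The three things to look for in the output above are an Active state, a LinkUp physical state and a non-zero base LID. The following sketch checks them in one go. It assumes the `ibstat` field names shown above and a single port per adapter; `check_ib_port` is a hypothetical helper name.

```shell
# Report whether the port in ibstat output (stdin) is active and has a LID.
# check_ib_port is a hypothetical helper name; it assumes one port per adapter.
check_ib_port() {
    awk '
        $1 == "State:"                     { state = $2 }
        $1 == "Physical" && $2 == "state:" { phys = $3 }
        $1 == "Base" && $2 == "lid:"       { lid = $3 }
        END {
            # A LID of 0 means the subnet manager has not configured the port yet.
            if (state == "Active" && phys == "LinkUp" && lid + 0 > 0) {
                print "OK lid=" lid
            } else {
                print "NOT READY state=" state " phys=" phys " lid=" lid
                exit 1
            }
        }'
}

# Usage on a live node:
#   ibstat | check_ib_port
```

The function returns a non-zero exit status when the port is not ready, so it can be used directly in an `if` statement.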
Subnet manager
Only one subnet manager needs to be running. Any extra instances will be used if the running one fails. If there are multiple backups, there is an election to decide which takes over.
The subnet manager assigns the LIDs and builds the routing table. This can take a while depending on how complicated the topology is.
If the SM is running on the switch it can be managed under the IB SM MGMT tab.
Testing
Again there are numerous options, but they must be the same on both sides.
ib_read_bw <any ip on system>
[root@nodeB MLNX_OFED_LINUX-2.1-1.0.6-rhel6.5-x86_64]# ib_read_bw
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
RDMA_Read BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Outstand reads : 16
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x02 QPN 0x0058 PSN 0xfd98dc OUT 0x10 RKey 0x001900 VAddr 0x007f247ebd0000
remote address: LID 0x03 QPN 0x0058 PSN 0xcdbe1f OUT 0x10 RKey 0x001900 VAddr 0x007f6012610000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
65536 1000 6041.81 6037.05 0.096593
---------------------------------------------------------------------------------------
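When comparing runs between nodes it helps to pull just the result row out of the report. A minimal sketch, assuming the five-column result format shown above; `bw_average` is a hypothetical helper name.

```shell
# Print message size and "BW average[MB/sec]" from ib_read_bw output on stdin.
# bw_average is a hypothetical helper name; it assumes the five-column result rows shown above.
bw_average() {
    # Result rows are the only lines with exactly five fields whose first
    # field is an integer (message size) and fourth field is a number.
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9.]+$/ { print $1, $4 }'
}

# Usage on the client side:
#   ib_read_bw <any ip on system> | bw_average
```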
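ib_read_bw -a -b runs the test over all message sizes in both directions. If it prints warnings, stop the cpuspeed service and set the CPU to maximum performance in the BIOS. The result should be around 14.
ib_read_lat -a shows latencies.
Bad performance?
Verify the fabric using the Mellanox tools. ibdiagnet should be version 2 or higher.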
ibdiagnet
#clear the counters
ibdiagnet -pc
# run
ibdiagnet -P all=1
Summary
-I- Stage Warnings Errors Comment
-I- Discovery 0 0
-I- Lids Check 0 0
-I- Links Check 0 0
-I- Subnet Manager 0 0
-I- Port Counters 2 0
-I- Nodes Information 0 2
-I- Speed / Width checks 0 0
-I- Partition Keys 0 0
-I- Alias GUIDs 0 0
vim /var/tmp/ibdiagnet2/ibdiagnet2.log
vim /var/tmp/ibdiagnet2/ibdiagnet2.pm
ibdiagnet -P all=1 --ber_test --pm_pause_time 30
ibdiagnet -P all=1 --get_cable_info
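On a large fabric it is easier to list only the stages of the ibdiagnet summary that reported something. This is a sketch under the assumption that the Comment column is empty, as in the summary shown earlier, so the last two fields of each row are the warning and error counts; `summary_problems` is a hypothetical helper name.

```shell
# List ibdiagnet summary stages (stdin) that reported warnings or errors.
# summary_problems is a hypothetical helper name; it assumes the Comment
# column is empty, so the last two fields are the warning and error counts.
summary_problems() {
    awk '
        $1 == "-I-" && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
            if ($(NF-1) + $NF > 0) {
                # Stage names can contain spaces, so rebuild the name from
                # every field between the "-I-" prefix and the two counts.
                stage = $2
                for (i = 3; i <= NF - 2; i++) stage = stage " " $i
                print stage ": warnings=" $(NF-1) " errors=" $NF
            }
        }'
}

# Usage on a live fabric:
#   ibdiagnet -P all=1 | summary_problems
```

The details behind each reported stage are in the log files under /var/tmp/ibdiagnet2/.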
Firmware updates
mst start
mst status
[root@nodeB MLNX_OFED_LINUX-2.1-1.0.6-rhel6.5-x86_64]# mst status
MST modules:
------------
MST PCI module loaded
MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4099_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:06:00.0 addr.reg=88 data.reg=92
Chip revision is: 01
/dev/mst/mt4099_pci_cr0 - PCI direct access.
domain:bus:dev.fn=0000:06:00.0 bar=0xdf900000 size=0x100000
Chip revision is: 01
flint -d <dev> -i <file> burn
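The `<dev>` argument is one of the paths printed by `mst status`. A small sketch for listing them, assuming the output format shown above; `list_mst_devices` is a hypothetical helper name.

```shell
# Print the MST device paths listed in "mst status" output read from stdin.
# list_mst_devices is a hypothetical helper name.
list_mst_devices() {
    # Device lines are the ones whose first field starts with /dev/mst/.
    awk '$1 ~ /^\/dev\/mst\// { print $1 }'
}

# Usage on a live node (the flint "query" command prints the current
# firmware version and PSID, which is worth checking before burning):
#   mst status | list_mst_devices
#   flint -d /dev/mst/mt4099_pci_cr0 query
```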
OEM firmware
mellanox.com > Support > OEM firmware > Supermicro