Graphcore M2000 System Debug


No 100GbE link

During bring-up of our initial IPU M2000 units, the 100GbE interfaces would not hold a link. More precisely, the link came up when the unit was powered on, but dropped after a minute or so as the fans ramped down.

The management interface would show a link, as expected.

The cause was identified as a software issue on the early IPU M2000 units, which meant that the gateway (GW) and IPUs were not being powered on automatically.

Accessing the IPU M2000 BMC

Identify the BMC's IP address

To access the IPU M2000's BMC, first identify the unit's IP address. There are several ways to do this, but it is usually simplest to search /var/log/syslog on the host for the address offered by the DHCP server:

ipuuser@ipu-host-2:~$ sudo grep dhcpd /var/log/syslog* | grep -i discover
/var/log/syslog.1:Jan 20 14:23:29 ipu-host-2 dhcpd[3474]: DHCPDISCOVER from 70:69:79:20:13:b4 via enp65s0f1

ipuuser@ipu-host-2:~$ sudo grep 70:69:79:20:13:b4 /var/log/syslog*
/var/log/syslog:Jan 21 06:27:08 ipu-host-2 dhcpd[3474]: DHCPREQUEST for 10.1.1.1 from 70:69:79:20:13:b4 via enp65s0f1
/var/log/syslog:Jan 21 06:27:08 ipu-host-2 dhcpd[3474]: DHCPACK on 10.1.1.1 to 70:69:79:20:13:b4 via enp65s0f1
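
The two grep steps above can be combined into one helper. The following is a minimal sketch, assuming the dhcpd log lines have the same "DHCPACK on <ip> to <mac>" layout as shown above; the function name lease_ip_for_mac is hypothetical and not part of any Graphcore or dhcpd tooling.

```shell
# Print the IP address from the last "DHCPACK on <ip> to <mac>" line read
# on stdin for the given MAC address. Returns non-zero if none is found.
# The function name is hypothetical, chosen for this example only.
lease_ip_for_mac() {
    mac="$1"
    awk -v mac="$mac" '
        /DHCPACK on/ {
            # Locate the "DHCPACK on <ip> to <mac>" field sequence, wherever
            # it sits on the line (grep may prefix the line with a file name).
            for (i = 1; i + 4 <= NF; i++) {
                if ($i == "DHCPACK" && $(i+1) == "on" && $(i+3) == "to" &&
                    tolower($(i+4)) == tolower(mac)) {
                    ip = $(i+2)
                }
            }
        }
        END { if (ip != "") print ip; else exit 1 }
    '
}
```

It could be used as `sudo cat /var/log/syslog.1 /var/log/syslog | lease_ip_for_mac 70:69:79:20:13:b4`. Because the last matching line wins, the log files should be passed oldest first.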

Access the IPU's BMC using SSH

SSH to the IP address identified above, logging in as root with the password 0penBmc:

ipuuser@ipu-host-2:~$ ssh 10.1.1.1 -l root
The authenticity of host '10.1.1.1 (10.1.1.1)' can't be established.
RSA key fingerprint is SHA256:SW5Bwfpj/WjkwYu8eQoVefyquJhLmUM3AxCrzW0eYfA.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.1.1.1' (RSA) to the list of known hosts.
root@10.1.1.1's password: 
=======================================================================
Graphcore IPU-Machine ipum-p2-rev1 - OpenBMC gc-v1.6.0-21-g2ad233f 

Some useful commands:
  ipum-utils             Collection of Power and Boot Control Commands.
  ipum-diags             Enter Maintenance Mode. For low level testing including BIST.
  ipum_sysfpga           Utility for Upgrading System FPGA.
  obmc-console-client    Serial Console Access to Gateway SOC. To exit press "Enter" and then "~.".

This login is using the default root password. It is advised to change it, using Redfish or GUI.
=======================================================================
root@ipum-p2-rev1-8204721-0012:~# 

Power on GW and IPUs

Run the following command on the BMC:

root@ipum-p2-rev1-8204721-0012:~# ipum-utils power_on
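
If the link drop was indeed caused by the GW and IPUs being powered off, the 100GbE link should come back on the host once they are powered on. The following is a rough sketch for checking this from the host side by polling the standard Linux sysfs operstate file. The function names and the SYSFS_NET override are hypothetical helpers for this example, and the name of the 100GbE interface is not given on this page, so it must be supplied by the reader.

```shell
# Print the kernel's operational state ("up", "down", ...) for an interface.
# SYSFS_NET may be overridden for testing; it defaults to /sys/class/net.
link_state() {
    iface="$1"
    state_file="${SYSFS_NET:-/sys/class/net}/$iface/operstate"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        echo "unknown interface: $iface" >&2
        return 1
    fi
}

# Poll once per second until the interface reports "up".
# Gives up after the given number of attempts (default 30).
wait_for_link() {
    iface="$1"
    attempts="${2:-30}"
    while [ "$attempts" -gt 0 ]; do
        [ "$(link_state "$iface" 2>/dev/null)" = "up" ] && return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_link enp1s0f0 60 && echo "link up"`, where enp1s0f0 is a placeholder for the host's actual 100GbE interface name.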