Graphcore M2000 System Software Validation
Prerequisites
Before checking which version of the IPU-M2000 system software is running on the IPU-M2000s, complete the steps given in the M2000 Bringup.
Basic Validation
The easiest way to verify which version of the IPU-M2000 system software is installed on the IPU-M2000s is to use the included rack_tool application.
- Log in as ipuuser
- Run the following command:
ipuuser@ipu-host-2:~$ rack_tool status
17:01:36: Reading configuration from /home/ipuuser/.rack_tool/rack_config.json
m01 BMC:[ UP ] GW:[ UP ] RNIC:[ UP ] Version:[ 2.0.0-rc.3+a51e75a ]
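This check can be scripted. The following is a minimal sketch, assuming rack_tool prints one line per machine in the same format as the m01 entry above. The function name check_rack_status is hypothetical and is not part of rack_tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of rack_tool): reads `rack_tool status`
# output on stdin. Assumes each machine line looks like:
#   m01 BMC:[ UP ] GW:[ UP ] RNIC:[ UP ] Version:[ 2.0.0-rc.3+a51e75a ]
# Prints "<machine> <version>" for machines whose BMC, GW and RNIC are all UP.
# Returns non-zero if any machine line has a component that is not UP.
check_rack_status() {
    awk '
        /BMC:\[/ {
            line = $0
            name = $1
            # Count the components reported as UP on this line.
            ups = gsub(/:\[ UP \]/, "", line)
            version = "unknown"
            if (match($0, /Version:\[ [^ ]+ \]/)) {
                # Drop the 10-character "Version:[ " prefix and the " ]" suffix.
                version = substr($0, RSTART + 10, RLENGTH - 12)
            }
            if (ups == 3) {
                print name, version
            } else {
                print name, "NOT-READY" > "/dev/stderr"
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage on a host where rack_tool is installed:
#   rack_tool status | check_rack_status
```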
Verify V-IPU server and IPU-M2000 version compatibility
To verify that the versions of the V-IPU server and IPU-M2000 system software are compatible:
- Log in as ipuuser
- Run the following commands:
ipuuser@ipu-host-2:~$ vipu-admin -v
ipuuser@ipu-host-2:~$ vipu-admin --server-version
The two reported versions should match:
ipuuser@ipu-host-2:~$ vipu-admin -v
1.12.3
ipuuser@ipu-host-2:~$ vipu-admin --server-version
version: 1.12.3
host: unix:///var/run/vipu/vipu-server.sock
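The comparison can be automated rather than checked by eye. This is a sketch only, assuming both commands print a three-part version number (x.y.z) as in the sample output above. The function names extract_version and vipu_versions_match are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the first x.y.z version number found on stdin.
extract_version() {
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: takes the client output and the server output as two
# string arguments. Returns 0 only when both contain the same version number.
vipu_versions_match() {
    client=$(printf '%s\n' "$1" | extract_version)
    server=$(printf '%s\n' "$2" | extract_version)
    [ -n "$client" ] && [ "$client" = "$server" ]
}

# Usage on a host where vipu-admin is installed:
#   if vipu_versions_match "$(vipu-admin -v)" "$(vipu-admin --server-version)"; then
#       echo "versions match"
#   else
#       echo "version mismatch" >&2
#   fi
```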
Software upgrade of IPU-M2000 units
If the IPU-M2000 system software is not the latest available version, or the version check shows that it is incompatible with the V-IPU server, it may be necessary to upgrade the IPU-M2000 system software.
- First check that all IPU-M2000s are up and available by running:
ipuuser@ipu-host-2:~$ rack_tool status
17:06:14: Reading configuration from /home/ipuuser/.rack_tool/rack_config.json
m01 BMC:[ UP ] GW:[ UP ] RNIC:[ UP ] Version:[ 2.0.0-rc.3+a51e75a ]
- Trigger the upgrade by running:
ipuuser@ipu-host-2:~$ rack_tool upgrade
- If the upgrade fails on a particular machine, restart it for that machine only by running:
ipuuser@ipu-host-2:~$ rack_tool upgrade --select a01
(where a01 is a machine name defined in the rack_config.json file)
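When several machines need a retry, the per-machine command can be wrapped in a loop. This is a minimal sketch, assuming rack_tool upgrade returns a non-zero exit status when the upgrade fails (the source does not confirm this). The function name retry_upgrades is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: re-runs the upgrade for each machine named on the
# command line. Names must match entries in rack_config.json.
# ASSUMPTION: `rack_tool upgrade` exits non-zero when an upgrade fails.
retry_upgrades() {
    failed=""
    for machine in "$@"; do
        if ! rack_tool upgrade --select "$machine"; then
            failed="$failed $machine"
        fi
    done
    # Report machines whose retry also failed, if any.
    if [ -n "$failed" ]; then
        echo "still failing:$failed" >&2
        return 1
    fi
}

# Usage:
#   retry_upgrades a01 a02
```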
IPU-M2000 test with rack_tool
Start with a built-in self-test (BIST):
ipuuser@ipu-host-3:~$ rack_tool bist
19:36:05: Reading configuration from /home/ipuuser/.rack_tool/rack_config.json
19:36:05: Started BIST on m01
19:47:47: Done BIST on m01. Logfile at: /home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0-rc.3+a51e75a/maintenance_tools/logs/m01_bist_20210128T1936.log
19:47:47: BIST succeeded for all machines
19:47:47: Time spent: 11m42s
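A BIST run takes several minutes, so it can be useful to capture the result in a script. The sketch below assumes the success line and the "Logfile at:" lines are worded exactly as in the sample output above. The function name check_bist is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `rack_tool bist` output on stdin.
# Prints the path of every log file mentioned in the output.
# Returns 0 only if the "BIST succeeded for all machines" line is present.
check_bist() {
    awk '
        /BIST succeeded for all machines/ { ok = 1 }
        /Logfile at: / {
            # Keep only the path that follows "Logfile at: ".
            sub(/.*Logfile at: /, "")
            print
        }
        END { exit (ok ? 0 : 1) }
    '
}

# Usage on a host where rack_tool is installed:
#   rack_tool bist | check_bist || echo "BIST did not succeed on all machines" >&2
```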
Then run a vipu-test:
ipuuser@ipu-host-3:~$ rack_tool vipu-test
19:08:50: Reading configuration from /home/ipuuser/.rack_tool/rack_config.json
19:08:50: create agent (m01): success.
19:08:50: create cluster (ipums): success.
19:12:45: Showing test results for cluster ipums
Test Type | Duration | Passed | Summary
-------------------------------------------------------------------------
Sync-Link | 0.00s    | 0/0    | Sync Link test skipped
Cabling   | 2.38s    | 0/0    | All cables connected as expected
IPU-Link  | 101.60s  | 16/16  | All Links Passed
Traffic   | 130.05s  | 1/1    | Traffic test passed
GW-Link   | 0.00s    | 0/0    | GW Link test skipped
Version   | 0.00s    | 6/6    | All component versions are consistent
-------------------------------------------------------------------------
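The Passed column can be checked mechanically. This is a sketch under the assumption that result rows are separated by "|" and that the third column has the form passed/total, as in the sample table above. The function name failed_vipu_tests is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `rack_tool vipu-test` output on stdin.
# Prints "<test type> <passed>/<total>" for every row where passed != total.
# Returns non-zero if any such row is found.
failed_vipu_tests() {
    awk -F'|' '
        NF >= 4 {
            name = $1
            passed = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/[ \t]/, "", passed)
            # Only rows whose third column looks like "<passed>/<total>";
            # this skips the header row and the log lines.
            if (passed ~ /^[0-9]+\/[0-9]+$/) {
                split(passed, parts, "/")
                if (parts[1] + 0 != parts[2] + 0) {
                    print name " " passed
                    bad = 1
                }
            }
        }
        END { exit bad + 0 }
    '
}

# Usage on a host where rack_tool is installed:
#   rack_tool vipu-test | failed_vipu_tests
```

Rows reported as 0/0 (skipped tests in the sample) count as passing in this sketch, because passed equals total.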
The vipu-test run creates an agent and a cluster on the IPU machine, both of which are required for later steps. They can also be created manually, but here we leave the ones created by the test in place:
ipuuser@ipu-host-3:~$ vipu-admin list agents
Agent | Host     | Port | Status | Agent Version | IPUs | Last Error
----------------------------------------------------------------------------
m01   | 10.1.2.1 | 8080 | Up     | 1.12.3        | 4/4  |
----------------------------------------------------------------------------
ipuuser@ipu-host-3:~$ vipu-admin list cluster
Cluster | GW Topology | ILDs | ILD Topology | IPUs
--------------------------------------------------------
ipums   | LINE        | 1    | MESH         | 4
--------------------------------------------------------
ipuuser@ipu-host-3:~$
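Before moving on, it may be worth confirming that every agent reports Up. The following sketch assumes the agent listing uses "|" as the column separator, with the port in the third column and the status in the fourth, as in the sample above. The function name agents_not_up is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `vipu-admin list agents` output on stdin.
# Prints "<agent> <status>" for every agent whose Status column is not "Up".
# Returns non-zero if any such agent is found.
agents_not_up() {
    awk -F'|' '
        NF >= 4 {
            name = $1
            port = $3
            status = $4
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", port)
            gsub(/[ \t]/, "", status)
            # Data rows have a numeric port; this skips the header row.
            if (port ~ /^[0-9]+$/ && status != "Up") {
                print name, status
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage on a host where vipu-admin is installed:
#   vipu-admin list agents | agents_not_up
```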
Now create a partition:
ipuuser@ipu-host-3:~$ vipu-admin create partition AllIPUs --cluster ipums --size 4 --reconfigurable
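In this example the partition size equals the IPU count listed for cluster ipums. Rather than hard-coding the number, it could be read from the cluster listing. This is a sketch only, assuming the listing uses "|" as the column separator with the IPU count in the fifth column, as in the sample output above. The function name cluster_ipu_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `vipu-admin list cluster` output on stdin and
# prints the IPU count for the cluster named in $1.
# Returns non-zero if the cluster is not listed.
cluster_ipu_count() {
    awk -F'|' -v want="$1" '
        NF >= 5 {
            name = $1
            ipus = $5
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", ipus)
            if (name == want && ipus ~ /^[0-9]+$/) {
                print ipus
                found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a host where vipu-admin is installed:
#   size=$(vipu-admin list cluster | cluster_ipu_count ipums) &&
#       vipu-admin create partition AllIPUs --cluster ipums --size "$size" --reconfigurable
```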