Graphcore M2000 Direct Connect Bringup

From Define Wiki
Revision as of 22:25, 2 February 2021 by David

Hardware Spec

At the time of writing, the current spec host system is based around:

  • AS-1024US-TRT (or similar H12 Ultra)
  • 2 x AMD 7402 CPUs
  • 16 x 32GB DIMMs
  • 2 x MCX556A-EDAT (though 1 x single port version should suffice)
  • At least one local drive for O/S and software install

Operating System

At the time of writing, the latest supported version of Linux is Ubuntu 18.04.4 LTS. Install this as normal (a minimal installation is fine). CentOS 7.2 and CentOS 8 are also supported but, as yet, untested at Boston.

Prerequisite Packages

A number of packages need to be installed using apt-get:

root@ipu-host-2:~# apt-get install apt-transport-https ibverbs-utils openjdk-8-jdk python3-virtualenv autoconf ipmitool php-cli python3-wheel automake jq php-curl qtcreator bc kcachegrind policykit-1 rdma-core build-essential libaio-dev protobuf-compiler screen ccache libboost-all-dev python-boto3 software-properties-common clang libeigen3-dev python-dev sshpass cmake libjson-c-dev python-lxml subversion curl libjson-c-doc python-numpy swig direnv libpci-dev python-pip sysfsutils dkms libpixman-1-dev python-pytest tar emacs libprotobuf-dev python-recommonmark tmux ethtool libtool python-requests u-boot-tools exuberant-ctags lldpad python-setuptools unzip flex m4 python-wheel valgrind g++ minicom python-yaml vim gawk moreutils python2 virtualenv gcc net-tools python3 wdiff gdb netcat python3-dev wget git parallel python3-numpy zip golang-go pciutils python3-pip htop perl python3-setuptools

A few Python packages also need to be installed via pip:

root@ipu-host-2:~# pip install autograd paramiko pylint scp jstyleson pep8 pyyaml yapf mock pexpect requests

User Accounts

Account Overview

The following user accounts are required on the host system:

Account     Function
root        A root user account secured with a password is recommended.
itadmin     An admin account secured with a password is recommended. Home folder located at /home/itadmin using bash shell.
ipuuser     An account dedicated to IPU software and IPU-M2000 management software is mandatory. Home folder located at /home/ipuuser using bash shell.
poplaruser  An account dedicated to Poplar software is mandatory. Home folder located at /home/poplaruser using bash shell.


The following user accounts are present on the IPU M2000 system:

Login to          Username  Password
IPU-M2000 BMC OS  root      0penBmc
IPU-M2000 GW OS   itadmin   ChangeMeFdh5P

Create Users on Host system

Create users with useradd:

root@ipu-host-2:~# useradd -m itadmin
root@ipu-host-2:~# useradd -m ipuuser
root@ipu-host-2:~# useradd -m poplaruser

And set passwords with passwd (repeat for all users):

root@ipu-host-2:~# passwd itadmin
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully

Double check that the default shell for each of these users is bash. Edit the /etc/passwd file as appropriate, but each of these new users should look something like:

ipuuser@ipu-host-2:~$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
.
.
.
itadmin:x:1001:1001::/home/itadmin:/bin/bash
ipuuser:x:1002:1002::/home/ipuuser:/bin/bash
poplaruser:x:1003:1003::/home/poplaruser:/bin/bash
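
Depending on the system defaults, useradd without the -s option may leave new accounts on a shell other than bash, so the check above can be worth scripting. The following is a minimal sketch, assuming a POSIX shell and awk; check_shells is a hypothetical helper name, not part of any Graphcore tooling:

```shell
#!/bin/sh
# check_shells FILE USER...
# FILE is in passwd(5) format. For each USER, print a line if the account
# is missing or its login shell (field 7) is not /bin/bash.
# Prints nothing when every account is present and uses bash.
check_shells() {
    passwd_file=$1
    shift
    for user in "$@"; do
        shell=$(awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file")
        if [ -z "$shell" ]; then
            echo "$user: missing"
        elif [ "$shell" != "/bin/bash" ]; then
            echo "$user: $shell"
        fi
    done
}

# Example invocation on the host:
#   check_shells /etc/passwd itadmin ipuuser poplaruser
```

Any account it reports could then be corrected with usermod -s /bin/bash followed by the user name, or created with useradd -m -s /bin/bash in the first place.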

Add sudo rights

Edit the /etc/sudoers file to look something like:

ipuuser@ipuhost:~$ sudo cat /etc/sudoers
#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults	env_reset
Defaults	mail_badpass
Defaults	secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root	ALL=(ALL:ALL) ALL
itadmin	ALL=(ALL:ALL) ALL
ipuuser	ALL=(ALL:ALL) ALL

# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo	ALL=(ALL:ALL) ALL

# See sudoers(5) for more information on "#include" directives:

#includedir /etc/sudoers.d
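
As the comments in the file itself suggest, local content can go in /etc/sudoers.d/ instead of the main file. A minimal sketch of that approach follows; the helper name sudoers_entries and the drop-in file name ipu-admins are assumptions chosen for illustration:

```shell
#!/bin/sh
# sudoers_entries USER...
# Print one "USER<TAB>ALL=(ALL:ALL) ALL" rule per user, in the same form
# as the user privilege lines shown in the sudoers file above.
sudoers_entries() {
    for user in "$@"; do
        printf '%s\tALL=(ALL:ALL) ALL\n' "$user"
    done
}

# Possible usage on the host (needs root; the file name is hypothetical):
#   sudoers_entries itadmin ipuuser > /tmp/ipu-admins
#   visudo -cf /tmp/ipu-admins && \
#       sudo install -m 0440 /tmp/ipu-admins /etc/sudoers.d/ipu-admins
```

Running visudo -cf on the generated file before installing it checks the syntax first, which avoids locking yourself out of sudo with a malformed rule.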

Networking

Network Configuration Overview

The Host system should have interfaces configured as follows:

Port        Role                                        Link speed  IP address        Configured from
enp65s0f0   External connectivity to IT infrastructure  1GbE        192.168.8.20x/20  Static or DHCP lease
enp65s0f1   Management of IPU-M2000                     1GbE        10.1.3.101/22     Static
enp129s0f0  Unused                                      100GbE      N/A               N/A
enp129s0f1  RDMA interface to IPU-M2000                 100GbE      10.1.5.5/24       Static

The IPU M2000 system should have interfaces configured as follows:

Port           Role                                     Link speed  IP address                         Configured from
IPU Mgmt Port  BMC+GW management ports                  1GbE        BMC: 10.1.1.1/22, GW: 10.1.2.1/22  Static lease from DHCP server
IPU 100GbE #1  Host-link data-plane link to IPU-M2000s  100GbE      10.1.5.2/30                        Static lease from DHCP server
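
The addressing plan only works if each peer sits inside the subnet of the host interface it talks to: for example, a /22 starting at 10.1.0.0 spans 10.1.0.0 to 10.1.3.255, so it covers the BMC, GW and host management addresses but not the 10.1.5.x RDMA addresses. The arithmetic can be sketched in POSIX shell as below; ip_to_int and in_subnet are hypothetical helper names introduced for illustration:

```shell
#!/bin/sh
# ip_to_int A.B.C.D
# Convert a dotted-quad IPv4 address to its 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet IP NETWORK PREFIXLEN
# Exit status 0 if IP lies inside NETWORK/PREFIXLEN, 1 otherwise.
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Example:
#   in_subnet 10.1.1.1 10.1.0.0 22 && echo "BMC is inside the management subnet"
```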

Onboard Intel Network Drivers

Build Intel NIC drivers

It may be necessary to install a temporary NIC in order to pull down build tools and drivers.

Install build-essential:

ipuuser@ipuhost:~$ sudo apt-get install build-essential

Copy the Intel NIC driver to the system and decompress the archive:

ipuuser@ipuhost:~$ tar zxvf i40e-2.13.10.tar.gz

Switch to the source directory:

ipuuser@ipuhost:~$ cd nic_temp/i40e-2.13.10/src/

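Build and install the module (installing into the kernel module tree needs root privileges):

ipuuser@ipuhost:~/nic_temp/i40e-2.13.10/src$ sudo make install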

Modprobe the new driver module:

ipuuser@ipuhost:~$ sudo modprobe i40e

Configure IP address for onboard NIC

Edit the /etc/netplan/01-netcfg.yaml file to look something like this:

ipuuser@ipuhost:~$ cat /etc/netplan/01-netcfg.yaml
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp65s0f0:
      addresses:
        - 192.168.8.201/20
      gateway4: 192.168.5.1
      nameservers:
        addresses: [192.168.5.3, 192.168.5.2]

Apply the Netplan config:

ipuuser@ipuhost:~$ sudo netplan apply

Mellanox Network Drivers

Add Mellanox Repos

Add the Mellanox repositories so we can install the necessary Mellanox drivers:

cd /etc/apt/sources.list.d/
curl -LO https://linux.mellanox.com/public/repo/mlnx_ofed/latest/ubuntu18.04/mellanox_mlnx_ofed.list

Edit the .list file so it looks like this:

ipuuser@ipu-host-2:/etc/apt/sources.list.d$ cat mellanox_mlnx_ofed.list 
#
# Mellanox Technologies Ltd. public repository configuration file.
# For more information, refer to http://linux.mellanox.com
#

# [mlnx_ofed_5.2-1.0.4.0_base]
deb [trusted=yes] http://linux.mellanox.com/public/repo/mlnx_ofed/5.2-1.0.4.0/ubuntu18.04/$(ARCH) ./

Pull down a copy of the Mellanox GPG key:

root@ipu-host-3:/etc/apt/sources.list.d# wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add -

Update the apt package index:

root@ipu-host-2:~$ apt update

Install and configure Mellanox drivers

root@ipu-host-2:~$ apt-get install mlnx-ofed-all

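Ensure the interfaces are configured to Ethernet rather than InfiniBand (a LINK_TYPE value of 2 selects Ethernet). If the /dev/mst device node is missing, running mst start first should create it: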

root@ipu-host-2:~$ mlxconfig -d /dev/mst/mt4121_pciconf0 query
root@ipu-host-2:~$ mlxconfig -d /dev/mst/mt4121_pciconf0 set LINK_TYPE_P1=2
root@ipu-host-2:~$ mlxconfig -d /dev/mst/mt4121_pciconf0 set LINK_TYPE_P2=2

Reboot the server

Netplan configuration

Networking on the Host system is configured by Netplan. The /etc/netplan/01-netcfg.yaml configuration file should look something like this (to reflect the table above):

root@ipu-host-2:~# cat /etc/netplan/01-netcfg.yaml 
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp65s0f0:
      addresses:
        - 192.168.8.202/20
      gateway4: 192.168.5.1
      nameservers:
        addresses: [192.168.5.3, 192.168.5.2]
    enp65s0f1:
      addresses:
        - 10.1.3.101/22
    enp129s0f1:
      addresses:
        - 10.1.5.5/24

Apply the Netplan configuration:

root@ipu-host-2:~# netplan apply
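
After applying the configuration, it is worth confirming that each interface actually carries the address from the table. A minimal sketch, assuming the one-line output format of `ip -o -4 addr show`; has_addr is a hypothetical helper name:

```shell
#!/bin/sh
# has_addr IFACE CIDR
# Reads `ip -o -4 addr show` output on stdin and exits 0 if IFACE
# carries the IPv4 address CIDR (for example 10.1.3.101/22), 1 otherwise.
has_addr() {
    awk -v ifc="$1" -v want="$2" '
        $2 == ifc && $3 == "inet" && $4 == want { found = 1 }
        END { exit !found }
    '
}

# Example on the host:
#   ip -o -4 addr show | has_addr enp65s0f1 10.1.3.101/22 && echo "mgmt OK"
#   ip -o -4 addr show | has_addr enp129s0f1 10.1.5.5/24 && echo "RDMA OK"
```

When working over a remote session, netplan try can be used in place of netplan apply; it rolls the change back unless it is confirmed within a timeout.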

DHCP Server

Installation

An isc-dhcp-server is required on the host system to provide the IPU M2000 system with the appropriate IP addresses (detailed above). It can be installed from the standard repos:

root@ipu-host-2:~# apt-get install isc-dhcp-server

Interface Configuration file

The /etc/default/isc-dhcp-server file dictates which interfaces will be serviced by the DHCP server service. It should look something like:

root@ipu-host-2:~# cat /etc/default/isc-dhcp-server
INTERFACESv4="enp65s0f1 enp129s0f1"
INTERFACESv6=""

Main DHCP configuration file

The /etc/dhcp/dhcpd.conf file provides configuration information for the DHCP server service. It should look something like:

root@ipu-host-2:~# cat /etc/dhcp/dhcpd.conf
default-lease-time 600;
max-lease-time 1200;
ddns-update-style none;
authoritative;
log-facility local7;

subnet 10.1.5.0 netmask 255.255.255.0 {
    option subnet-mask 255.255.255.0;
    range 10.1.5.2 10.1.5.2;
}
subnet 10.1.0.0 netmask 255.255.252.0 {
        option subnet-mask 255.255.252.0;
}
host ipum1bmc { hardware ethernet 70:69:79:20:13:b4; fixed-address 10.1.1.1; }
host ipum1gw { hardware ethernet 70:69:79:20:13:b5; fixed-address 10.1.2.1; }

host ipum1mx { hardware ethernet 0c:42:a1:78:89:cd; fixed-address 10.1.5.2; }
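
Each fixed lease is a single host declaration tying a MAC address to an IP address, so the declarations can be generated rather than typed by hand. The sketch below assumes the one-line layout used above; dhcp_host is a hypothetical helper name, and the MAC addresses must be replaced with those of your own unit:

```shell
#!/bin/sh
# dhcp_host NAME MAC IP
# Print one fixed-address host declaration in the layout used in dhcpd.conf above.
dhcp_host() {
    printf 'host %s { hardware ethernet %s; fixed-address %s; }\n' "$1" "$2" "$3"
}

# Example: regenerate the three declarations from this page.
#   dhcp_host ipum1bmc 70:69:79:20:13:b4 10.1.1.1
#   dhcp_host ipum1gw  70:69:79:20:13:b5 10.1.2.1
#   dhcp_host ipum1mx  0c:42:a1:78:89:cd 10.1.5.2
#
# Before (re)starting the service, the whole file can be syntax-checked with:
#   dhcpd -t -cf /etc/dhcp/dhcpd.conf
```

The dhcpd -t check catches misspelled keywords and missing semicolons before they stop the service from starting.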

Start and enable the DHCP Service

Start the DHCP service with:

root@ipu-host-2:~# systemctl start isc-dhcp-server

Enable the DHCP service with:

root@ipu-host-2:~# systemctl enable isc-dhcp-server

Check the status of the DHCP service with:

root@ipu-host-2:~# systemctl status isc-dhcp-server
● isc-dhcp-server.service - ISC DHCP IPv4 server
   Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2021-01-19 16:43:43 GMT; 1 day 22h ago
     Docs: man:dhcpd(8)
 Main PID: 3474 (dhcpd)
    Tasks: 1 (limit: 19660)
   CGroup: /system.slice/isc-dhcp-server.service
           └─3474 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf

Jan 21 15:36:08 ipu-host-2 dhcpd[3474]: DHCPREQUEST for 10.1.1.1 from 70:69:79:20:13:b4 via enp65s0f1
Jan 21 15:36:08 ipu-host-2 dhcpd[3474]: DHCPACK on 10.1.1.1 to 70:69:79:20:13:b4 via enp65s0f1
Jan 21 15:36:11 ipu-host-2 dhcpd[3474]: DHCPREQUEST for 10.1.2.1 from 70:69:79:20:13:b5 via enp65s0f1
Jan 21 15:36:11 ipu-host-2 dhcpd[3474]: DHCPACK on 10.1.2.1 to 70:69:79:20:13:b5 via enp65s0f1
Jan 21 15:36:24 ipu-host-2 dhcpd[3474]: DHCPREQUEST for 10.1.5.2 from 0c:42:a1:78:89:cd via enp129s0f1
Jan 21 15:36:24 ipu-host-2 dhcpd[3474]: DHCPACK on 10.1.5.2 to 0c:42:a1:78:89:cd via enp129s0f1
Jan 21 15:41:07 ipu-host-2 dhcpd[3474]: DHCPREQUEST for 10.1.1.1 from 70:69:79:20:13:b4 via enp65s0f1
Jan 21 15:41:07 ipu-host-2 dhcpd[3474]: DHCPACK on 10.1.1.1 to 70:69:79:20:13:b4 via enp65s0f1
Jan 21 15:41:10 ipu-host-2 dhcpd[3474]: DHCPREQUEST for 10.1.2.1 from 70:69:79:20:13:b5 via enp65s0f1
Jan 21 15:41:10 ipu-host-2 dhcpd[3474]: DHCPACK on 10.1.2.1 to 70:69:79:20:13:b5 via enp65s0f1
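
The DHCPACK lines confirm that the BMC, GW and RNIC have each picked up their lease. Because the leases renew every few minutes, the log gets repetitive; a minimal sketch to reduce it to one line per lease, assuming the syslog line layout shown above (acked_leases is a hypothetical helper name):

```shell
#!/bin/sh
# acked_leases
# Reads dhcpd log lines on stdin and prints "IP MAC INTERFACE" once for
# every distinct lease that was acknowledged with DHCPACK.
acked_leases() {
    # Field 6 is the DHCP message type, 8 the IP, 10 the MAC;
    # the interface is always the last field.
    awk '$6 == "DHCPACK" { print $8, $10, $NF }' | sort -u
}

# Example on the host:
#   journalctl -u isc-dhcp-server --no-pager | acked_leases
```

With the configuration on this page, three lines are expected: 10.1.1.1 and 10.1.2.1 via enp65s0f1, and 10.1.5.2 via enp129s0f1.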

NTP Server Service

Installation

An NTP service is recommended to provide network time to the IPU-M2000 systems. It can be installed from the Ubuntu repositories:

root@ipu-host-2:~# apt-get install ntp

Configuration

The /etc/ntp.conf file details the configuration for the NTP server; it should look something like this:

root@ipu-host-2:~# grep -v "#" /etc/ntp.conf 

driftfile /var/lib/ntp/ntp.drift

leapfile /usr/share/zoneinfo/leap-seconds.list

includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
fudge 127.127.1.0 stratum 10
pool 0.ubuntu.pool.ntp.org iburst
pool 1.ubuntu.pool.ntp.org iburst
pool 2.ubuntu.pool.ntp.org iburst
pool 3.ubuntu.pool.ntp.org iburst
pool ntp.ubuntu.com

restrict 127.0.0.1
restrict ::1
restrict source notrap nomodify noquery

Start and enable the NTP Service

Start the NTP service with:

root@ipu-host-2:~# systemctl start ntp

Enable the NTP service with:

root@ipu-host-2:~# systemctl enable ntp

Check the status of the NTP service with:

root@ipu-host-2:~# systemctl status ntp
● ntp.service - Network Time Service
   Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
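
Note that the status captured above reports the service as inactive (dead), so it is worth checking the Active: line rather than assuming that start succeeded. A minimal sketch that extracts the state from the status text; active_state is a hypothetical helper name:

```shell
#!/bin/sh
# active_state
# Reads `systemctl status UNIT` output on stdin and prints the word that
# follows "Active:" (for example "active", "inactive" or "failed").
active_state() {
    awk '$1 == "Active:" { print $2; exit }'
}

# Example on the host:
#   systemctl status ntp | active_state
# The same answer is available directly from:
#   systemctl is-active ntp
```

If the state is not active, journalctl -u ntp is a reasonable place to look for the reason.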
     Docs: man:ntpd(8)

Syslog

Syslog is a software utility for forwarding log messages in an IP network.

Configuration

The /etc/rsyslog.conf file should look like this:

root@ipu-host-2:~# grep "^[^#;]" /etc/rsyslog.conf 
module(load="imuxsock") # provides support for local system logging
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")
module(load="imklog" permitnonkernelfacility="on")
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$RepeatedMsgReduction on
$FileOwner syslog
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022
$PrivDropToUser syslog
$PrivDropToGroup syslog
$WorkDirectory /var/spool/rsyslog
$IncludeConfig /etc/rsyslog.d/*.conf

The /etc/rsyslog.d/99_ipum.conf file should look like this:

root@ipu-host-2:~# grep "^[^#;]" /etc/rsyslog.d/99_ipum.conf 
$template precise,"%fromhost-ip%,%HOSTNAME%,%syslogpriority%,%syslogfacility%,%timegenerated::fulltime%,%syslogtag%,%msg%\n"
	:HOSTNAME, contains, "ipum" /var/log/ipulogs/ipulogs;precise
	& ~

The /etc/rsyslog.d/99_dhcpd.conf file should look like this:

root@ipu-host-2:~# grep "^[^#;]" /etc/rsyslog.d/99_dhcpd.conf
local7.* /var/log/dhcpd.log
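
With the precise template above, each line in /var/log/ipulogs/ipulogs is comma-separated with the sending host name in the second field. A minimal sketch that counts messages per host to confirm that logs are arriving; count_by_host is a hypothetical helper name:

```shell
#!/bin/sh
# count_by_host
# Reads lines written with the "precise" template on stdin
# (fromhost-ip,HOSTNAME,priority,facility,time,tag,msg) and prints
# "HOSTNAME COUNT" for each sending host, sorted by host name.
count_by_host() {
    awk -F, '{ count[$2]++ } END { for (h in count) print h, count[h] }' | sort
}

# Example on the host:
#   count_by_host < /var/log/ipulogs/ipulogs
#
# The rsyslog configuration itself can be syntax-checked with:
#   rsyslogd -N1
```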

Graphcore Software Installation

The following Graphcore software packages need to be installed on the server:

  • V-IPU server contains management and control software for IPU resource control, built-in self-test (BIST) and monitoring of the IPU-M2000s and IPUs. There is a V-IPU Admin Guide and a V-IPU User Guide available.
  • IPU-M2000 system software contains the latest IPU-M2000 resident software for update, if required. It also includes the server-resident tool rack_tool, which is required for updating the resident software on the IPU-M2000s and testing the system hardware.

V-IPU server installation

Both the release notes and the V-IPU software release tarball are available from the Graphcore download portal https://downloads.graphcore.ai

An installation script called install.sh is included with the V-IPU tarball. The installation script has been tested and verified to work with Ubuntu and CentOS distros that use systemd as the default service manager. The installation script needs to be executed with root privileges (sudo ./install.sh) as it copies the vipu-server, vipu-admin and vipu binaries to /usr/local/bin.

Ensure you are logged in as ipuuser.

The script will configure and start vipu-server.service.

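In the following example, “enp65s0f0” is the host server interface selected for auto-discovery; with the standard cabling, this should be the interface that connects to the management port of the IPU-M2000 at the top of the stack. Note that the network table above assigns IPU-M2000 management to enp65s0f1, so choose the interface that matches your own cabling.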

ipuuser@ipu-host-2:~$ sudo ./install.sh
Do you want to start the vipu-server as a service in this host? Note that you should have vipu-server running only in one host and use vipu/vipu-admin to connect to it from all other hosts.

(N/y) y

Choose an interface to use for agent auto-discovery: eno0

enp65s0f0
enp65s0f1
lo

Enter disable to deactivate the auto-discovery
Which interface should be used for auto-discovery? enp65s0f0

- vipu-server will be configured to be run as a service in this host
- Initialising /etc/vipu/config.hcl

IPU-M2000 Server Software Installation

The IPU-M2000 system software contains a set of upgradable software and FPGA sub-components that run on the IPU-M2000 units. The release also contains the tool rack_tool, which is used for software upgrades and other rack-related tasks targeting the IPU-M2000s.

Ensure you are logged in as ipuuser.

Go to the Graphcore download portal https://downloads.graphcore.ai and download the latest release into your home directory.

Unpack the tarball:

ipuuser@ipu-host-2:~$ tar xvf IPU_M_SW-2.0.0.rc.3.tar

Install the software/tools:

ipuuser@ipu-host-2:~$ cd ~/IPU_M_SW-2.0.0-rc.3+a51e75a/maintenance_tools/
ipuuser@ipu-host-2:~$ ./install.sh

rack_tool Configuration

rack_tool requires a config file which contains information on all the IPU-M2000s it will control. The information in the config file defines all IP addresses of the BMC, GW and RNIC interfaces.

Create a directory for the configuration file:

ipuuser@ipu-host-2:~$ mkdir -p $HOME/.rack_tool

Your /home/ipuuser/.rack_tool/rack_config.json should look something like this:

ipuuser@ipu-host-2:~$ cat /home/ipuuser/.rack_tool/rack_config.json 
{
    "global_credentials": {
        "bmc_username": "root",
        "bmc_passwd": "0penBmc",
        "gw_username": "itadmin",
        "gw_passwd": "ChangeMeFdh5P"
    },
    "gw_root_overlay": "/home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0-rc.3+a51e75a/maintenance_tools/ipu_pod_config/root-overlay/",
    "machines": [
        {
            "name": "m01",
            "bmc_ip": "10.1.1.1",
            "gw_ip": "10.1.2.1",
            "mx_ip": "10.1.5.2"
        }
    ]
}
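
The addresses in rack_config.json must match the fixed leases handed out by the DHCP server, otherwise rack_tool will try to reach the wrong hosts. A minimal sketch of a cross-check using grep, assuming the file layouts shown on this page; rack_ips and dhcp_ips are hypothetical helper names:

```shell
#!/bin/sh
# rack_ips FILE
# Print every bmc_ip, gw_ip and mx_ip value found in a rack_config.json,
# one per line, sorted.
rack_ips() {
    grep -oE '"(bmc|gw|mx)_ip"[[:space:]]*:[[:space:]]*"[0-9.]+"' "$1" |
        grep -oE '[0-9]+(\.[0-9]+){3}' | sort
}

# dhcp_ips FILE
# Print every fixed-address value found in a dhcpd.conf, one per line, sorted.
dhcp_ips() {
    grep -oE 'fixed-address[[:space:]]+[0-9.]+' "$1" |
        grep -oE '[0-9]+(\.[0-9]+){3}' | sort
}

# Example on the host:
#   [ "$(rack_ips "$HOME/.rack_tool/rack_config.json")" = \
#     "$(dhcp_ips /etc/dhcp/dhcpd.conf)" ] && echo "addresses match"
```

Since jq is in the prerequisite package list, running jq . on the file is also a quick way to confirm that the JSON itself is well formed.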

Copy root-overlay file system

A root-overlay file system is used to pass the NTP and syslog configuration into the IPU-M2000 software. The rack_config.json file above refers to the path of these files. The path is either relative to the location of rack_config.json or an absolute path. The easiest approach is to copy the files to the default location:

ipuuser@ipu-host-2:~$ cd /home/ipuuser/IPU-M_releases/IPU_M_SW-2.0.0-rc.3+a51e75a/maintenance_tools/ipu_pod_config
ipuuser@ipu-host-2:~$ cp -r root-overlay /home/ipuuser/.rack_tool/