ROCm Installation and Setup, tooling and info

From Define Wiki

Show theoretical bandwidth between GPUs

root@gpu1:~# rocm-smi --shownodesbw


============================ ROCm System Management Interface ============================
======================================= Bandwidth ========================================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   N/A          50000-200000 50000-50000  0-0          0-0          0-0          50000-100000 0-0
GPU1   50000-200000 N/A          0-0          50000-50000  0-0          50000-50000  0-0          0-0
GPU2   50000-50000  0-0          N/A          50000-200000 50000-100000 0-0          0-0          0-0
GPU3   0-0          50000-50000  50000-200000 N/A          0-0          0-0          0-0          50000-50000
GPU4   0-0          0-0          50000-100000 0-0          N/A          50000-200000 50000-50000  0-0
GPU5   0-0          50000-50000  0-0          0-0          50000-200000 N/A          0-0          50000-50000
GPU6   50000-100000 0-0          0-0          0-0          50000-50000  0-0          N/A          50000-200000
GPU7   0-0          0-0          0-0          50000-50000  0-0          50000-50000  50000-200000 N/A
Format: min-max; Units: mps
"0-0" min-max bandwidth indicates devices are not connected directly
================================== End of ROCm SMI Log ===================================
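
The matrix above is symmetric and mostly "0-0", so it can be easier to read as a list of direct links. The following is a minimal sketch, not part of rocm-smi: `list_gpu_links` is a hypothetical helper name, and it assumes the columns appear in GPU index order as in the output above.

```shell
# list_gpu_links (hypothetical helper): read `rocm-smi --shownodesbw` output
# on stdin and print one line per directly connected GPU pair.
list_gpu_links() {
  awk '
    # Matrix rows start in column 1 with a GPU label; the header row is indented.
    /^GPU[0-9]+[ \t]/ {
      row = substr($1, 4) + 0
      for (i = 2; i <= NF; i++) {
        col = i - 2   # assumes columns are in GPU index order
        # Skip the diagonal (N/A), unconnected pairs (0-0), and the
        # lower triangle so each pair is reported once.
        if ($i == "N/A" || $i == "0-0" || col <= row) continue
        printf "GPU%d <-> GPU%d %s\n", row, col, $i
      }
    }
  '
}

# Usage on a ROCm host:
#   rocm-smi --shownodesbw | list_gpu_links
```

Each output line has the form `GPU0 <-> GPU1 50000-200000`, keeping the min-max value exactly as rocm-smi reports it.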

Check NUMA setup

root@gpu1:~# rocm-smi --showtoponuma


============================ ROCm System Management Interface ============================
======================================= Numa Nodes =======================================
GPU[0]		: (Topology) Numa Node: 3
GPU[0]		: (Topology) Numa Affinity: 3
GPU[1]		: (Topology) Numa Node: 3
GPU[1]		: (Topology) Numa Affinity: 3
GPU[2]		: (Topology) Numa Node: 2
GPU[2]		: (Topology) Numa Affinity: 2
GPU[3]		: (Topology) Numa Node: 2
GPU[3]		: (Topology) Numa Affinity: 2
GPU[4]		: (Topology) Numa Node: 7
GPU[4]		: (Topology) Numa Affinity: 7
GPU[5]		: (Topology) Numa Node: 7
GPU[5]		: (Topology) Numa Affinity: 7
GPU[6]		: (Topology) Numa Node: 6
GPU[6]		: (Topology) Numa Affinity: 6
GPU[7]		: (Topology) Numa Node: 6
GPU[7]		: (Topology) Numa Affinity: 6
================================== End of ROCm SMI Log ===================================
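
One use for this mapping is pinning a process to the CPU cores and memory of the NUMA node that its GPU is attached to. The sketch below pulls the node number out of the output above. The helper names `gpu_numa_map` and `numa_node_for_gpu` are hypothetical, as is `./my_app`, and the parsing assumes the line format shown above.

```shell
# gpu_numa_map (hypothetical helper): read `rocm-smi --showtoponuma` output
# on stdin and print "<gpu index> <numa node>" for each GPU.
gpu_numa_map() {
  sed -n 's/^GPU\[\([0-9][0-9]*\)\][^:]*: (Topology) Numa Node: \([0-9][0-9]*\).*$/\1 \2/p'
}

# numa_node_for_gpu (hypothetical helper): print the NUMA node of GPU index $1,
# reading the same rocm-smi output on stdin.
numa_node_for_gpu() {
  gpu_numa_map | awk -v gpu="$1" '$1 == gpu { print $2 }'
}

# Usage on a ROCm host (./my_app is a placeholder for your own binary):
#   node=$(rocm-smi --showtoponuma | numa_node_for_gpu 4)
#   HIP_VISIBLE_DEVICES=4 numactl --cpunodebind="$node" --membind="$node" ./my_app
```

With the output above, GPU 4 resolves to NUMA node 7, so the `numactl` call would bind the process to node 7.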


Check the supported GPU architecture

# The architecture for the MI210 is gfx90a
# Use rocminfo to confirm it

david@amin-dev-mi210:~$ rocminfo | grep gfx
  Name:                    gfx90a
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
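
The grep above repeats the architecture once per agent and once per ISA line. If the value is going to be fed into a build, a deduplicated list is more convenient. A minimal sketch follows, where `gfx_targets` is a hypothetical helper name and `kernel.cpp` is a placeholder source file.

```shell
# gfx_targets (hypothetical helper): read `rocminfo` output on stdin and
# print each distinct gfx architecture name once.
gfx_targets() {
  grep -o 'gfx[0-9a-f]*' | sort -u
}

# Usage on a ROCm host (kernel.cpp is a placeholder):
#   rocminfo | gfx_targets
#   hipcc --offload-arch=gfx90a kernel.cpp -o kernel
```

On the MI210 system above this would print a single line, `gfx90a`, which is the value to pass to `--offload-arch` when compiling for that GPU.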