Lustre: Problems with df on lustre clients

df hangs because an OST is not accessible

  • This occurs when OSTs are offline or inactive (in this instance they had been added through IML and then removed, but not removed fully). The output below shows the symptom; a quick check of the client's view of the targets follows it.
[root@hyalite ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustrefs-MDT0000_UUID        1.2T       14.6G        1.1T   1% /mnt/lustrefs[MDT:0]
lustrefs-OST0000_UUID       36.4T        7.5T       27.0T  22% /mnt/lustrefs[OST:0]
lustrefs-OST0001_UUID       36.4T        8.2T       26.4T  24% /mnt/lustrefs[OST:1]
lustrefs-OST0002_UUID       36.4T        7.2T       27.3T  21% /mnt/lustrefs[OST:2]
lustrefs-OST0003_UUID       36.4T        8.0T       26.5T  23% /mnt/lustrefs[OST:3]
lustrefs-OST0004_UUID       36.4T        6.8T       27.8T  20% /mnt/lustrefs[OST:4]
lustrefs-OST0005_UUID       36.4T        6.6T       28.0T  19% /mnt/lustrefs[OST:5]
lustrefs-OST0006_UUID       36.4T        5.1T       29.4T  15% /mnt/lustrefs[OST:6]
lustrefs-OST0007_UUID       36.4T        5.8T       28.7T  17% /mnt/lustrefs[OST:7]
lustrefs-OST0008_UUID       54.6T      168.5G       51.7T   0% /mnt/lustrefs[OST:8]
lustrefs-OST0009_UUID       54.6T      146.9G       51.7T   0% /mnt/lustrefs[OST:9]
OST000a             : inactive device
OST000b             : inactive device

filesystem summary:       400.1T       55.6T      324.4T  15% /mnt/lustrefs
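
Before changing anything, it can help to confirm which targets the client itself still lists and which it considers active. A quick check along these lines (the filesystem name lustrefs and mount point /mnt/lustrefs match the output above; adjust for your site):
<syntaxhighlight>
# List the OSTs the client knows about and whether each is ACTIVE or INACTIVE
lfs osts /mnt/lustrefs

# Show the per-OSC "active" flag on this client (1 = active, 0 = deactivated)
lctl get_param osc.lustrefs-OST*.active
</syntaxhighlight>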

  • This is caused by the inactive devices. To correct it, deactivate the stale OSCs on the client (a combined sketch for deactivating several targets at once follows the df output below):
<syntaxhighlight>
lctl set_param osc.lustrefs-OST000a-*.active=0
lctl set_param osc.lustrefs-OST000b-*.active=0
</syntaxhighlight>
  • After running these commands, df worked again:
[root@hyalite ~]#  lctl set_param osc.lustrefs-OST000a-*.active=0
osc.lustrefs-OST000a-osc-ffff881070ee7000.active=0
[root@hyalite ~]#  lctl set_param osc.lustrefs-OST000b-*.active=0
osc.lustrefs-OST000b-osc-ffff881070ee7000.active=0
[root@hyalite ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md126            867G  365G  458G  45% /
tmpfs                  32G   76K   32G   1% /dev/shm
/dev/md127            496M   27M  444M   6% /boot
/dev/md125            7.9G  152M  7.4G   2% /tmp
/dev/md123             16G  3.1G   12G  21% /var
/dev/md122            9.9G  501M  8.9G   6% /var/lib/mysql/cmdaemon_mon
172.23.19.42@tcp1:172.23.19.41@tcp1:/lustrefs
                      401T   56T  325T  15% /mnt/lustrefs
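
Since more than one stale target can be left behind, a small loop saves retyping. A minimal sketch, assuming the stale targets are OST000a and OST000b as above (lctl set_param changes like this are per-client and do not survive a remount):
<syntaxhighlight>
# Deactivate the OSCs for the removed targets so statfs/df stops waiting on them.
# The OST indices here are the two stale ones from the lfs df output above;
# adjust the list to match your own removed targets.
for ost in OST000a OST000b; do
    lctl set_param osc.lustrefs-${ost}-*.active=0
done

# df should now return without hanging
df -h /mnt/lustrefs
</syntaxhighlight>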

Another useful command for checking the devices the client knows about is lctl dl; the OSC devices for the removed targets still appear in this listing (a quick cross-check follows it):

[root@hyalite ~]# lctl dl 
  0 UP mgc MGC172.23.19.42@tcp1 0d48eca7-fb5f-d53f-3bee-e6b1a6745dcc 5
  1 UP lov lustrefs-clilov-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 4
  2 UP lmv lustrefs-clilmv-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 4
  3 UP mdc lustrefs-MDT0000-mdc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
  4 UP osc lustrefs-OST0000-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
  5 UP osc lustrefs-OST0002-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
  6 UP osc lustrefs-OST0003-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
  7 UP osc lustrefs-OST0001-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
  8 UP osc lustrefs-OST0005-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
  9 UP osc lustrefs-OST0007-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
 10 UP osc lustrefs-OST0004-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
 11 UP osc lustrefs-OST0006-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
 12 UP osc lustrefs-OST0009-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
 13 UP osc lustrefs-OST0008-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
 14 UP osc lustrefs-OST000a-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
 15 UP osc lustrefs-OST000b-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
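
Note that devices 14 and 15 above are the OSCs for the removed targets OST000a and OST000b. A quick way to cross-check just those entries and look at their import status (target names are the ones from this example; adjust for yours):
<syntaxhighlight>
# Show only the OSC devices for the removed targets
lctl dl | grep -iE 'OST000a|OST000b'

# The import status of a single OSC gives more detail (look for the "state:" line)
lctl get_param osc.lustrefs-OST000a-*.import
</syntaxhighlight>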

ptlrpcd_rcv looping at 100% CPU

  • Seems related to: https://jira.hpdd.intel.com/browse/LU-5787
  • Drop caches to resolve. *NOTE* Only do this when no users are running jobs - something weird happened when we did this on a compute node while a user job was running. Clear the jobs first, then drop caches (a combined sketch follows the commands below):
<syntaxhighlight>
lctl set_param ldlm.namespaces.*.lru_size=clear
</syntaxhighlight>
or
<syntaxhighlight>
echo 1 > /proc/sys/vm/drop_caches
</syntaxhighlight>
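
A minimal combined sketch of the checks and the two commands above, in the order we used them; run it only on a node with no user jobs, as noted (the process checks here are illustrative, not a complete job-detection method):
<syntaxhighlight>
# Confirm that ptlrpcd_rcv is the thread pegging a CPU (top/htop show it as well)
top -b -n 1 | grep -i ptlrpcd

# Make sure no user jobs are still running on the node before touching caches
ps -eo user,pid,pcpu,comm --sort=-pcpu | head -20

# Flush the client's LDLM lock LRU first (the less disruptive option),
# then fall back to dropping the page cache if the thread keeps spinning
lctl set_param ldlm.namespaces.*.lru_size=clear
echo 1 > /proc/sys/vm/drop_caches
</syntaxhighlight>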