Lustre: Problems with df on lustre clients
df hangs because an OST is not accessible
- This occurs when OSTs are offline or inactive (in this instance they had been added through IML and then removed, but not removed fully).
[root@hyalite ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lustrefs-MDT0000_UUID 1.2T 14.6G 1.1T 1% /mnt/lustrefs[MDT:0]
lustrefs-OST0000_UUID 36.4T 7.5T 27.0T 22% /mnt/lustrefs[OST:0]
lustrefs-OST0001_UUID 36.4T 8.2T 26.4T 24% /mnt/lustrefs[OST:1]
lustrefs-OST0002_UUID 36.4T 7.2T 27.3T 21% /mnt/lustrefs[OST:2]
lustrefs-OST0003_UUID 36.4T 8.0T 26.5T 23% /mnt/lustrefs[OST:3]
lustrefs-OST0004_UUID 36.4T 6.8T 27.8T 20% /mnt/lustrefs[OST:4]
lustrefs-OST0005_UUID 36.4T 6.6T 28.0T 19% /mnt/lustrefs[OST:5]
lustrefs-OST0006_UUID 36.4T 5.1T 29.4T 15% /mnt/lustrefs[OST:6]
lustrefs-OST0007_UUID 36.4T 5.8T 28.7T 17% /mnt/lustrefs[OST:7]
lustrefs-OST0008_UUID 54.6T 168.5G 51.7T 0% /mnt/lustrefs[OST:8]
lustrefs-OST0009_UUID 54.6T 146.9G 51.7T 0% /mnt/lustrefs[OST:9]
OST000a : inactive device
OST000b : inactive device
filesystem summary: 400.1T 55.6T 324.4T 15% /mnt/lustrefs
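Before deactivating anything, it can help to see how the client views each OST connection. A minimal check, assuming the same client and device names as above (the ost_server_uuid parameter reports the import state, e.g. FULL for a healthy connection or DISCONN for an unreachable one):
<syntaxhighlight>
# Show the import state of every OSC on this client;
# the stale OSTs will typically show something other than FULL
lctl get_param osc.lustrefs-OST*.ost_server_uuid
</syntaxhighlight>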
* The hang is caused by the inactive devices. To correct it, deactivate them on the client:
<syntaxhighlight>
lctl set_param osc.lustrefs-OST000a-*.active=0
lctl set_param osc.lustrefs-OST000b-*.active=0
</syntaxhighlight>
* Running these on the client cleared the hang:
[root@hyalite ~]# lctl set_param osc.lustrefs-OST000a-*.active=0
osc.lustrefs-OST000a-osc-ffff881070ee7000.active=0
[root@hyalite ~]# lctl set_param osc.lustrefs-OST000b-*.active=0
osc.lustrefs-OST000b-osc-ffff881070ee7000.active=0
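To confirm the change took effect, the parameter can be read back (a quick sanity check; get_param is the read-side counterpart of set_param):
<syntaxhighlight>
# Both should now report active=0
lctl get_param osc.lustrefs-OST000a-*.active
lctl get_param osc.lustrefs-OST000b-*.active
</syntaxhighlight>
df then completes without hanging: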
[root@hyalite ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md126 867G 365G 458G 45% /
tmpfs 32G 76K 32G 1% /dev/shm
/dev/md127 496M 27M 444M 6% /boot
/dev/md125 7.9G 152M 7.4G 2% /tmp
/dev/md123 16G 3.1G 12G 21% /var
/dev/md122 9.9G 501M 8.9G 6% /var/lib/mysql/cmdaemon_mon
172.23.19.42@tcp1:172.23.19.41@tcp1:/lustrefs
401T 56T 325T 15% /mnt/lustrefs
Another command to check out the devices within Lustre is:
[root@hyalite ~]# lctl dl
0 UP mgc MGC172.23.19.42@tcp1 0d48eca7-fb5f-d53f-3bee-e6b1a6745dcc 5
1 UP lov lustrefs-clilov-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 4
2 UP lmv lustrefs-clilmv-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 4
3 UP mdc lustrefs-MDT0000-mdc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
4 UP osc lustrefs-OST0000-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
5 UP osc lustrefs-OST0002-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
6 UP osc lustrefs-OST0003-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
7 UP osc lustrefs-OST0001-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
8 UP osc lustrefs-OST0005-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
9 UP osc lustrefs-OST0007-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
10 UP osc lustrefs-OST0004-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
11 UP osc lustrefs-OST0006-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
12 UP osc lustrefs-OST0009-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
13 UP osc lustrefs-OST0008-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
14 UP osc lustrefs-OST000a-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
15 UP osc lustrefs-OST000b-osc-ffff881070ee7000 32bcb3c7-3977-99f9-f3f4-0c1914ccec79 5
ptlrpcd_rcv loop CPU 100%
- Seems related to: https://jira.hpdd.intel.com/browse/LU-5787
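To confirm the symptom, check whether a ptlrpcd thread is pinning a CPU (a simple check; ptlrpcd threads are kernel threads, so they show up in top):
<syntaxhighlight>
# List ptlrpcd threads and their CPU usage;
# a spinning ptlrpcd_rcv will sit near 100%
top -b -n 1 | grep ptlrpcd
</syntaxhighlight>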
- Drop caches to resolve. *NOTE* Only do this when no users are running jobs - something weird happened when we did this on a compute node with a user job. Clear jobs first, then drop caches:
echo 1 > /proc/sys/vm/drop_caches
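A slightly safer sequence, assuming the node has already been drained of jobs: sync dirty pages to disk first, since drop_caches only discards clean pages (a sketch, not verified against the LU-5787 case specifically):
<syntaxhighlight>
# Flush dirty pages first, then drop the page cache
sync
echo 1 > /proc/sys/vm/drop_caches
</syntaxhighlight>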