Lustre: General steps for debugging Lustre (IEEL) problems
In this situation we had the following Lustre setup:
- 2x MDS nodes in a HA configuration
- 4x OSS nodes not in a HA configuration (direct attached storage)
Verify Network Connectivity
- Can the systems ping one another?
- Can the client ping all of the Lustre nodes? (a quick check is sketched below)
- Also check LNET (the Lustre networking layer) to ensure it is working on all nodes.
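A minimal reachability check from the client, assuming the server IPs used in this setup (swap in your own node list):
# basic ICMP ping from the client to each Lustre server
for ip in 10.10.17.193 10.10.17.194; do ping -c 2 $ip; done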
# check the IPs are reported correctly on each node
[root@lustre01-mds1 ~]# lctl list_nids
10.10.17.193@tcp
[root@lustre02-mds1 ~]# lctl list_nids
10.10.17.194@tcp
# Can we ping through LNET/lctl
[root@lustre02-mds1 ~]# lctl ping 10.10.17.194
12345-0@lo
12345-10.10.17.194@tcp
[root@lustre02-mds1 ~]# lctl ping 10.10.17.195
failed to ping 10.10.17.195@tcp: Input/output error
# note: .195 doesn't exist on the fabric, so the failed ping above simply demonstrates the output to expect
Check the disks / arrays are reported and mounted
- Verify the RAID arrays are reported correctly and are healthy (using the LSI StorCLI utility)
- Depending on where StorCLI was installed and whether it is set up in your $PATH, the commands below may need adjusting (see the note just below).
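To save typing the full path each time, one option is to symlink the binary onto the PATH; a minimal sketch, assuming the install location used throughout this article:
# optional: make storcli64 callable without the full path (adjust the source path if StorCLI lives elsewhere)
ln -s "/usr/local/MegaRAID Storage Manager/StorCLI/storcli64" /usr/local/sbin/storcli64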
# check everything the controller reports. (LOT of output)
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 show all
# check the drives and their status
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show
# note: healthy drives should report their state as Onln (online)
# check if there are any rebuilds in place
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show rebuild
- Verify that the MDT / MGT and OST are all mounted on the systems
[root@lustre01-mds1 ~]# df -h | grep lustre
/dev/sda 9.5G 24M 9.0G 1% /lustre/mgt
/dev/sdc 1.3T 92M 1.2T 1% /lustre/lfs2-mdt
/dev/sdb 1.3T 92M 1.2T 1% /lustre/lfs1-mdt
[root@lustre02-oss1 ~]# df -h | grep lustre
/dev/sdb 59T 27G 56T 1% /lustre/lfs2-ost00
[root@lustre02-oss2 ~]# df -h | grep lustre
/dev/sdb 59T 31G 56T 1% /lustre/lfs2-ost01
- Check on the (active) MDS(s) that all the OSTs are connected OK. All devices should report as 'UP'
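A quick way to spot anything that is not UP is to filter the device list (on a healthy MDS this should print nothing); the full output from this setup follows.
[root@lustre01-mds1 ~]# lctl dl | grep -v " UP "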
[root@lustre01-mds1 ~]# lctl dl
0 UP osd-ldiskfs lfs2-MDT0000-osd lfs2-MDT0000-osd_UUID 11
1 UP osd-ldiskfs lfs1-MDT0000-osd lfs1-MDT0000-osd_UUID 11
2 UP osd-ldiskfs MGS-osd MGS-osd_UUID 5
3 UP mgs MGS MGS 25
4 UP mgc MGC10.10.17.193@tcp 540e2380-c858-82c0-c76c-81562f403c2e 5
5 UP mds MDS MDS_uuid 3
6 UP lod lfs2-MDT0000-mdtlov lfs2-MDT0000-mdtlov_UUID 4
7 UP mdt lfs2-MDT0000 lfs2-MDT0000_UUID 17
8 UP mdd lfs2-MDD0000 lfs2-MDD0000_UUID 4
9 UP qmt lfs2-QMT0000 lfs2-QMT0000_UUID 4
10 UP osp lfs2-OST0000-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
11 UP osp lfs2-OST0001-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
12 UP osp lfs2-OST0002-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
13 UP osp lfs2-OST0003-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
14 UP lwp lfs2-MDT0000-lwp-MDT0000 lfs2-MDT0000-lwp-MDT0000_UUID 5
15 UP lod lfs1-MDT0000-mdtlov lfs1-MDT0000-mdtlov_UUID 4
16 UP mdt lfs1-MDT0000 lfs1-MDT0000_UUID 17
17 UP mdd lfs1-MDD0000 lfs1-MDD0000_UUID 4
18 UP qmt lfs1-QMT0000 lfs1-QMT0000_UUID 4
19 UP osp lfs1-OST0000-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
20 UP osp lfs1-OST0001-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
21 UP osp lfs1-OST0002-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
22 UP osp lfs1-OST0003-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
23 UP lwp lfs1-MDT0000-lwp-MDT0000 lfs1-MDT0000-lwp-MDT0000_UUID 5
- Mount on a client system and verify all targets are accessible
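If the filesystems are not already mounted on the client, they are mounted against the MGS NID(s); a minimal sketch using the NIDs and mount points from this setup (both MDS NIDs are listed so the client can fail over):
# mount both filesystems on the client
mkdir -p /mnt/lfs1 /mnt/lfs2
mount -t lustre 10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs1 /mnt/lfs1
mount -t lustre 10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs2 /mnt/lfs2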
[root@iml ~]# df -h
# just to confirm the lustre fs is mounted ok
[root@iml ~]# df -h | grep lfs
10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs1
233T 4.1T 218T 2% /mnt/lfs1
10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs2
233T 5.2T 216T 3% /mnt/lfs2
# use the lfs tool to check the individual targets and confirm they are accessible (they will be reported as inaccessible if down)
[root@iml ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lfs1-MDT0000_UUID 1.2T 347.8M 1.1T 0% /mnt/lfs1[MDT:0]
lfs1-OST0000_UUID 58.2T 198.6G 55.1T 0% /mnt/lfs1[OST:0]
lfs1-OST0001_UUID 58.2T 161.3G 55.1T 0% /mnt/lfs1[OST:1]
lfs1-OST0002_UUID 58.2T 186.8G 55.1T 0% /mnt/lfs1[OST:2]
lfs1-OST0003_UUID 58.2T 3.5T 51.7T 6% /mnt/lfs1[OST:3]
filesystem summary: 232.8T 4.1T 217.0T 2% /mnt/lfs1
UUID bytes Used Available Use% Mounted on
lfs2-MDT0000_UUID 1.2T 349.8M 1.1T 0% /mnt/lfs2[MDT:0]
lfs2-OST0000_UUID 58.2T 175.8G 55.1T 0% /mnt/lfs2[OST:0]
lfs2-OST0001_UUID 58.2T 1.3T 54.0T 2% /mnt/lfs2[OST:1]
lfs2-OST0002_UUID 58.2T 187.4G 55.1T 0% /mnt/lfs2[OST:2]
lfs2-OST0003_UUID 58.2T 3.5T 51.8T 6% /mnt/lfs2[OST:3]
filesystem summary: 232.8T 5.2T 215.9T 2% /mnt/lfs2
File system freezes
- When commands like df seem to freeze, this can mean that one of the targets is not accessible
[root@iml ~]# df -h
^C
[root@iml ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lfs1-MDT0000_UUID 1.2T 91.3M 1.1T 0% /mnt/lfs1[MDT:0]
lfs1-OST0000_UUID 58.2T 118.2M 55.3T 0% /mnt/lfs1[OST:0]
lfs1-OST0001_UUID 58.2T 79.2M 55.3T 0% /mnt/lfs1[OST:1]
lfs1-OST0002_UUID 58.2T 79.2M 55.3T 0% /mnt/lfs1[OST:2]
lfs1-OST0003_UUID 58.2T 118.2M 55.3T 0% /mnt/lfs1[OST:3]
filesystem summary: 232.8T 394.7M 221.1T 0% /mnt/lfs1
UUID bytes Used Available Use% Mounted on
lfs2-MDT0000_UUID 1.2T 91.3M 1.1T 0% /mnt/lfs2[MDT:0]
OST0000 : inactive device
lfs2-OST0001_UUID 58.2T 30.1G 55.3T 0% /mnt/lfs2[OST:1]
lfs2-OST0002_UUID 58.2T 25.1G 55.3T 0% /mnt/lfs2[OST:2]
lfs2-OST0003_UUID 58.2T 30.1G 55.3T 0% /mnt/lfs2[OST:3]
filesystem summary: 174.6T 85.3G 165.8T 0% /mnt/lfs2
- Above we can see that OST0000 is offline (in this instance due to a ping issue). In general we would step through the following process:
- Check network connectivity
- Check the disks / RAID arrays are OK.
- Is the OST mounted on the system?
- Are the lustre modules inserted correctly? (check lsmod, dmesg and /var/log/messages for ERRORs) - see the sketch after this list
- A reboot can sometimes help if all of the above checks out.
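A rough sketch of those checks, using the device and mount point from this setup (adjust to your own layout); on the client, lfs check servers reports the state of the connections to the MDS and OSTs:
# on the client: report which server connections are down
[root@iml ~]# lfs check servers
# on the OSS: confirm the lustre modules are loaded and look for errors
[root@lustre02-oss1 ~]# lsmod | grep -E 'lustre|lnet'
[root@lustre02-oss1 ~]# dmesg | grep -iE 'lustre|lbug|error' | tail
[root@lustre02-oss1 ~]# grep -i lustre /var/log/messages | tail
# on the OSS: if the OST is simply not mounted, remount it
[root@lustre02-oss1 ~]# mount -t lustre /dev/sdb /lustre/lfs2-ost00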
Example process for replacing drives
- In this scenario we ended up with 4x UBad drives and 1x UGood drive, the latter being a replacement drive that had been inserted. (If a disk is improperly removed and then re-attached to the RAID controller, it will be recognised as UBad (Unconfigured Bad). This does not necessarily mean the drive itself is bad; it means the configuration state is bad, or both. Re-attaching a disk that is new or was previously working should have no negative effect, but before using it you need to change its state to good.)
# get the IDs of the UBad drives. IDs are reported as e.g. 4:16, which represents the enclosure and slot.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show | grep UBad
4:16 21 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:17 22 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:18 23 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:19 24 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
# The enclosure above is 4 and slots 16-19. So we set the disks to GOOD using /e4 and /s16 etc
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s16 set good
Controller = 0
Status = Success
Description = Set Drive Good Succeeded.
# repeat for other disks
storcli64 /c0 /e4 /s17 set good
storcli64 /c0 /e4 /s18 set good
storcli64 /c0 /e4 /s19 set good
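Or loop over the remaining slots in one go (this assumes storcli64 is on the PATH, or symlinked as suggested earlier):
# set the remaining UBad drives back to good
for s in 17 18 19; do storcli64 /c0 /e4 /s$s set good; done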
# these drives will now show as foreign, so let's check and see
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /fall show
Controller = 0
Status = Success
Description = Operation on foreign configuration Succeeded
FOREIGN CONFIGURATION :
=====================
----------------------------------------
DG EID:Slot Type State Size NoVDs
----------------------------------------
0 - RAID6 Frgn 58.210 TB 1
----------------------------------------
NoVDs - Number of VDs in disk group|DG - Diskgroup
Total foreign drive groups = 1
# now let's import the old (foreign) configuration to restore things to the way they were
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /fall import
Controller = 0
Status = Success
Description = Successfully imported foreign configuration
# and we check the output again to see the state of the drives
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.
Drive Information :
=================
------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
------------------------------------------------------------------------------
4:0 5 Onln 0 233.312 GB SATA HDD N N 512B WDC WD2503ABYZ-011FA0 U
4:1 6 Onln 0 233.312 GB SATA HDD N N 512B WDC WD2503ABYZ-011FA0 U
4:2 7 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:3 8 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:4 9 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:5 10 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:6 11 Rbld 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:7 12 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:8 13 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:9 14 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:10 15 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:11 16 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:12 17 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:13 18 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:14 19 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:15 20 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:16 21 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:17 22 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:18 23 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:19 24 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
------------------------------------------------------------------------------
# note: our UGood drive, which was replaced on site, is now in a rebuild state (Rbld at 4:6 above).
# let's check the rebuild state of the system.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show rebuild
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.
---------------------------------------------------------------
Drive-ID Progress% Status Estimated Time Left
---------------------------------------------------------------
/c0/e4/s0 - Not in progress -
/c0/e4/s1 - Not in progress -
/c0/e4/s2 - Not in progress -
/c0/e4/s3 - Not in progress -
/c0/e4/s4 - Not in progress -
/c0/e4/s5 - Not in progress -
/c0/e4/s6 0 In progress 1 Days 0 Hours 28 Minutes
/c0/e4/s7 - Not in progress -
/c0/e4/s8 - Not in progress -
/c0/e4/s9 - Not in progress -
/c0/e4/s10 - Not in progress -
/c0/e4/s11 - Not in progress -
/c0/e4/s12 - Not in progress -
/c0/e4/s13 - Not in progress -
/c0/e4/s14 - Not in progress -
/c0/e4/s15 - Not in progress -
/c0/e4/s16 - Not in progress -
/c0/e4/s17 - Not in progress -
/c0/e4/s18 - Not in progress -
/c0/e4/s19 - Not in progress -
---------------------------------------------------------------
# Or focus in on the drive in question to get a rebuild time.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s6 show rebuild
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.
----------------------------------------------------
Drive-ID Progress% Status Estimated Time Left
----------------------------------------------------
/c0/e4/s6 0 In progress 22 Hours 36 Minutes
----------------------------------------------------
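To keep an eye on the rebuild without re-running the command by hand, something like watch can poll it; a small sketch (the 10-minute interval is arbitrary):
# re-run the rebuild status check every 10 minutes
watch -n 600 '/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s6 show rebuild'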