Lustre: General steps for debugging Lustre (IEEL) problems
In this situation we had the following Lustre setup:
- 2x MDS nodes in a HA configuration
- 4x OSS nodes not in a HA configuration (direct attached storage)
Verify Network Connectivity
- Can the systems ping one another?
- Can the client ping all of the Lustre nodes? (a quick check is sketched below)
- Also check LNET (the Lustre networking layer) to ensure it is working on all nodes.
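A minimal reachability check from the client, assuming the server IPs used in this setup (swap in your own node list):
# basic ICMP ping from the client to each Lustre server
for ip in 10.10.17.193 10.10.17.194; do ping -c 2 $ip; done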
# check the IPs are reported correctly on each node
[root@lustre01-mds1 ~]# lctl list_nids
10.10.17.193@tcp
[root@lustre02-mds1 ~]# lctl list_nids
10.10.17.194@tcp
# Can we ping through LNET/lctl
[root@lustre02-mds1 ~]# lctl ping 10.10.17.194
12345-0@lo
12345-10.10.17.194@tcp
[root@lustre02-mds1 ~]# lctl ping 10.10.17.195
failed to ping 10.10.17.195@tcp: Input/output error
# note: .195 doesn't exist on the fabric, so the failed ping above simply demonstrates the output to expect
Check the disks / arrays are reported and mounted
- Verify the RAID arrays are reported correctly and are healthy (using the LSI StorCLI utility)
- Depending on where StorCLI was installed and whether it is set up in your $PATH, the commands below may need adjusting (see the note just below).
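To save typing the full path each time, one option is to symlink the binary onto the PATH; a minimal sketch, assuming the install location used throughout this article:
# optional: make storcli64 callable without the full path (adjust the source path if StorCLI lives elsewhere)
ln -s "/usr/local/MegaRAID Storage Manager/StorCLI/storcli64" /usr/local/sbin/storcli64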
# check everything the controller reports. (LOT of output)
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 show all
# check the drives and their status
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show
# note: healthy drives should report their state as Onln (online)
# check if there are any rebuilds in place
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show rebuild
- Verify that the MDT / MGT and OST are all mounted on the systems
[root@lustre01-mds1 ~]# df -h | grep lustre
/dev/sda 9.5G 24M 9.0G 1% /lustre/mgt
/dev/sdc 1.3T 92M 1.2T 1% /lustre/lfs2-mdt
/dev/sdb 1.3T 92M 1.2T 1% /lustre/lfs1-mdt
[root@lustre02-oss1 ~]# df -h | grep lustre
/dev/sdb 59T 27G 56T 1% /lustre/lfs2-ost00
[root@lustre02-oss2 ~]# df -h | grep lustre
/dev/sdb 59T 31G 56T 1% /lustre/lfs2-ost01
- Check on the (active) MDS(s) that all the OSTs are connected OK. All devices should report as 'UP'
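A quick way to spot anything that is not UP is to filter the device list (on a healthy MDS this should print nothing); the full output from this setup follows.
[root@lustre01-mds1 ~]# lctl dl | grep -v " UP "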
[root@lustre01-mds1 ~]# lctl dl
0 UP osd-ldiskfs lfs2-MDT0000-osd lfs2-MDT0000-osd_UUID 11
1 UP osd-ldiskfs lfs1-MDT0000-osd lfs1-MDT0000-osd_UUID 11
2 UP osd-ldiskfs MGS-osd MGS-osd_UUID 5
3 UP mgs MGS MGS 25
4 UP mgc MGC10.10.17.193@tcp 540e2380-c858-82c0-c76c-81562f403c2e 5
5 UP mds MDS MDS_uuid 3
6 UP lod lfs2-MDT0000-mdtlov lfs2-MDT0000-mdtlov_UUID 4
7 UP mdt lfs2-MDT0000 lfs2-MDT0000_UUID 17
8 UP mdd lfs2-MDD0000 lfs2-MDD0000_UUID 4
9 UP qmt lfs2-QMT0000 lfs2-QMT0000_UUID 4
10 UP osp lfs2-OST0000-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
11 UP osp lfs2-OST0001-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
12 UP osp lfs2-OST0002-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
13 UP osp lfs2-OST0003-osc-MDT0000 lfs2-MDT0000-mdtlov_UUID 5
14 UP lwp lfs2-MDT0000-lwp-MDT0000 lfs2-MDT0000-lwp-MDT0000_UUID 5
15 UP lod lfs1-MDT0000-mdtlov lfs1-MDT0000-mdtlov_UUID 4
16 UP mdt lfs1-MDT0000 lfs1-MDT0000_UUID 17
17 UP mdd lfs1-MDD0000 lfs1-MDD0000_UUID 4
18 UP qmt lfs1-QMT0000 lfs1-QMT0000_UUID 4
19 UP osp lfs1-OST0000-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
20 UP osp lfs1-OST0001-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
21 UP osp lfs1-OST0002-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
22 UP osp lfs1-OST0003-osc-MDT0000 lfs1-MDT0000-mdtlov_UUID 5
23 UP lwp lfs1-MDT0000-lwp-MDT0000 lfs1-MDT0000-lwp-MDT0000_UUID 5
- Mount on a client system and verify all targets are accessible
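If the filesystems are not already mounted on the client, they are mounted against the MGS NID(s); a minimal sketch using the NIDs and mount points from this setup (both MDS NIDs are listed so the client can fail over):
# mount both filesystems on the client
mkdir -p /mnt/lfs1 /mnt/lfs2
mount -t lustre 10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs1 /mnt/lfs1
mount -t lustre 10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs2 /mnt/lfs2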
[root@iml ~]# df -h
# just to confirm the lustre fs is mounted ok
[root@iml ~]# df -h | grep lfs
10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs1
233T 4.1T 218T 2% /mnt/lfs1
10.10.17.193@tcp0:10.10.17.194@tcp0:/lfs2
233T 5.2T 216T 3% /mnt/lfs2
# use the lfs tool to check the individual targets and confirm they are accessible (they will be reported as inaccessible if down)
[root@iml ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lfs1-MDT0000_UUID 1.2T 347.8M 1.1T 0% /mnt/lfs1[MDT:0]
lfs1-OST0000_UUID 58.2T 198.6G 55.1T 0% /mnt/lfs1[OST:0]
lfs1-OST0001_UUID 58.2T 161.3G 55.1T 0% /mnt/lfs1[OST:1]
lfs1-OST0002_UUID 58.2T 186.8G 55.1T 0% /mnt/lfs1[OST:2]
lfs1-OST0003_UUID 58.2T 3.5T 51.7T 6% /mnt/lfs1[OST:3]
filesystem summary: 232.8T 4.1T 217.0T 2% /mnt/lfs1
UUID bytes Used Available Use% Mounted on
lfs2-MDT0000_UUID 1.2T 349.8M 1.1T 0% /mnt/lfs2[MDT:0]
lfs2-OST0000_UUID 58.2T 175.8G 55.1T 0% /mnt/lfs2[OST:0]
lfs2-OST0001_UUID 58.2T 1.3T 54.0T 2% /mnt/lfs2[OST:1]
lfs2-OST0002_UUID 58.2T 187.4G 55.1T 0% /mnt/lfs2[OST:2]
lfs2-OST0003_UUID 58.2T 3.5T 51.8T 6% /mnt/lfs2[OST:3]
filesystem summary: 232.8T 5.2T 215.9T 2% /mnt/lfs2
File system freezes
- When commands like df seem to freeze, this can mean that one of the targets is not accessible
[root@iml ~]# df -h
^C
[root@iml ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lfs1-MDT0000_UUID 1.2T 91.3M 1.1T 0% /mnt/lfs1[MDT:0]
lfs1-OST0000_UUID 58.2T 118.2M 55.3T 0% /mnt/lfs1[OST:0]
lfs1-OST0001_UUID 58.2T 79.2M 55.3T 0% /mnt/lfs1[OST:1]
lfs1-OST0002_UUID 58.2T 79.2M 55.3T 0% /mnt/lfs1[OST:2]
lfs1-OST0003_UUID 58.2T 118.2M 55.3T 0% /mnt/lfs1[OST:3]
filesystem summary: 232.8T 394.7M 221.1T 0% /mnt/lfs1
UUID bytes Used Available Use% Mounted on
lfs2-MDT0000_UUID 1.2T 91.3M 1.1T 0% /mnt/lfs2[MDT:0]
OST0000 : inactive device
lfs2-OST0001_UUID 58.2T 30.1G 55.3T 0% /mnt/lfs2[OST:1]
lfs2-OST0002_UUID 58.2T 25.1G 55.3T 0% /mnt/lfs2[OST:2]
lfs2-OST0003_UUID 58.2T 30.1G 55.3T 0% /mnt/lfs2[OST:3]
filesystem summary: 174.6T 85.3G 165.8T 0% /mnt/lfs2
- Above we can see that OST0000 is offline (in this instance due to a ping issue). In general we would step through the following process:
- Check network connectivity
- Check the disks / RAID arrays are OK.
- Is the OST mounted on the system?
- Are the lustre modules inserted correctly? (check lsmod, dmesg and /var/log/messages for ERRORs) - see the sketch after this list
- A reboot can sometimes help if all of the above checks out.
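A rough sketch of those checks, using the device and mount point from this setup (adjust to your own layout); on the client, lfs check servers reports the state of the connections to the MDS and OSTs:
# on the client: report which server connections are down
[root@iml ~]# lfs check servers
# on the OSS: confirm the lustre modules are loaded and look for errors
[root@lustre02-oss1 ~]# lsmod | grep -E 'lustre|lnet'
[root@lustre02-oss1 ~]# dmesg | grep -iE 'lustre|lbug|error' | tail
[root@lustre02-oss1 ~]# grep -i lustre /var/log/messages | tail
# on the OSS: if the OST is simply not mounted, remount it
[root@lustre02-oss1 ~]# mount -t lustre /dev/sdb /lustre/lfs2-ost00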
Example process for replacing drives
- In this scenario we ended up with 4x UBad drives and 1x UGood drive, the latter being a replacement drive that had been inserted. (If a disk is improperly removed and then re-attached to the RAID controller, it will be recognised as UBad (Unconfigured Bad). This does not necessarily mean the drive itself is bad; it means the configuration state is bad, or both. Re-attaching a disk that is new or was previously working should have no negative effect, but before using it you need to change its state to good.)
# get the IDs of the UBad drives. IDs are reported as e.g. 4:16, which represents the enclosure and slot.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show | grep UBad
4:16 21 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:17 22 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:18 23 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:19 24 UBad - 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
# The enclosure above is 4 and slots 16-19. So we set the disks to GOOD using /e4 and /s16 etc
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s16 set good
Controller = 0
Status = Success
Description = Set Drive Good Succeeded.
# repeat for other disks
storcli64 /c0 /e4 /s17 set good
storcli64 /c0 /e4 /s18 set good
storcli64 /c0 /e4 /s19 set good
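Or loop over the remaining slots in one go (this assumes storcli64 is on the PATH, or symlinked as suggested earlier):
# set the remaining UBad drives back to good
for s in 17 18 19; do storcli64 /c0 /e4 /s$s set good; done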
# these drives will now show as foreign, so let's check and see
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /fall show
Controller = 0
Status = Success
Description = Operation on foreign configuration Succeeded
FOREIGN CONFIGURATION :
=====================
----------------------------------------
DG EID:Slot Type State Size NoVDs
----------------------------------------
0 - RAID6 Frgn 58.210 TB 1
----------------------------------------
NoVDs - Number of VDs in disk group|DG - Diskgroup
Total foreign drive groups = 1
# now let's import the old (foreign) configuration to restore things to the way they were
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /fall import
Controller = 0
Status = Success
Description = Successfully imported foreign configuration
# and we check the output again to see the state of the drives
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.
Drive Information :
=================
------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
------------------------------------------------------------------------------
4:0 5 Onln 0 233.312 GB SATA HDD N N 512B WDC WD2503ABYZ-011FA0 U
4:1 6 Onln 0 233.312 GB SATA HDD N N 512B WDC WD2503ABYZ-011FA0 U
4:2 7 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:3 8 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:4 9 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:5 10 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:6 11 Rbld 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:7 12 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:8 13 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:9 14 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:10 15 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:11 16 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:12 17 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:13 18 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:14 19 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:15 20 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:16 21 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:17 22 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:18 23 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
4:19 24 Onln 1 3.637 TB SATA HDD N N 512B HGST HUS724040ALA640 U
------------------------------------------------------------------------------
# note: our UGood drive, which was replaced on site, is now in a rebuild state (Rbld at 4:6 above).
# let's check the rebuild state of the system.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show rebuild
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.
---------------------------------------------------------------
Drive-ID Progress% Status Estimated Time Left
---------------------------------------------------------------
/c0/e4/s0 - Not in progress -
/c0/e4/s1 - Not in progress -
/c0/e4/s2 - Not in progress -
/c0/e4/s3 - Not in progress -
/c0/e4/s4 - Not in progress -
/c0/e4/s5 - Not in progress -
/c0/e4/s6 0 In progress 1 Days 0 Hours 28 Minutes
/c0/e4/s7 - Not in progress -
/c0/e4/s8 - Not in progress -
/c0/e4/s9 - Not in progress -
/c0/e4/s10 - Not in progress -
/c0/e4/s11 - Not in progress -
/c0/e4/s12 - Not in progress -
/c0/e4/s13 - Not in progress -
/c0/e4/s14 - Not in progress -
/c0/e4/s15 - Not in progress -
/c0/e4/s16 - Not in progress -
/c0/e4/s17 - Not in progress -
/c0/e4/s18 - Not in progress -
/c0/e4/s19 - Not in progress -
---------------------------------------------------------------
# Or focus in on the drive in question to get a rebuild time.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s6 show rebuild
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.
----------------------------------------------------
Drive-ID Progress% Status Estimated Time Left
----------------------------------------------------
/c0/e4/s6 0 In progress 22 Hours 36 Minutes
----------------------------------------------------
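To keep an eye on the rebuild without re-running the command by hand, something like watch can poll it; a small sketch (the 10-minute interval is arbitrary):
# re-run the rebuild status check every 10 minutes
watch -n 600 '/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s6 show rebuild'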