Lustre: General steps for debugging lustre (IEEL) problems


In this situation we had the following Lustre setup:

  • 2x MDS nodes in a HA configuration
  • 4x OSS nodes not in a HA configuration (direct attached storage)

Verify Network Connectivity

  • Can the systems ping one another?
  • Can the client ping all of the lustre nodes?
  • Check LNet (the Lustre networking layer) to ensure networking is working on all nodes as well.
# check the IPs are reported correctly on each node
[root@lustre01-mds1 ~]# lctl list_nids 
10.10.17.193@tcp
[root@lustre02-mds1 ~]# lctl list_nids
10.10.17.194@tcp

# Can we ping through LNET/lctl
[root@lustre02-mds1 ~]# lctl ping 10.10.17.194
12345-0@lo
12345-10.10.17.194@tcp
[root@lustre02-mds1 ~]# lctl ping 10.10.17.195
failed to ping 10.10.17.195@tcp: Input/output error
# note: .195 doesn't exist on the fabric, so the above simply demonstrates the output to expect when a ping fails
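If the lctl ping fails, it is also worth confirming basic IP reachability and that the LNet kernel module is actually loaded before digging further. A minimal sketch (the IP address is one from this setup):
# plain IP-level ping first
ping -c 3 10.10.17.194

# confirm the LNet / Lustre modules are loaded on the node
lsmod | grep -E 'lnet|lustre'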

Check the disks / arrays are reported and mounted

  • Verify that the RAID arrays are being reported correctly and are healthy (using the LSI StorCLI utility).
  • Depending on where StorCLI was installed and whether it is in your $PATH, the commands below may need to be updated (a convenience alias is sketched below).
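If storcli64 is not already on your $PATH, a shell alias saves typing the escaped install path each time. This is a sketch only; adjust the path to match where StorCLI is installed on your system.
# add to the shell session (or ~/.bashrc) before running the commands below
alias storcli64='/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64'
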
# check everything the controller reports. (LOT of output)
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 show all

# check the drives and their status
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show

# Note: the state of each drive should be ONLINE

# check if there are any rebuilds in place 
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0  /eall /sall show rebuild
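
# (sketch) also worth checking the virtual drives / RAID volumes themselves -
# their state should normally report Optl (optimal)
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /vall show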
  • Verify that the MGT, MDTs and OSTs are all mounted on the appropriate systems
[root@lustre01-mds1 ~]# df -h | grep lustre 
/dev/sda              9.5G   24M  9.0G   1% /lustre/mgt
/dev/sdc              1.3T   92M  1.2T   1% /lustre/lfs2-mdt
/dev/sdb              1.3T   92M  1.2T   1% /lustre/lfs1-mdt
[root@lustre02-oss1 ~]# df -h | grep lustre 
/dev/sdb               59T   27G   56T   1% /lustre/lfs2-ost00
[root@lustre02-oss2 ~]# df -h | grep lustre 
/dev/sdb               59T   31G   56T   1% /lustre/lfs2-ost01
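
If one of the expected targets is missing from this output, it can usually be remounted by hand (on the HA MDS pair, let the HA software handle this instead). A sketch using the device names and mount points from this setup:
# on the OSS that owns the target
mount -t lustre /dev/sdb /lustre/lfs2-ost00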

File system freezes

  • When commands like df seem to freeze, this can mean that one of the targets is not accessible.
[root@iml ~]# df -h
^C
[root@iml ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lfs1-MDT0000_UUID           1.2T       91.3M        1.1T   0% /mnt/lfs1[MDT:0]
lfs1-OST0000_UUID          58.2T      118.2M       55.3T   0% /mnt/lfs1[OST:0]
lfs1-OST0001_UUID          58.2T       79.2M       55.3T   0% /mnt/lfs1[OST:1]
lfs1-OST0002_UUID          58.2T       79.2M       55.3T   0% /mnt/lfs1[OST:2]
lfs1-OST0003_UUID          58.2T      118.2M       55.3T   0% /mnt/lfs1[OST:3]

filesystem summary:       232.8T      394.7M      221.1T   0% /mnt/lfs1

UUID                       bytes        Used   Available Use% Mounted on
lfs2-MDT0000_UUID           1.2T       91.3M        1.1T   0% /mnt/lfs2[MDT:0]
OST0000             : inactive device
lfs2-OST0001_UUID          58.2T       30.1G       55.3T   0% /mnt/lfs2[OST:1]
lfs2-OST0002_UUID          58.2T       25.1G       55.3T   0% /mnt/lfs2[OST:2]
lfs2-OST0003_UUID          58.2T       30.1G       55.3T   0% /mnt/lfs2[OST:3]

filesystem summary:       174.6T       85.3G      165.8T   0% /mnt/lfs2
  • Above we can see that OST0000 is inactive (in this instance due to a network/ping issue). In general we would step through the following process:
  1. Check network connectivity
  2. Check the disks / RAID arrays are OK.
  3. Is the OST mounted on the system
  4. Are the lustre modules inserted correctly? (check lsmod, dmesg and /var/log/messages for errors - see the sketch after this list)
  5. A reboot can sometimes help if all the above is ok.
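
For step 4, a quick way to confirm the modules are present and to look for Lustre errors in the logs (a sketch; log locations may differ on your distribution):
# are the lustre / lnet modules loaded?
lsmod | grep -E 'lustre|lnet'

# look for recent Lustre errors in the kernel ring buffer and syslog
dmesg | grep LustreError | tail
grep LustreError /var/log/messages | tail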

Example process for replacing drives

  • In this scenario we ended up with 4x UBad drives and 1x UGood drive (a replacement disk that had been inserted on site). If a disk is improperly removed and then re-attached to the RAID controller, it will be recognised as UBad (Unconfigured Bad). This does not necessarily mean the drive is faulty; it means the configuration state is bad (or both). Re-attaching a disk that is new or known to be working should have no negative effect, but before using it you need to set its state back to good.
# get the IDs of the UBad drives. IDs are reported as EID:Slt (e.g. 4:16), i.e. enclosure 4, slot 16
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show | grep UBad
4:16     21 UBad   -   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:17     22 UBad   -   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:18     23 UBad   -   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:19     24 UBad   -   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U 
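
# (sketch) the UGood replacement drive can be listed the same way
/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show | grep UGood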

# The enclosure above is 4 and slots 16-19. So we set the disks to GOOD using /e4 and /s16 etc
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s16 set good
Controller = 0
Status = Success
Description = Set Drive Good Succeeded.

# repeat for other disks 
storcli64 /c0 /e4 /s17 set good
storcli64 /c0 /e4 /s18 set good
storcli64 /c0 /e4 /s19 set good
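
# (sketch) or loop over the slots identified above - adjust the storcli64 path
# for your install
for s in 16 17 18 19; do
    /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s$s set good
done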

# these drives will now be reported as foreign - so let's check and see
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /fall show
Controller = 0
Status = Success
Description = Operation on foreign configuration Succeeded


FOREIGN CONFIGURATION :
=====================

----------------------------------------
DG EID:Slot Type  State      Size NoVDs 
----------------------------------------
 0 -        RAID6 Frgn  58.210 TB     1 
----------------------------------------

NoVDs - Number of VDs in disk group|DG - Diskgroup
Total foreign drive groups = 1

# Now let's import the old (foreign) configuration to restore things to the way they were
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /fall import 
Controller = 0
Status = Success
Description = Successfully imported foreign configuration

# and we check the output again to see the state of the drives
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                 Sp 
------------------------------------------------------------------------------
4:0       5 Onln   0 233.312 GB SATA HDD N   N  512B WDC WD2503ABYZ-011FA0 U  
4:1       6 Onln   0 233.312 GB SATA HDD N   N  512B WDC WD2503ABYZ-011FA0 U  
4:2       7 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:3       8 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:4       9 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:5      10 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:6      11 Rbld   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:7      12 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:8      13 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:9      14 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:10     15 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:11     16 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:12     17 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:13     18 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:14     19 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:15     20 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:16     21 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:17     22 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:18     23 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
4:19     24 Onln   1   3.637 TB SATA HDD N   N  512B HGST HUS724040ALA640  U  
------------------------------------------------------------------------------

# Note: our UGood drive (the one replaced on site) is now in a rebuild state (Rbld on 4:6 above).
# Let's check the rebuild status across the whole system.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /eall /sall show rebuild 
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.


---------------------------------------------------------------
Drive-ID   Progress% Status          Estimated Time Left       
---------------------------------------------------------------
/c0/e4/s0  -         Not in progress -                         
/c0/e4/s1  -         Not in progress -                         
/c0/e4/s2  -         Not in progress -                         
/c0/e4/s3  -         Not in progress -                         
/c0/e4/s4  -         Not in progress -                         
/c0/e4/s5  -         Not in progress -                         
/c0/e4/s6  0         In progress     1 Days 0 Hours 28 Minutes 
/c0/e4/s7  -         Not in progress -                         
/c0/e4/s8  -         Not in progress -                         
/c0/e4/s9  -         Not in progress -                         
/c0/e4/s10 -         Not in progress -                         
/c0/e4/s11 -         Not in progress -                         
/c0/e4/s12 -         Not in progress -                         
/c0/e4/s13 -         Not in progress -                         
/c0/e4/s14 -         Not in progress -                         
/c0/e4/s15 -         Not in progress -                         
/c0/e4/s16 -         Not in progress -                         
/c0/e4/s17 -         Not in progress -                         
/c0/e4/s18 -         Not in progress -                         
/c0/e4/s19 -         Not in progress -                         
---------------------------------------------------------------



# Or focus on the drive in question to get a rebuild time estimate.
[root@lustre02-oss2 ~]# /usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s6 show rebuild 
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.


----------------------------------------------------
Drive-ID  Progress% Status      Estimated Time Left 
----------------------------------------------------
/c0/e4/s6         0 In progress 22 Hours 36 Minutes 
----------------------------------------------------
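
To keep an eye on the rebuild without re-running the command by hand, something like watch can be used (a sketch; adjust the storcli64 path and the drive address as needed):
# refresh the rebuild status every 60 seconds
watch -n 60 '/usr/local/MegaRAID\ Storage\ Manager/StorCLI/storcli64 /c0 /e4 /s6 show rebuild'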