Bright: Health Checks

From Define Wiki

Check the status of Health Checks

  • High-level PASS/FAIL summary:
  cmsh 
  device list

More Health Details: showhealth

$ cmsh
[hyalite]% device
[hyalite->device]% showhealth
Device           AlertLevel Failed                       Thresholds           Unknown             
---------------- ---------- ---------------------------- -------------------- --------------------
compute004       20         rogueprocess                                                          
compute006       40         dmesg                                             smart               
compute007       40         dmesg                                             smart               
compute008       40         dmesg, rogueprocess                                                   
compute009       40         dmesg                                             smart               
compute010       40         dmesg                                                                 
compute011       40         dmesg                                             smart               
compute020       40         dmesg                                                                 
compute024       20         rogueprocess
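
When many nodes are listed, it can help to turn the showhealth table into a structure that a script can filter. The following is a minimal sketch, assuming the output has already been captured as text (for example from something like cmsh -c "device; showhealth", which is an assumption about how the command is invoked on your cluster). The function names parse_cmsh_table, split_checks, and health_summary are hypothetical helpers, not part of Bright. The parser relies on the row of dashes to find the column boundaries, which is what lets it tell an empty Thresholds column apart from the Unknown column.

```python
import re


def parse_cmsh_table(text):
    """Split a fixed-width cmsh table into a list of {column: value} dicts.

    Assumes the layout shown above: a header row, a row of dashes,
    then one data row per device.
    """
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # The row made only of dashes and spaces marks each column's span.
    sep = next(i for i, ln in enumerate(lines) if set(ln) <= {"-", " "})
    spans = [m.span() for m in re.finditer(r"-+", lines[sep])]
    # Let the last column run to the end of the line.
    spans[-1] = (spans[-1][0], None)
    names = [lines[sep - 1][a:b].strip() for a, b in spans]
    return [
        {name: ln[a:b].strip() for name, (a, b) in zip(names, spans)}
        for ln in lines[sep + 1:]
    ]


def split_checks(cell):
    """Turn a cell such as 'dmesg, rogueprocess' into ['dmesg', 'rogueprocess']."""
    return [c.strip() for c in cell.split(",") if c.strip()]


def health_summary(text):
    """Map each device to its alert level and its failed/unknown checks."""
    summary = {}
    for row in parse_cmsh_table(text):
        summary[row["Device"]] = {
            "alert_level": row["AlertLevel"],  # kept as text, not converted
            "failed": split_checks(row["Failed"]),
            "unknown": split_checks(row["Unknown"]),
        }
    return summary
```

Calling health_summary on the captured text gives one entry per device, so a script can, for instance, select only the nodes whose "failed" list contains dmesg. Slicing by column position is used instead of splitting on whitespace because empty cells would otherwise shift later values into the wrong column.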

Check why a node is failing

[root@hyalite ~]# cmsh 
[hyalite]% device 
[hyalite->device]% check -n compute020 * 
Health Check                 Value            Age (sec.) Info Message                                                                                       
---------------------------- ---------------- ---------- ---------------------------------------------------------------------------------------------------
DeviceIsUp                   PASS             1                                                                                                             
ManagedServicesOk            PASS             2                                                                                                             
diskspace:2% 10% 20%         PASS             2                                                                                                             
dmesg                        FAIL             2          /* 2 */ matches for regex: invoked oom-killer                                                     +
ib                           PASS             2                                                                                                             
interfaces                   PASS             2                                                                                                             
ipmihealth                   PASS             1                                                                                                             
mounts                       PASS             2                                                                                                             
nslcd                        PASS             2                                                                                                             
ntp                          PASS             2                                                                                                             
rogueprocess                 PASS             2                                                                                                             
schedulers                   PASS             2                                                                                                             
smart                        PASS             2                                                                                                             
ssh2node                     PASS             1
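
In a long listing like the one above, the single failing row is easy to miss. Here is a small sketch that keeps only the rows whose value is not PASS. It assumes the captured output uses the column order shown above (check name, value, age, info message) and that the value is one of PASS, FAIL, or UNKNOWN. That set of values and the function name non_passing are assumptions made for illustration.

```python
import re

# check name, then value, then age in seconds, then an optional info message.
# The name may contain single spaces (e.g. "diskspace:2% 10% 20%").
ROW = re.compile(r"^(.+?)\s+(PASS|FAIL|UNKNOWN)\s+(\d+)\s*(.*?)\s*$")


def non_passing(text):
    """Return the health check rows whose value is not PASS."""
    results = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # prompt, header, separator, or blank line
        check, value, age, info = match.groups()
        if value != "PASS":
            results.append(
                {"check": check, "value": value, "age": int(age), "info": info}
            )
    return results
```

For the compute020 output above, this would leave only the dmesg row, with the oom-killer match in its info field. A trailing + in the info message is kept as printed.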