Bright: Health Checks

From Define Wiki

Latest revision as of 23:53, 29 November 2015

Check the status of Health Checks

  • High-level PASS/FAIL summary:
  cmsh 
  device list

More Health Details - Showhealth

$ cmsh / device 
[hyalite->device]% showhealth
Device           AlertLevel Failed                       Thresholds           Unknown             
---------------- ---------- ---------------------------- -------------------- --------------------
compute004       20         rogueprocess                                                          
compute006       40         dmesg                                             smart               
compute007       40         dmesg                                             smart               
compute008       40         dmesg, rogueprocess                                                   
compute009       40         dmesg                                             smart               
compute010       40         dmesg                                                                 
compute011       40         dmesg                                             smart               
compute020       40         dmesg                                                                 
compute024       20         rogueprocess
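
When many nodes are listed, it can help to regroup the showhealth table by failed check rather than by node. The following is a minimal sketch in Python, not part of Bright itself: it assumes the showhealth output has been saved as plain text, and the function names parse_showhealth and nodes_by_failed_check are hypothetical helpers introduced here for illustration. Column boundaries are taken from the dashed rule under the header, so empty columns (for example an empty Thresholds cell before a populated Unknown cell) stay aligned.

```python
import re
from collections import defaultdict


def parse_showhealth(text):
    """Parse fixed-width `showhealth` output into a list of row dicts.

    Hypothetical helper: assumes the layout shown above, i.e. a header
    line followed by a dashed rule whose dash runs mark the columns.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Locate the dashed rule; it consists only of dashes and spaces.
    rule_idx = None
    for i, ln in enumerate(lines):
        if re.fullmatch(r"-[- ]*", ln.strip()):
            rule_idx = i
            break
    if rule_idx is None or rule_idx == 0:
        raise ValueError("no header/rule pair found in showhealth output")

    # Each run of dashes gives the (start, end) offsets of one column.
    spans = [m.span() for m in re.finditer(r"-+", lines[rule_idx])]
    header = lines[rule_idx - 1]
    names = [header[s:e].strip() for s, e in spans]

    rows = []
    for ln in lines[rule_idx + 1:]:
        # Slicing past the end of a short line simply yields "".
        rows.append({n: ln[s:e].strip() for n, (s, e) in zip(names, spans)})
    return rows


def nodes_by_failed_check(rows):
    """Map each failed check name to the list of devices reporting it."""
    grouped = defaultdict(list)
    for row in rows:
        for check in row["Failed"].split(","):
            check = check.strip()
            if check:
                grouped[check].append(row["Device"])
    return dict(grouped)
```

Calling nodes_by_failed_check(parse_showhealth(text)) on the saved output returns a dictionary keyed by check name (such as dmesg or rogueprocess), which makes it easier to see which checks affect the most nodes.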

Check why a node is failing

[root@hyalite ~]# cmsh 
[hyalite]% device 
[hyalite->device]% check -n compute020 * 
Health Check                 Value            Age (sec.) Info Message                                                                                       
---------------------------- ---------------- ---------- ---------------------------------------------------------------------------------------------------
DeviceIsUp                   PASS             1                                                                                                             
ManagedServicesOk            PASS             2                                                                                                             
diskspace:2% 10% 20%         PASS             2                                                                                                             
dmesg                        FAIL             2          /* 2 */ matches for regex: invoked oom-killer                                                     +
ib                           PASS             2                                                                                                             
interfaces                   PASS             2                                                                                                             
ipmihealth                   PASS             1                                                                                                             
mounts                       PASS             2                                                                                                             
nslcd                        PASS             2                                                                                                             
ntp                          PASS             2                                                                                                             
rogueprocess                 PASS             2                                                                                                             
schedulers                   PASS             2                                                                                                             
smart                        PASS             2                                                                                                             
ssh2node                     PASS             1
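
The per-node check table is long, and usually only the non-PASS rows matter. As a rough sketch (again an illustration, not a Bright tool), the Python function below filters saved check output down to the rows whose Value is not PASS. The name failing_checks is hypothetical. It assumes columns are separated by two or more spaces, which holds for the output above, and it strips the trailing "+" seen on the dmesg row, treating it as a display marker; that treatment is an assumption.

```python
import re


def failing_checks(text):
    """Return (check, value, info) tuples for rows whose Value is not PASS.

    Hypothetical helper: expects the text of a `check -n <node> *` run,
    including the dashed rule that separates the header from the rows.
    """
    failures = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"-[- ]*", stripped):
            in_table = True  # data rows begin after the dashed rule
            continue
        if not in_table or not stripped:
            continue
        # Columns are separated by runs of 2+ spaces; names such as
        # "diskspace:2% 10% 20%" contain only single spaces.
        fields = re.split(r"\s{2,}", stripped, maxsplit=3)
        if len(fields) < 2:
            continue
        name, value = fields[0], fields[1]
        info = fields[3] if len(fields) > 3 else ""
        # Assumption: a lone trailing "+" is a display marker, not message text.
        info = re.sub(r"\s+\+$", "", info)
        if value != "PASS":
            failures.append((name, value, info))
    return failures
```

For the compute020 output above, this would leave only the dmesg row, whose message points at the kernel OOM killer having been invoked on that node.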