Bright: Health Checks
Latest revision as of 23:53, 29 November 2015
Check the status of Health Checks
- High-level PASS/FAIL summary
cmsh
device list

More Health Details - Showhealth
$ cmsh / device
[hyalite->device]% showhealth
Device AlertLevel Failed Thresholds Unknown
---------------- ---------- ---------------------------- -------------------- --------------------
compute004 20 rogueprocess
compute006 40 dmesg smart
compute007 40 dmesg smart
compute008 40 dmesg, rogueprocess
compute009 40 dmesg smart
compute010 40 dmesg
compute011 40 dmesg smart
compute020 40 dmesg
compute024 20 rogueprocess

Check why a node is failing
[root@hyalite ~]# cmsh
[hyalite]% device
[hyalite->device]% check -n compute020 *
Health Check                 Value            Age (sec.) Info Message
---------------------------- ---------------- ---------- ---------------------------------------------------------------------------------------------------
DeviceIsUp                   PASS             1
ManagedServicesOk            PASS             2
diskspace:2% 10% 20%         PASS             2
dmesg                        FAIL             2          /* 2 */ matches for regex: invoked oom-killer +
ib                           PASS             2
interfaces                   PASS             2
ipmihealth                   PASS             1
mounts                       PASS             2
nslcd                        PASS             2
ntp                          PASS             2
rogueprocess                 PASS             2
schedulers                   PASS             2
smart                        PASS             2
ssh2node                     PASS             1
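
On a node with many checks, the one failing row is easy to miss in a listing like the one above. The following is a minimal sketch, not part of cmsh, of a shell helper that reads saved check output on stdin and prints only the rows that are not PASS. The function name failing_checks is hypothetical. It assumes the Value column is the first PASS, FAIL or UNKNOWN token after the check name, and that UNKNOWN is the only other value that can appear.

```shell
# failing_checks (hypothetical helper): read `check` output on stdin and
# print "<check name> <value>" for every row whose value is not PASS.
# Assumption: the value is the first PASS/FAIL/UNKNOWN token on the line,
# which holds for the listing above, including names with spaces such as
# "diskspace:2% 10% 20%".
failing_checks() {
    awk '
        /^Health Check/ || /^-+/ { next }        # skip header and separator
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "PASS") next           # healthy row, nothing to report
                if ($i == "FAIL" || $i == "UNKNOWN") {
                    name = $1                    # rebuild a multi-word check name
                    for (j = 2; j < i; j++) name = name " " $j
                    print name, $i
                    next
                }
            }
        }
    '
}
```

One way to use it is to paste the check output into a file and run failing_checks < saved_output.txt. If the non-interactive form of cmsh is available on the head node, something like cmsh -c "device; check -n compute020 *" | failing_checks may also work; treat that pipeline as an untested assumption for this cluster.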