Check disk failure and send alert


When no disks have failed, the output of storcli is:

[root@disk-test-node1 ~]# storcli64 /c0/vall show 
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB      
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency


When a disk failure occurs, the output is:

[root@disk-test-node1 ~]# storcli64 /c0/vall show 
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID6 Pdgd  RW     No      RWBD  -   ON  1.063 TB      
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

To check whether any disk has failed, we can use this command:

[root@disk-test-node1 ~]# storcli64 /c0/vall show |grep '\ Optl\ '
0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB
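
The same comparison can be made across every controller in the node by counting all virtual drives and counting only those in the Optl (Optimal) state; this is exactly what the healthcheck script at the end of this page does. Run by hand, the check looks like this:

# Count every virtual drive on every controller, then only the Optimal ones
allVD=`storcli64 /call/vall show | grep 'RAID[0-9]' | wc -l`
optVD=`storcli64 /call/vall show | grep '\ Optl\ ' | wc -l`
echo "$optVD of $allVD virtual drives are Optimal"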


To add the health check in Bright:

1- Add the healthcheck:

# cmsh
% monitoring healthchecks
% add <healthcheck_name>
% set command <path_to_your_script>
% commit

2- Configure the healthcheck:

% monitoring setup healthconf <category_name>
% add <healthcheck_name>
% set checkinterval <interval>
% commit
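
As a concrete walk-through of both steps, here is a hypothetical session; the healthcheck name, script path, category name and interval are placeholders chosen for illustration, not values required by Bright:

# cmsh
% monitoring healthchecks
% add raidcheck
% set command /path/to/raidcheck.sh
% commit
% monitoring setup healthconf default
% add raidcheck
% set checkinterval 300
% commit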

You can then attach a fail action that runs when the healthcheck fails, such as sending an email alert or powering the node off. More information about metrics and monitoring in Bright can be found in chapter 9 of the Bright 7.0 administrator manual.
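
As a sketch of how a fail action could be wired up in cmsh (this assumes the healthconf entry exposes a failactions parameter, as in Bright 7.x, and that a suitable e-mail or power-off action already exists under monitoring actions; check the admin manual for the exact names in your version):

# cmsh
% monitoring actions
% list
% monitoring setup healthconf <category_name>
% use <healthcheck_name>
% set failactions <action_name>
% commit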

We can use this script to check the RAID:

#!/bin/sh
# Healthcheck for Bright: compare the number of virtual drives reported by
# storcli with the number of virtual drives in the Optimal state.

allVD=`storcli64 /call/vall show | grep 'RAID[0-9]' | wc -l`
optVD=`storcli64 /call/vall show | grep '\ Optl\ ' | wc -l`
logPath="/tmp/raidCheck.log"
hostname=`hostname`

if [ "$optVD" -eq "$allVD" ]; then
    # Every virtual drive is Optimal: report PASS to Bright.
    echo PASS
else
    # At least one virtual drive is degraded or offline: report FAIL,
    # dump the per-drive status to a log file and mail it out.
    echo FAIL
    storcli64 /call/eall/sall show > "$logPath"
    mail -a "$logPath" -s "RAID ERROR! at $hostname" pol.llovet@gmail.com,HPC@boston.co.uk < /dev/null
fi
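
Before handing the script to Bright, it is worth running it once by hand on a node with the controller; it should print PASS on a healthy array and FAIL (and send the alert mail) on a degraded one. The path below is only an example location:

# Save the script, make it executable, and run it once manually
chmod +x /path/to/raidcheck.sh
/path/to/raidcheck.sh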