Check disk failure and send alert
When no disks have failed, the output of storcli is:
<syntaxhighlight>
[root@disk-test-node1 ~]# storcli64 /c0/vall show
Controller = 0
Status = Success
Description = None

Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name
-------------------------------------------------------------
0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
</syntaxhighlight>
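The health of each virtual drive is reported in the State column. To pull out just that column, a small awk filter works against the output above (a minimal sketch; the field position assumes the column layout shown here):
<syntaxhighlight>
# Print the State column (third field) of each virtual drive row
storcli64 /c0/vall show | awk '/RAID[0-9]/ {print $3}'
# Prints "Optl" on a healthy system, "Pdgd" (or another degraded state) otherwise
</syntaxhighlight>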
When a disk failure occurs, the output is:
<syntaxhighlight>
[root@disk-test-node1 ~]# storcli64 /c0/vall show
Controller = 0
Status = Success
Description = None

Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name
-------------------------------------------------------------
0/0   RAID6 Pdgd  RW     No      RWBD  -   ON  1.063 TB
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
</syntaxhighlight>

To check whether any disk has failed, we can use this command:
<syntaxhighlight>
[root@disk-test-node1 ~]# storcli64 /c0/vall show | grep ' Optl '
0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB
</syntaxhighlight>
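When a virtual drive is degraded, its state is no longer Optl, so the grep above prints nothing for that drive. Comparing the number of Optl matches against the total number of virtual drives therefore catches any failure; this is the same comparison the full script below performs:
<syntaxhighlight>
# Count all virtual drives, then count those in the Optimal state
allVD=`storcli64 /c0/vall show | grep 'RAID[0-9]' | wc -l`
optVD=`storcli64 /c0/vall show | grep ' Optl ' | wc -l`
[ "$optVD" -eq "$allVD" ] && echo PASS || echo FAIL
</syntaxhighlight>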
To add the healthcheck in Bright:

1- Adding the healthcheck:
<syntaxhighlight>
# cmsh
% monitoring healthchecks
% add <healthcheck_name>
% set command <path_to_your_script>
% commit
</syntaxhighlight>

2- Configuring the healthcheck:
<syntaxhighlight>
% monitoring setup healthconf <category_name>
% add <healthcheck_name>
% set checkinterval <interval>
% commit
</syntaxhighlight>

You can then attach a fail action that runs if the healthcheck fails, such as sending an email alert or powering the node off; a concrete session is sketched below. You can find more information about metrics and monitoring in Bright in chapter 9 of the Bright 7.0 administrator manual.
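For example, assuming the healthcheck is named raidcheck, the script is saved on the nodes as /cm/local/apps/cmd/scripts/healthchecks/raidcheck, the category is default, and the interval is 120 seconds (all of these names and values are illustrative, not mandated), the session could look like this:
<syntaxhighlight>
# cmsh
% monitoring healthchecks
% add raidcheck
% set command /cm/local/apps/cmd/scripts/healthchecks/raidcheck
% commit
% monitoring setup healthconf default
% add raidcheck
% set checkinterval 120
% commit
</syntaxhighlight>
A fail action (for example an email alert) is attached on the same healthconf entry; see chapter 9 of the admin manual for the available actions and the exact property to set in your Bright version.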
We can use the following script to check the RAID:
<syntaxhighlight>
#!/bin/sh

# Count all virtual drives, and those whose state is Optimal (Optl)
allVD=`storcli64 /call/vall show | grep 'RAID[0-9]' | wc -l`
optVD=`storcli64 /call/vall show | grep ' Optl ' | wc -l`
logPath="/tmp/raidCheck.log"
hostname=`hostname`

if [ "$optVD" -eq "$allVD" ]; then
    # Every virtual drive is Optimal
    echo PASS
else
    # At least one virtual drive is degraded: log the per-drive
    # details and mail the log as an attachment
    echo FAIL
    storcli64 /call/eall/sall show > "$logPath"
    mail -a "$logPath" -s "RAID ERROR! at $hostname" pol.llovet@gmail.com,HPC@boston.co.uk < /dev/null
fi
</syntaxhighlight>
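The script prints PASS or FAIL on standard output, which matches the response Bright expects from a healthcheck script. It can be tested by hand before wiring it into Bright; assuming it was saved as /root/raidcheck.sh (a hypothetical path):
<syntaxhighlight>
# chmod +x /root/raidcheck.sh
# /root/raidcheck.sh
PASS
</syntaxhighlight>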