Check disk failure and send alert

From Define Wiki
Revision as of 10:33, 2 July 2015 by Chenhui

When no disks have failed, storcli reports the virtual drive state as Optl (Optimal):

[root@disk-test-node1 ~]# storcli64 /c0/vall show 
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB      
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
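Each data row of the virtual drive table is one virtual drive, identified by its DG/VD pair, and the third column is its State. A minimal sketch of pulling out just that column with awk, assuming the column layout shown above. vd_states is a hypothetical helper name and is not part of storcli:

```shell
#!/bin/sh
# vd_states (hypothetical helper): print "<DG/VD> <State>" for each
# virtual drive row of "storcli64 /c0/vall show" output read on stdin.
# Assumption: data rows start with an id such as 0/0 and State is the
# third whitespace-separated column, as in the output above.
vd_states() {
  awk '$1 ~ /^[0-9]+\/[0-9]+$/ { print $1, $3 }'
}

# Example with the header and data row from the output above.
printf '%s\n' \
  'DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name' \
  '0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB' | vd_states
# prints: 0/0 Optl
```

On a node this would be run as storcli64 /c0/vall show | vd_states. The header, the legend and the Controller/Status lines are skipped because only data rows begin with an id such as 0/0.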

To add a health check in Bright Cluster Manager:

1- Add the health check:

# cmsh
% monitoring healthchecks
% add <healthcheck_name>
% set command <path_to_your_script>
% commit

2- Configure the health check for a node category:

% monitoring setup healthconf <category_name>
% add <healthcheck_name>
% set checkinterval <interval>
% commit

You can then add a fail action that runs when the health check fails, such as sending an email alert or powering the node off. More information about metrics and monitoring in Bright can be found in chapter 9 of the Bright 7.0 Administrator Manual.

When a disk fails, the state of the virtual drive changes from Optl (Optimal) to Pdgd (Partially Degraded):

[root@disk-test-node1 ~]# storcli64 /c0/vall show 
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID6 Pdgd  RW     No      RWBD  -   ON  1.063 TB      
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

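To check whether a disk has failed, grep for the Optl state. The command prints the virtual drive line while the array is optimal, and prints nothing (grep exits with status 1) once the drive is degraded: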

[root@disk-test-node1 ~]# storcli64 /c0/vall show | grep ' Optl '
0/0   RAID6 Optl  RW     No      RWBD  -   ON  1.063 TB
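The grep above succeeds as soon as any line contains Optl, so on a controller with several virtual drives it would keep passing while one of them is Pdgd or dgrd. A sketch of a health check script that inspects every virtual drive row instead. It assumes controller /c0, the column layout shown above, and that Bright 7.0 reads PASS, FAIL or UNKNOWN from the script's standard output (check the health check section of the admin manual). check_vd_state and STORCLI are hypothetical names:

```shell
#!/bin/sh
# Hypothetical Bright health check: inspects every virtual drive row of
# "storcli64 /c0/vall show" instead of grepping for a single Optl line.
# Assumptions: controller /c0, State is the 3rd column (as in the output
# above), and Bright reads PASS/FAIL/UNKNOWN from standard output.

# check_vd_state (hypothetical helper): reads storcli output on stdin, prints
#   PASS    if every virtual drive row has State "Optl"
#   FAIL    if any row has another state (e.g. Pdgd, dgrd, OfLn)
#   UNKNOWN if no virtual drive rows were found
check_vd_state() {
  awk '
    # Data rows start with a DG/VD id such as 0/0.
    $1 ~ /^[0-9]+\/[0-9]+$/ { rows++; if ($3 != "Optl") bad++ }
    END {
      status = "PASS"
      if (bad > 0) status = "FAIL"
      if (rows == 0) status = "UNKNOWN"
      print status
    }'
}

# STORCLI is a hypothetical override for the storcli binary path.
"${STORCLI:-storcli64}" /c0/vall show 2>/dev/null | check_vd_state
```

The path to a script like this is what would go into set command when adding the health check. Printing UNKNOWN when no virtual drive rows are found, for example because storcli64 is missing or the controller number is wrong, keeps a broken check from being reported as either healthy or failed.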