Hardware Monitoring
Monitoring Concepts and Definitions
This section introduces the concepts of monitoring and defines the terms used, so that the features of the monitoring framework covered later in this chapter can be understood more clearly.
Metric
A metric is a property of a device that can be monitored. It has a numeric value and can have units, unless it is unknown, in which case it has a null value. Examples are temperature, load average, and free space. A metric can be a built-in, which means it is an integral part of the monitoring framework, or it can be a standalone script. The word metric is often used to mean the script or object associated with a metric, as well as the metric value itself. The context makes it clear which is meant.
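The idea of a metric as a numeric value with optional units, or a null value when unknown, can be illustrated with a minimal Python sketch. The names MetricSample and sample_load_average are hypothetical and are not part of the monitoring framework. The sketch only models the definition above; it does not show the interface that a real standalone metric script must follow.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricSample:
    """Hypothetical container for one sampled metric value."""
    name: str
    value: Optional[float]  # None models the "unknown" (null) value
    unit: str = ""          # units are optional

    def is_unknown(self) -> bool:
        return self.value is None


def sample_load_average() -> MetricSample:
    """Sample the 1-minute load average, or return an unknown sample."""
    try:
        value = float(os.getloadavg()[0])
    except (AttributeError, OSError):
        # Load average is not obtainable on this platform: value is unknown.
        value = None
    return MetricSample(name="loadavg_1min", value=value)


sample = sample_load_average()
print(sample.name, "unknown" if sample.is_unknown() else sample.value)
```

Representing the unknown case as None rather than as a sentinel number such as -1 keeps an unknown sample from being mistaken for a real measurement.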
Action
An action is a standalone script or a built-in command that is executed when a condition is met. This condition can be:
- health checking
- threshold checking
- state flipping
Threshold
A threshold is a particular value in a sampled metric. A sample can cross the threshold, thereby entering or leaving a zone that is demarcated by the threshold.
A threshold can be configured to launch an action according to threshold crossing conditions. The "New Threshold" dialog of cmgui has three action launch configuration options:
- Enter: if the sample has entered into the zone and the previous sample was not in the zone
- Leave: if the sample has left the zone and the previous sample was in the zone
- During: if the sample is in the zone and the previous sample was also in the zone
A threshold zone also has a settable severity associated with it. This value is processed for the AlertLevel metric when an action is triggered by a threshold event.
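The Enter, Leave, and During conditions described above can be sketched in Python as a comparison between the zone membership of the previous sample and that of the current sample. The function names are hypothetical. The sketch assumes that the zone consists of values strictly above the threshold, that an unknown (null) sample counts as outside the zone, and that the state before the first sample is outside the zone. None of these assumptions is taken from the monitoring framework itself.

```python
from typing import List, Optional, Sequence


def crossing_event(prev_in_zone: bool, curr_in_zone: bool) -> Optional[str]:
    """Classify a pair of consecutive samples by zone membership."""
    if curr_in_zone and not prev_in_zone:
        return "Enter"
    if prev_in_zone and not curr_in_zone:
        return "Leave"
    if prev_in_zone and curr_in_zone:
        return "During"
    return None  # both samples are outside the zone: nothing to launch


def events_for_samples(samples: Sequence[Optional[float]],
                       threshold: float) -> List[Optional[str]]:
    """Return the crossing event, if any, for each sample in order."""
    events = []
    prev_in_zone = False  # assumption: outside the zone before sampling starts
    for value in samples:
        # Assumption: the zone is "strictly above the threshold".
        curr_in_zone = value is not None and value > threshold
        events.append(crossing_event(prev_in_zone, curr_in_zone))
        prev_in_zone = curr_in_zone
    return events


# Temperature samples checked against a threshold of 80
print(events_for_samples([60, 85, 90, 70, 65], 80))
# → [None, 'Enter', 'During', 'Leave', None]
```

An action configured for Enter would therefore run once when the zone is entered, while an action configured for During would run on every subsequent sample that remains in the zone.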
Health Check
A health check value is a state: the response obtained by running a health check script at a regular time interval. There are three possible response values: PASS, FAIL, or UNKNOWN.
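As an illustration of the three response values, the following Python sketch checks the free space on a filesystem. The function names and the 10% default are hypothetical. The way a real health check script reports its result to the monitoring framework is not specified here, so the sketch simply returns the state as a string, and the scheduling of the check at a regular interval is left to the caller.

```python
import shutil

PASS, FAIL, UNKNOWN = "PASS", "FAIL", "UNKNOWN"


def evaluate_free_space(free_bytes: int, total_bytes: int,
                        min_free_fraction: float) -> str:
    """Map a free-space measurement to a health check state."""
    if total_bytes <= 0:
        return UNKNOWN  # no meaningful measurement is available
    if free_bytes / total_bytes >= min_free_fraction:
        return PASS
    return FAIL


def check_free_space(path: str, min_free_fraction: float = 0.10) -> str:
    """Run the check once; UNKNOWN means the check could not be carried out."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return UNKNOWN
    return evaluate_free_space(usage.free, usage.total, min_free_fraction)


print(check_free_space("."))
```

Returning UNKNOWN when the measurement itself fails, rather than FAIL, keeps a broken check distinct from an unhealthy device.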