Add Prometheus alert rule for inactive md devices
* node-exporter exposes metrics to Prometheus about total and active md devices (e.g. disks in mdadm RAID arrays)
* Add alert that fires when a RAID disk fails or becomes inactive for another reason
parent 3352388fe6
commit 84d6cfe7b3
@@ -496,6 +496,13 @@ data:
         annotations:
           description: device {{$labels.device}} on node {{$labels.instance}} is running
             full within the next 2 hours (mounted at {{$labels.mountpoint}})
+      - alert: InactiveRAIDDisk
+        expr: node_md_disks - node_md_disks_active > 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$value}} RAID disk(s) on node {{$labels.instance}} are inactive'
   prometheus.rules.yaml: |
     groups:
     - name: prometheus.rules
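
The InactiveRAIDDisk rule could be exercised with a promtool unit test along the following lines. This is only a sketch, not part of the commit: it assumes the rule group has been extracted from the ConfigMap into a standalone file (the name rules.yaml is hypothetical), and the instance and device label values are made up for illustration.

```yaml
# Sketch of a promtool unit test for the InactiveRAIDDisk alert.
# Assumption: rules.yaml is a hypothetical standalone copy of the rule group.
rule_files:
  - rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Two disks in the array for the whole test (samples at 0m..15m).
      - series: 'node_md_disks{instance="node1", device="md0"}'
        values: '2+0x15'
      # Both disks active at 0m, then one drops out from 1m onwards.
      - series: 'node_md_disks_active{instance="node1", device="md0"}'
        values: '2 1+0x14'
    alert_rule_test:
      # The expression becomes true at 1m; with "for: 10m" the alert
      # should be firing by 12m.
      - eval_time: 12m
        alertname: InactiveRAIDDisk
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: node1
              device: md0
            exp_annotations:
              description: '1 RAID disk(s) on node node1 are inactive'
```

Such a file would be run with `promtool test rules <test-file>`. The subtraction keeps the labels shared by both series, which is why the expected alert carries instance and device alongside the severity label.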