Add Prometheus alert rule for inactive md devices

* node-exporter exposes metrics to Prometheus about total and
  active md devices (e.g. disks in mdadm RAID arrays)
* Add an alert that fires when a RAID disk fails or becomes
  inactive for another reason
Dalton Hubble 2018-07-10 00:20:30 -07:00
parent 3352388fe6
commit 84d6cfe7b3
1 changed file with 7 additions and 0 deletions

@@ -496,6 +496,13 @@ data:
         annotations:
           description: device {{$labels.device}} on node {{$labels.instance}} is running
             full within the next 2 hours (mounted at {{$labels.mountpoint}})
+      - alert: InactiveRAIDDisk
+        expr: node_md_disks - node_md_disks_active > 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$value}} RAID disk(s) on node {{$labels.instance}} are inactive'
   prometheus.rules.yaml: |
     groups:
     - name: prometheus.rules
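
One way to check the InactiveRAIDDisk rule from this commit is a rule unit test. The sketch below is not part of the commit. It assumes a Prometheus release that ships `promtool test rules`, and it assumes the rule group has been copied out of the ConfigMap into a standalone file; the name `inactive-raid-disk-rules.yaml` is hypothetical. The `instance` and `device` label values are made up for illustration.

```yaml
# Hypothetical promtool unit test; file names and label values are assumptions.
rule_files:
  - inactive-raid-disk-rules.yaml   # standalone copy of the rule group (assumed)
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # The array always reports 2 disks in total.
      - series: 'node_md_disks{instance="node1", device="md0"}'
        values: '2+0x20'
      # Both disks are active for 2 samples, then one goes inactive.
      - series: 'node_md_disks_active{instance="node1", device="md0"}'
        values: '2 2 1+0x18'
    alert_rule_test:
      # The difference is > 0 from minute 2, so with "for: 10m"
      # the alert should be firing by minute 15.
      - eval_time: 15m
        alertname: InactiveRAIDDisk
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: node1
              device: md0
            exp_annotations:
              description: '1 RAID disk(s) on node node1 are inactive'
```

If this were saved as `test.yaml` next to the rule file, it could be run with `promtool test rules test.yaml`.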