Commit Graph

10 Commits

Author SHA1 Message Date
Becca Powell
49a9dc9b8b Fix typo in Prometheus alerting rules 2018-08-21 16:55:49 -07:00
Dalton Hubble
84d6cfe7b3 Add Prometheus alert rule for inactive md devices
* node-exporter exposes metrics to Prometheus about total and
active md devices (e.g. disks in mdadm RAID arrays)
* Add alert that fires when a RAID disk fails or becomes inactive
for another reason
2018-07-10 00:20:30 -07:00
Dalton Hubble
d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble
9307e97c46 addons: Update Prometheus from v2.1.0 to v2.2.0
* Annotate Prometheus service to scrape metrics from
Prometheus itself (enables Prometheus* alerts)
* Update kube-state-metrics addon-resizer to 1.7
* Use port 8080 for kube-state-metrics
* Add PrometheusNotIngestingSamples alert rule
* Change K8SKubeletDown alert rule to fire when 10%
of kubelets are down, not 1%
  * https://github.com/coreos/prometheus-operator/pull/1032
2018-03-09 00:20:40 -08:00
Dalton Hubble
064ce83f25 addons: Update Prometheus to v2.1.0
* Change service discovery to relabel jobs to align with
rule expressions in upstream examples
* Use a separate service account for prometheus instead
of granting roles to the namespace's default
* Use a separate service account for node-exporter
* Update node-exporter and kube-state-metrics exporters
2018-01-27 21:00:15 -08:00
Dalton Hubble
65f006e6cc addons: Sync prometheus alerts to upstream
* https://github.com/coreos/prometheus-operator/pull/774
2017-12-01 23:24:08 -08:00
Dalton Hubble
63ab117205 addons: Add prometheus rules for DaemonSets
* https://github.com/coreos/prometheus-operator/pull/755
2017-11-16 23:51:21 -08:00
Dalton Hubble
1cd262e712 addons: Fix prometheus K8SApiServerLatency alert rule
* https://github.com/coreos/prometheus-operator/issues/751
2017-11-16 23:37:15 -08:00
Dalton Hubble
159443bae7 addons: Add better alerting rules to Prometheus manifests
* Adapt the coreos/prometheus-operator alerting rules for Typhoon,
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests
* Add controller manager and scheduler shim services to let
prometheus discover them via service endpoints
* Fix several alert rules to use service endpoint discovery
* A few rules still don't do much, but they default to green
2017-11-10 20:57:47 -08:00
Dalton Hubble
d046d45769 addons: Include Prometheus and node-exporter manifests 2017-10-24 22:58:59 -07:00