typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-04 07:54:38 +02:00

Author	SHA1	Message	Date
Dalton Hubble	2b5dfece93	Update Grafana from v6.7.1 to v6.7.2 * https://github.com/grafana/grafana/releases/tag/v6.7.2	2020-04-04 13:13:19 -07:00
Dalton Hubble	d47d40b517	Refresh Prometheus rules/alerts and Grafana dashboards * Refresh upstream Prometheus rules and alerts and Grafana dashboards * Add Loki recording rules for convenience	2020-03-31 21:53:01 -07:00
Dalton Hubble	076b8e3c42	Update Prometheus from v2.17.0 to v2.17.1 * https://github.com/prometheus/prometheus/releases/tag/v2.17.1	2020-03-26 22:17:13 -07:00
Dalton Hubble	e556bc2167	Update Prometheus from v2.17.0-rc.3 to v2.17.0 * https://github.com/prometheus/prometheus/releases/tag/v2.17.0	2020-03-24 23:15:49 -07:00
Dalton Hubble	ddc1ff5348	Update Grafana from v6.6.2 to v6.7.1 * https://github.com/grafana/grafana/releases/tag/v6.7.1	2020-03-21 15:27:55 -07:00
Dalton Hubble	61557e89a6	Update Prometheus from v2.16.0 to v2.17.0-rc.3 * https://github.com/prometheus/prometheus/releases/tag/v2.17.0-rc.3	2020-03-19 22:38:05 -07:00
Dalton Hubble	75fb4e5d11	Remove Container Linux Update Operator (CLUO) addon * Stop providing example manifests for the Container Linux Update Operator (CLUO) * CLUO requires patches to support Kubernetes v1.16+, but the project and push access are rather unowned * CLUO hasn't been in active use in our clusters and won't be relevant beyond Container Linux. Not to say folks can't patch it and run it on their own. Examples just aren't provided here. Related: https://github.com/coreos/container-linux-update-operator/pull/197	2020-03-16 22:05:17 -07:00
Dalton Hubble	c4683c5bad	Refresh Prometheus alerts and Grafana dashboards * Add 2 min wait before KubeNodeUnreachable to be less noisy on preemptible clusters * Add a BlackboxProbeFailure alert for any failing probes for services annotated `prometheus.io/probe: true`	2020-03-02 20:08:37 -08:00
Dalton Hubble	f4d260645c	Update node-exporter from v0.18.1 to v1.0.0-rc.0 * Update mdadm alert rule; node-exporter adds `state` label to `node_md_disks` and removes `node_md_disks_active` * https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0	2020-02-25 22:29:52 -08:00
Dalton Hubble	d9219a6722	Update nginx-ingress from v0.29.0 to v0.30.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0	2020-02-25 22:11:59 -08:00
Dalton Hubble	60c7eb85ee	Update nginx-ingress from v0.28.0 to v0.29.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0	2020-02-22 15:57:59 -08:00
Dalton Hubble	4c964b56a0	Update kube-state-metrics from v1.9.4 to v1.9.5 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5	2020-02-22 15:21:10 -08:00
Dalton Hubble	1fbd6835f2	Update Grafana from v6.6.1 to v6.6.2 * https://github.com/grafana/grafana/releases/tag/v6.6.2	2020-02-22 15:19:24 -08:00
Dalton Hubble	7ca03e5219	Update Prometheus from v2.15.2 to v2.16.0 * https://github.com/prometheus/prometheus/releases/tag/v2.16.0	2020-02-14 12:10:56 -08:00
Dalton Hubble	34c3d7cc39	Update Grafana from v6.6.0 to v6.6.1 * https://github.com/grafana/grafana/releases/tag/v6.6.1	2020-02-08 14:50:33 -08:00
Dalton Hubble	e339fbd2b6	Update kube-state-metrics from v1.9.3 to v1.9.4 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4	2020-02-04 21:33:34 -08:00
Dalton Hubble	b19ba16afa	Update nginx-ingress from v0.27.1 to v0.28.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0	2020-01-30 18:00:23 -08:00
Dalton Hubble	d127a7345c	Update Grafana from v6.5.3 to v6.6.0 * https://github.com/grafana/grafana/releases/tag/v6.6.0	2020-01-27 20:46:32 -08:00
Dalton Hubble	d5b7ce8f27	Update kube-state-metrics from v1.9.2 to v1.9.3 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3	2020-01-23 00:03:16 -08:00
Dalton Hubble	bda73264f7	Update nginx-ingress from v0.26.1 to v0.27.1 * Change runAsUser from 33 to 101 for new alpine-based image * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1	2020-01-20 15:22:16 -08:00
Dalton Hubble	03ff3a9cf3	Update kube-state-metrics from v1.9.1 to v1.9.2 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.2	2020-01-18 15:32:10 -08:00
Dalton Hubble	48703f9906	Update Grafana from v6.5.2 to v6.5.3 * https://github.com/grafana/grafana/releases/tag/v6.5.3	2020-01-18 15:30:39 -08:00
Dalton Hubble	0e2fc89f78	Update kube-state-metrics from v1.9.0 to v1.9.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.1	2020-01-11 14:15:55 -08:00
Dalton Hubble	73588cfad3	Update Prometheus from v2.15.1 to v2.15.2 * https://github.com/prometheus/prometheus/releases/tag/v2.15.2	2020-01-06 22:08:34 -08:00
Dalton Hubble	bb586b60da	Reduce Prometheus addon's node-exporter tolerations * Change node-exporter DaemonSet tolerations from tolerating all possible NoSchedule taints to tolerating the master taint and the not ready taint (we'd like metrics regardless) * Users who add custom node taints must add their custom taints to the addon node-exporter DaemonSet. As an addon, it's expected that users copy and manipulate manifests out-of-band in their own systems	2020-01-06 21:24:24 -08:00
Dalton Hubble	43e05b9131	Enable kube-proxy metrics and allow Prometheus scrapes * Configure kube-proxy --metrics-bind-address=0.0.0.0 (default 127.0.0.1) to serve metrics on 0.0.0.0:10249 * Add firewall rules to allow Prometheus (resides on a worker) to scrape kube-proxy service endpoints on controllers or workers * Add a clusterIP: None service for kube-proxy endpoint discovery	2020-01-06 21:11:18 -08:00
Dalton Hubble	a4e843693f	Update Prometheus from v2.15.0 to v2.15.1 * https://github.com/prometheus/prometheus/releases/tag/v2.15.1	2019-12-26 09:12:55 -05:00
Dalton Hubble	f48e43c0b1	Update Prometheus from v2.14.0 to v2.15.0 * https://github.com/prometheus/prometheus/releases/tag/v2.15.0	2019-12-24 10:52:19 -05:00
Dalton Hubble	52d11096dc	Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0	2019-12-20 13:53:37 -08:00
Dalton Hubble	0ecb995890	Update kube-state-metrics from v1.8.0 to v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0	2019-12-14 17:20:49 -08:00
Dalton Hubble	1b9fa2e688	Update Grafana from v6.5.1 to v6.5.2 * https://github.com/grafana/grafana/releases/tag/v6.5.2	2019-12-14 15:25:48 -08:00
Dalton Hubble	178afe4a9b	Reduce apiserver metrics cardinality and extraneous labels * Stop mapping node labels to targets discovered via Kubernetes nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to store node labels (e.g. kubernetes.io/os=linux) on these metrics * kube-apiserver's apiserver_request_duration_seconds_bucket metric has a high cardinality that includes labels for the API group, verb, scope, resource, and component for each object type, including for each CRD. This one metric has ~10k time series in a typical cluster (between 10-40% of total) * Removing the apiserver request duration outright would make latency alerts a NoOp and break a Grafana apiserver panel. Instead, drop series that have a "group" label. Effectively, only request durations for core Kubernetes APIs will be kept (e.g. cardinality won't grow with each CRD added). This reduces the metric to ~2k unique series	2019-12-08 22:48:25 -08:00
Dalton Hubble	26674083b6	Update Grafana from v6.5.0 to v6.5.1 * https://github.com/grafana/grafana/releases/tag/v6.5.1	2019-11-28 14:11:25 -08:00
Dalton Hubble	030a4cec19	Update Grafana from v6.4.4 to v6.5.0 * https://grafana.com/docs/guides/whats-new-in-v6-5/	2019-11-25 22:45:58 -08:00
Dalton Hubble	ddea7dc452	Use new resource dashboards in Grafana deployment * kubernetes-mixin pod resource dashboards were split into two ConfigMap parts because they provide richer networking details * New dashboards have been used by the author at the global level, but were missing in the per-cluster Grafana tracked here	2019-11-25 22:27:11 -08:00
Dalton Hubble	525ae23305	Add node-exporter alerts and Grafana dashboard * Add Prometheus alerts from node-exporter * Add Grafana dashboard nodes.json, from node-exporter * Not adding recording rules, since those are only used by some node-exporter USE dashboards not being included	2019-11-16 13:47:20 -08:00
Dalton Hubble	42b6df89c8	Update Prometheus from v2.14.0-rc.0 to v2.14.0 * https://github.com/prometheus/prometheus/releases/tag/v2.14.0	2019-11-13 13:41:11 -08:00
Dalton Hubble	a8b7792338	Update Grafana from v6.4.3 to v6.4.4 * https://github.com/grafana/grafana/releases/tag/v6.4.4	2019-11-07 12:00:25 -08:00
Dalton Hubble	a3807086d4	Update Prometheus from v2.13.1 to v2.14.0-rc.0 * Happy PromCon 2019! * https://github.com/prometheus/prometheus/releases/tag/v2.14.0-rc.0	2019-11-07 11:48:23 -08:00
Dalton Hubble	d4573092b5	Improve Kubelet and Compute Resource dashboards * Add cluster filter to Kubelet dashboard * Add network details in resource dashboards * https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/275 * https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/284 * https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/285	2019-10-28 02:22:15 -07:00
Dalton Hubble	eb7b6d39f2	Improve minor aspects of CoreDNS and nginx-ingress dashboards * Add default 10s refresh rate to custom dashboards to match those from Kubernetes * Show labels for "instance" as "pod" for clarity * Add cluster filter for internal use	2019-10-20 23:16:55 -07:00
Dalton Hubble	33d4c2fd68	Add explicit annotation for Prometheus port to scrape * Without the prometheus.io/port annotation, Prometheus service discovery can scrape other Prometheus ports that may be available. * For example, Prometheus sidecars (not included) may be scraped and that may be unintended	2019-10-20 16:05:09 -07:00
Dalton Hubble	de90cb9246	Remove kube-state-metrics addon-resizer * addon-resizer is outdated and has been dropped from kube-state-metrics examples. Those using it should look to the cluster-proportional-vertical-autoscaler. * Eliminate addon-resizer log spew * Remove associated Role and RoleBinding * Also fix kube-state-metrics readinessProbe port	2019-10-20 16:03:29 -07:00
Dalton Hubble	68da420adc	Refresh Prometheus rules/alerts and Grafana dashboards * Update Prometheus rules/alerts and Grafana dashboards * Remove dashboards that were moved to node-exporter, they may be added back later if valuable * Remove kube-prometheus based rules/alerts (ClockSkew alert)	2019-10-19 17:43:47 -07:00
Dalton Hubble	130c97f8eb	Update Prometheus from v2.13.0 to v2.13.1 * https://github.com/prometheus/prometheus/releases/tag/v2.13.1	2019-10-18 00:10:25 -07:00
Dalton Hubble	271d2f6b52	Update Grafana from v6.4.2 to v6.4.3 * https://github.com/grafana/grafana/releases/tag/v6.4.3	2019-10-18 00:08:39 -07:00
Dalton Hubble	e4ac1027c8	Update Grafana from v6.4.1 to v6.4.2 * https://github.com/grafana/grafana/releases/tag/v6.4.2	2019-10-15 22:58:43 -07:00
Dalton Hubble	69188af565	Rename CLUO label from "app" to "name" * Match the labeling pattern in other addons	2019-10-15 00:05:02 -07:00
Dalton Hubble	ab72f1ab2d	Update Prometheus from v2.12.0 to v2.13.0 * https://github.com/prometheus/prometheus/releases/tag/v2.13.0	2019-10-06 18:22:20 -07:00
Dalton Hubble	19de38b30d	Fix Prometheus etcd metrics scraping * Prometheus was configured to use kubernetes discovery of etcd targets based on nodes matching the node label node-role.kubernetes.io/controller=true * Kubernetes v1.16 stopped permitting node role labels node-role.kubernetes.io/* so Typhoon renamed these labels (no longer any association with roles) to node.kubernetes.io/controller=true * As a result, Prometheus didn't discover etcd targets, etcd metrics were missing, etcd alerts were ineffective, and the etcd Grafana dashboard was empty * Introduced: https://github.com/poseidon/typhoon/pull/543	2019-10-03 19:07:05 -07:00
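
The cardinality reduction described in commit 178afe4a9b (drop apiserver_request_duration_seconds_bucket series that carry a "group" label) can be illustrated with a small Python sketch. This is not the configuration from the repository, which is not shown in this log; it only mimics how a Prometheus-style "drop" relabel rule might behave. The function names, the `DROP_PATTERN` regex, and the sample series are all hypothetical.

```python
import re

# Assumed rule: join the metric name and the "group" label with ";" and
# drop the series when the whole string matches. Relabel regexes in
# Prometheus are fully anchored, hence fullmatch() below.
DROP_PATTERN = re.compile(r"apiserver_request_duration_seconds_bucket;.+")


def keep_series(labels):
    """Return True if a series (a dict of label name -> value) is kept."""
    # A missing label is treated as an empty string.
    joined = ";".join([labels.get("__name__", ""), labels.get("group", "")])
    return DROP_PATTERN.fullmatch(joined) is None


def filter_series(series):
    """Drop request-duration buckets that have a non-empty "group" label."""
    return [s for s in series if keep_series(s)]


# Hypothetical sample data, for illustration only.
sample = [
    # Core API (no "group" label): kept.
    {"__name__": "apiserver_request_duration_seconds_bucket",
     "resource": "pods", "le": "0.1"},
    # Grouped API such as a CRD or apps/v1: dropped.
    {"__name__": "apiserver_request_duration_seconds_bucket",
     "group": "apps", "resource": "deployments", "le": "0.1"},
    # A different metric with a "group" label: kept.
    {"__name__": "some_other_metric", "group": "apps"},
]

kept = filter_series(sample)
```

Only the series for core APIs survive for the targeted metric, so its series count no longer grows with each CRD added, while other metrics are left untouched.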

236 Commits