typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 22:34:38 +02:00

Author	SHA1	Message	Date
Dalton Hubble	43e05b9131	Enable kube-proxy metrics and allow Prometheus scrapes * Configure kube-proxy --metrics-bind-address=0.0.0.0 (default 127.0.0.1) to serve metrics on 0.0.0.0:10249 * Add firewall rules to allow Prometheus (which resides on a worker) to scrape kube-proxy service endpoints on controllers or workers * Add a clusterIP: None service for kube-proxy endpoint discovery	2020-01-06 21:11:18 -08:00
Dalton Hubble	a4e843693f	Update Prometheus from v2.15.0 to v2.15.1 * https://github.com/prometheus/prometheus/releases/tag/v2.15.1	2019-12-26 09:12:55 -05:00
Dalton Hubble	f48e43c0b1	Update Prometheus from v2.14.0 to v2.15.0 * https://github.com/prometheus/prometheus/releases/tag/v2.15.0	2019-12-24 10:52:19 -05:00
Dalton Hubble	52d11096dc	Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0	2019-12-20 13:53:37 -08:00
Dalton Hubble	0ecb995890	Update kube-state-metrics from v1.8.0 to v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0	2019-12-14 17:20:49 -08:00
Dalton Hubble	1b9fa2e688	Update Grafana from v6.5.1 to v6.5.2 * https://github.com/grafana/grafana/releases/tag/v6.5.2	2019-12-14 15:25:48 -08:00
Dalton Hubble	178afe4a9b	Reduce apiserver metrics cardinality and extraneous labels * Stop mapping node labels to targets discovered via Kubernetes nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to store node labels (e.g. kubernetes.io/os=linux) on these metrics * kube-apiserver's apiserver_request_duration_seconds_bucket metric has a high cardinality that includes labels for the API group, verb, scope, resource, and component for each object type, including for each CRD. This one metric has ~10k time series in a typical cluster (between 10% and 40% of the total) * Removing the apiserver request duration outright would make latency alerts a no-op and break a Grafana apiserver panel. Instead, drop series that have a "group" label. Effectively, only request durations for core Kubernetes APIs will be kept (i.e. cardinality won't grow with each CRD added). This reduces the metric to ~2k unique series	2019-12-08 22:48:25 -08:00
Dalton Hubble	26674083b6	Update Grafana from v6.5.0 to v6.5.1 * https://github.com/grafana/grafana/releases/tag/v6.5.1	2019-11-28 14:11:25 -08:00
Dalton Hubble	030a4cec19	Update Grafana from v6.4.4 to v6.5.0 * https://grafana.com/docs/guides/whats-new-in-v6-5/	2019-11-25 22:45:58 -08:00
Dalton Hubble	ddea7dc452	Use new resource dashboards in Grafana deployment * kubernetes-mixin pod resource dashboards were split into two ConfigMap parts because they provide richer networking details * New dashboards have been used by the author at the global level, but were missing in the per-cluster Grafana tracked here	2019-11-25 22:27:11 -08:00
Dalton Hubble	525ae23305	Add node-exporter alerts and Grafana dashboard * Add Prometheus alerts from node-exporter * Add Grafana dashboard nodes.json, from node-exporter * Don't add recording rules, since they are only used by node-exporter USE dashboards that are not included	2019-11-16 13:47:20 -08:00
Dalton Hubble	42b6df89c8	Update Prometheus from v2.14.0-rc.0 to v2.14.0 * https://github.com/prometheus/prometheus/releases/tag/v2.14.0	2019-11-13 13:41:11 -08:00
Dalton Hubble	a8b7792338	Update Grafana from v6.4.3 to v6.4.4 * https://github.com/grafana/grafana/releases/tag/v6.4.4	2019-11-07 12:00:25 -08:00
Dalton Hubble	a3807086d4	Update Prometheus from v2.13.1 to v2.14.0-rc.0 * Happy PromCon 2019! * https://github.com/prometheus/prometheus/releases/tag/v2.14.0-rc.0	2019-11-07 11:48:23 -08:00
Dalton Hubble	d4573092b5	Improve Kubelet and Compute Resource dashboards * Add cluster filter to Kubelet dashboard * Add network details in resource dashboards * https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/275 * https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/284 * https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/285	2019-10-28 02:22:15 -07:00
Dalton Hubble	eb7b6d39f2	Improve minor aspects of CoreDNS and nginx-ingress dashboards * Add default 10s refresh rate to custom dashboards to match those from Kubernetes * Show labels for "instance" as "pod" for clarity * Add cluster filter for internal use	2019-10-20 23:16:55 -07:00
Dalton Hubble	33d4c2fd68	Add explicit annotation for Prometheus port to scrape * Without the prometheus.io/port annotation, Prometheus service discovery can scrape other Prometheus ports that may be available. * For example, Prometheus sidecars (not included) may be scraped and that may be unintended	2019-10-20 16:05:09 -07:00
Dalton Hubble	de90cb9246	Remove kube-state-metrics addon-resizer * addon-resizer is outdated and has been dropped from kube-state-metrics examples. Those using it should look to the cluster-proportional-vertical-autoscaler. * Eliminate addon-resizer log spew * Remove associated Role and RoleBinding * Also fix kube-state-metrics readinessProbe port	2019-10-20 16:03:29 -07:00
Dalton Hubble	68da420adc	Refresh Prometheus rules/alerts and Grafana dashboards * Update Prometheus rules/alerts and Grafana dashboards * Remove dashboards that were moved to node-exporter, they may be added back later if valuable * Remove kube-prometheus based rules/alerts (ClockSkew alert)	2019-10-19 17:43:47 -07:00
Dalton Hubble	130c97f8eb	Update Prometheus from v2.13.0 to v2.13.1 * https://github.com/prometheus/prometheus/releases/tag/v2.13.1	2019-10-18 00:10:25 -07:00
Dalton Hubble	271d2f6b52	Update Grafana from v6.4.2 to v6.4.3 * https://github.com/grafana/grafana/releases/tag/v6.4.3	2019-10-18 00:08:39 -07:00
Dalton Hubble	e4ac1027c8	Update Grafana from v6.4.1 to v6.4.2 * https://github.com/grafana/grafana/releases/tag/v6.4.2	2019-10-15 22:58:43 -07:00
Dalton Hubble	69188af565	Rename CLUO label from "app" to "name" * Match the labeling pattern in other addons	2019-10-15 00:05:02 -07:00
Dalton Hubble	ab72f1ab2d	Update Prometheus from v2.12.0 to v2.13.0 * https://github.com/prometheus/prometheus/releases/tag/v2.13.0	2019-10-06 18:22:20 -07:00
Dalton Hubble	19de38b30d	Fix Prometheus etcd metrics scraping * Prometheus was configured to use Kubernetes discovery of etcd targets based on nodes matching the node label node-role.kubernetes.io/controller=true * Kubernetes v1.16 stopped permitting node role labels node-role.kubernetes.io/* so Typhoon renamed these labels (no longer any association with roles) to node.kubernetes.io/controller=true * As a result, Prometheus didn't discover etcd targets, etcd metrics were missing, etcd alerts were ineffective, and the etcd Grafana dashboard was empty * Introduced: https://github.com/poseidon/typhoon/pull/543	2019-10-03 19:07:05 -07:00
Dalton Hubble	ca7d62720e	Update Grafana from v6.3.6 to v6.4.1 * https://github.com/grafana/grafana/releases/tag/v6.4.1	2019-10-02 20:36:05 -07:00
Dalton Hubble	26f8d76755	Update kube-state-metrics from v1.7.2 to v1.8.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.8.0	2019-10-01 20:50:33 -07:00
Dalton Hubble	7bcf2d7831	Update nginx-ingress from v0.25.1 to v0.26.1 * Add lifecycle hook to allow draining connections for up to 5 minutes	2019-09-30 22:01:07 -07:00
Dalton Hubble	f453c54956	Update Grafana from v6.3.5 to v6.3.6 * https://github.com/grafana/grafana/releases/tag/v6.3.6	2019-09-28 15:13:46 -07:00
Dalton Hubble	9da3725738	Update Kubernetes from v1.15.3 to v1.16.0 * Drop `node-role.kubernetes.io/master` and `node-role.kubernetes.io/node` node labels * Kubelet (v1.16) now rejects the node labels used in the kubectl get nodes ROLES output * https://github.com/kubernetes/kubernetes/issues/75457	2019-09-18 22:53:06 -07:00
Dalton Hubble	dc436b8fe9	Update Grafana from v6.3.4 to v6.3.5 * https://github.com/grafana/grafana/releases/tag/v6.3.5	2019-09-07 14:21:59 -07:00
Dalton Hubble	45bc52d156	Update Grafana from v6.3.3 to v6.3.4 * https://github.com/grafana/grafana/releases/tag/v6.3.4	2019-08-31 15:59:13 -07:00
Dalton Hubble	4ef2eb7e6b	Update Prometheus from v2.11.2 to v2.12.0 * https://github.com/prometheus/prometheus/releases/tag/v2.12.0	2019-08-18 20:59:44 -07:00
Dalton Hubble	99990e3cbb	Use stable IDs for etcd, CoreDNS, and Nginx dashboards * Use unique dashboard IDs so that multiple replicas of Grafana serve dashboards with uniform paths * Fix issue where refreshing a dashboard served by one replica could show a 404 unless the request went to the same replica	2019-08-18 12:45:49 -07:00
Dalton Hubble	0c45cd0f06	Update Grafana from v6.3.2 to v6.3.3 * https://github.com/grafana/grafana/releases/tag/v6.3.3	2019-08-16 14:40:47 -07:00
Dalton Hubble	976452825e	Update Prometheus from v2.11.0 to v2.11.2 * https://github.com/prometheus/prometheus/releases/tag/v2.11.2	2019-08-14 21:26:46 -07:00
Dalton Hubble	7bc5633c38	Update nginx-ingress from v0.25.0 to v0.25.1 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.1	2019-08-14 21:26:46 -07:00
Dalton Hubble	eaea4d37a2	Update Grafana from v6.2.5 to v6.3.2 * https://github.com/grafana/grafana/releases/tag/v6.3.2 * https://github.com/grafana/grafana/releases/tag/v6.3.1 * https://github.com/grafana/grafana/releases/tag/v6.3.0	2019-08-07 20:01:18 -07:00
Dalton Hubble	457ad18daa	Update kube-state-metrics from v1.7.1 to v1.7.2 * Add separate liveness and readiness probes * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.2	2019-08-07 20:00:24 -07:00
Dalton Hubble	10d4d9e565	Add Grafana dashboards for CoreDNS and Nginx Ingress Controller * Add a CoreDNS dashboard originally based on an upstream dashboard, but now customized according to preferences * Add an Nginx Ingress Controller dashboard based on an upstream dashboard, but customized according to preferences	2019-08-05 22:49:19 -07:00
Dalton Hubble	68d8717924	Refresh Prometheus rules/alerts and Grafana dashboards * Refresh rules, alerts, and dashboards from upstreams	2019-07-21 11:29:34 -07:00
Dalton Hubble	f543f08867	Compact nginx-ingress ClusterRole rules * https://github.com/kubernetes/ingress-nginx/pull/4302	2019-07-20 20:31:06 -07:00
Dalton Hubble	e0be091acc	Update kube-state-metrics from v1.7.0 to v1.7.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.1	2019-07-20 20:17:08 -07:00
Dalton Hubble	6cd3e65267	Update kube-state-metrics from v1.7.0-rc.1 to v1.7.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0 * Add storageclasses and verticalpodautoscalers to ClusterRole	2019-07-19 00:14:47 -07:00
Dalton Hubble	70f5cfd33e	Update kube-state-metrics from v1.6.0 to v1.7.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.0	2019-07-13 13:13:57 -07:00
Dalton Hubble	eaf59bd33f	Update Prometheus from v2.11.0-rc.0 to v2.11.0 * https://github.com/prometheus/prometheus/releases/tag/v2.11.0	2019-07-09 21:33:24 -07:00
Dalton Hubble	40640f3697	Upgrade nginx-ingress from v0.24.1 to v0.25.0 * Support networking.k8s.io/v1beta1 apiVersion * Update RBAC cluster-role for networking.k8s.io/v1beta1 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.0	2019-07-08 22:04:50 -07:00
Dalton Hubble	28ab746068	Update Prometheus from v2.10.0 to v2.11.0-rc.0 * https://github.com/prometheus/prometheus/releases/tag/v2.11.0-rc.0	2019-07-08 21:32:50 -07:00
Dalton Hubble	9a395dbf88	Update Grafana from v6.2.4 to v6.2.5 * https://github.com/grafana/grafana/releases/tag/v6.2.5	2019-06-29 13:21:42 -07:00
Dalton Hubble	4ad69efc43	Update Grafana from v6.2.2 to v6.2.4 * https://github.com/grafana/grafana/releases/tag/v6.2.4	2019-06-19 21:51:54 -07:00
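
The apiserver cardinality change (178afe4a9b) drops request-duration series that carry a "group" label. The Python sketch below models that filter under stated assumptions: it imitates a Prometheus metric_relabel_configs rule with `action: drop`, `source_labels: [__name__, group]` and the regex `apiserver_request_duration_seconds_bucket;.+`, which is an illustrative guess rather than the exact rule Typhoon ships. Prometheus joins source label values with `;`, treats a missing label as an empty string, and fully anchors the regex; the sample series are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical label sets scraped from kube-apiserver and another target.
SERIES: List[Dict[str, str]] = [
    {"__name__": "apiserver_request_duration_seconds_bucket", "verb": "GET", "resource": "pods"},
    {"__name__": "apiserver_request_duration_seconds_bucket", "verb": "LIST",
     "resource": "widgets", "group": "example.com"},
    {"__name__": "some_other_metric", "group": "example.com"},
]

# Assumed rule: drop when "<__name__>;<group>" matches, i.e. the apiserver
# histogram carries a non-empty "group" label.
DROP_REGEX = re.compile(r"apiserver_request_duration_seconds_bucket;.+")


def keep_series(series: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the series that survive the assumed drop rule."""
    kept = []
    for labels in series:
        # Missing labels join as empty strings, like Prometheus relabeling.
        joined = ";".join(labels.get(name, "") for name in ("__name__", "group"))
        # Prometheus anchors relabel regexes at both ends, hence fullmatch.
        if DROP_REGEX.fullmatch(joined) is None:
            kept.append(labels)
    return kept


if __name__ == "__main__":
    for labels in keep_series(SERIES):
        print(labels)
```

Because the rule keys on whether the label is present rather than on a list of API groups, series for newly added CRDs are dropped without editing the rule.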
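
The prometheus.io/port annotation change (33d4c2fd68) works by rewriting each discovered scrape address to use the annotated port. A minimal Python sketch, assuming a relabel rule modeled on the one in Prometheus' example Kubernetes configuration (regex `([^:]+)(?::\d+)?;(\d+)`, replacement `$1:$2`, applied to `__address__` joined with the port annotation); the addresses and ports are hypothetical:

```python
import re
from typing import Optional

# Joined form is "<__address__>;<prometheus.io/port annotation>", as built from
# source_labels with the default ";" separator.
ADDRESS_PORT = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def scrape_address(address: str, port_annotation: Optional[str]) -> str:
    """Rewrite a discovered address to use the annotated port, if any."""
    joined = f"{address};{port_annotation or ''}"
    match = ADDRESS_PORT.fullmatch(joined)
    if match is None:
        # A "replace" action leaves the target label unchanged when the regex fails.
        return address
    # Equivalent to the replacement "$1:$2": keep the host, swap in the annotated port.
    return f"{match.group(1)}:{match.group(2)}"


if __name__ == "__main__":
    # A sidecar port discovered on the same pod is redirected to the annotated port.
    print(scrape_address("10.2.0.5:10902", "9090"))  # 10.2.0.5:9090
    # Without an annotation, the discovered port is scraped as-is.
    print(scrape_address("10.2.0.5:10902", None))  # 10.2.0.5:10902
```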
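
The etcd scraping fix (19de38b30d) comes down to a node label selector. A toy Python sketch, assuming hypothetical node objects reduced to a name and a label map, shows why a selector on the old node-role.kubernetes.io/controller key discovers nothing once nodes carry only node.kubernetes.io/controller=true:

```python
from typing import Dict, List

# Hypothetical nodes after the v1.16 relabeling: only the new label keys exist.
NODES: List[Dict] = [
    {"name": "controller-0", "labels": {"node.kubernetes.io/controller": "true"}},
    {"name": "worker-0", "labels": {"node.kubernetes.io/node": ""}},
]


def select_nodes(nodes: List[Dict], key: str, value: str) -> List[str]:
    """Return names of nodes whose label `key` equals `value`."""
    return [n["name"] for n in nodes if n["labels"].get(key) == value]


if __name__ == "__main__":
    # Old selector: no node carries this label any more, so no etcd targets.
    print(select_nodes(NODES, "node-role.kubernetes.io/controller", "true"))  # []
    # Selector on the renamed label finds the controller again.
    print(select_nodes(NODES, "node.kubernetes.io/controller", "true"))  # ['controller-0']
```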

261 Commits