typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 19:04:38 +02:00

Author	SHA1	Message	Date
Dalton Hubble	90edcd3d77	Update node-exporter from v1.0.0-rc.0 to v1.0.0-rc.1 * https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.1	2020-05-15 18:03:19 -07:00
Dalton Hubble	a927c7c790	Update kube-state-metrics from v1.9.5 to v1.9.6 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.6	2020-05-15 17:42:24 -07:00
Dalton Hubble	d952576d2f	Update Grafana from v7.0.0-beta3 to v7.0.0 * https://github.com/grafana/grafana/releases/tag/7.0.0	2020-05-15 17:38:59 -07:00
Dalton Hubble	f4194cd57a	Update Grafana from v7.0.0-beta2 to v7.0.0-beta.3 * https://github.com/grafana/grafana/releases/tag/v7.0.0-beta3	2020-05-09 17:50:40 -07:00
Dalton Hubble	3f0a5d2715	Update Grafana from v7.0.0-beta1 to v7.0.0-beta2 * https://github.com/grafana/grafana/releases/tag/v7.0.0-beta2	2020-05-07 23:04:44 -07:00
Dalton Hubble	33173c0206	Update Prometheus from v2.18.0 to v2.18.1 * https://github.com/prometheus/prometheus/releases/tag/v2.18.1	2020-05-07 22:59:11 -07:00
Dalton Hubble	70f30d9c07	Update Prometheus from v2.18.0-rc.1 to v2.18.0 * https://github.com/prometheus/prometheus/releases/tag/v2.18.0	2020-05-05 22:31:11 -07:00
Dalton Hubble	6afc1643d9	Update nginx-ingress from v0.30.0 to v0.32.0 * Add support for IngressClass and RBAC authorization * Since our nginx ingress controller example uses the flag `--ingress-class=public`, add an IngressClass to go along with it Rel: https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class	2020-05-03 23:24:19 -07:00
Dalton Hubble	e71e27e769	Update Prometheus from v2.17.2 to v2.18.0-rc.1 * https://github.com/prometheus/prometheus/releases/tag/v2.18.0-rc.1	2020-04-29 20:57:48 -07:00
Dalton Hubble	64035005d4	Update Grafana from v6.7.2 to v7.0.0-beta1 * https://github.com/grafana/grafana/releases/tag/v7.0.0-beta1	2020-04-29 20:53:30 -07:00
Dalton Hubble	fd044ee117	Enable Kubelet TLS bootstrap and NodeRestriction * Enable bootstrap token authentication on kube-apiserver * Generate the bootstrap.kubernetes.io/token Secret that may be used as a bootstrap token * Generate a bootstrap kubeconfig (with a bootstrap token) to be securely distributed to nodes. Each Kubelet will use the bootstrap kubeconfig to authenticate to kube-apiserver as `system:bootstrappers` and send a node-unique CSR for kube-controller-manager to automatically approve to issue a Kubelet certificate and kubeconfig (expires in 72 hours) * Add ClusterRoleBinding for bootstrap token subjects (`system:bootstrappers`) to have the `system:node-bootstrapper` ClusterRole * Add ClusterRoleBinding for bootstrap token subjects (`system:bootstrappers`) to have the csr nodeclient ClusterRole * Add ClusterRoleBinding for bootstrap token subjects (`system:bootstrappers`) to have the csr selfnodeclient ClusterRole * Enable NodeRestriction admission controller to limit the scope of Node or Pod objects a Kubelet can modify to those of the node itself * Ability for a Kubelet to delete its Node object is retained as preemptible nodes or those in auto-scaling instance groups need to be able to remove themselves on shutdown. This need continues to have precedence over any risk of a node deleting itself maliciously Security notes: 1. Issued Kubelet certificates authenticate as user `system:node:NAME` and group `system:nodes` and are limited in their authorization to perform API operations by Node authorization and NodeRestriction admission. Previously, a Kubelet's authorization was broader. This is the primary security motivation. 2. The bootstrap kubeconfig credential has the same sensitivity as the previous generated TLS client-certificate kubeconfig. It must be distributed securely to nodes. Its compromise still allows an attacker to obtain a Kubelet kubeconfig 3. Bootstrapping Kubelet kubeconfig's with a limited lifetime offers a slight security improvement. * An attacker who obtains the kubeconfig can likely obtain the bootstrap kubeconfig as well, to obtain the ability to renew their access * A compromised bootstrap kubeconfig could plausibly be handled by replacing the bootstrap token Secret, distributing the token to new nodes, and expiration. Whereas a compromised TLS-client certificate kubeconfig can't be revoked (no CRL). However, replacing a bootstrap token can be impractical in real cluster environments, so the limited lifetime is mostly a theoretical benefit. * Cluster CSR objects are visible via kubectl which is nice 4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet clients have more identity information, which can improve the utility of audits and future features Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/ Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/185	2020-04-28 19:35:33 -07:00
Dalton Hubble	84ed0a31c3	Update Prometheus from v2.17.1 to v2.17.2 * https://github.com/prometheus/prometheus/releases/tag/v2.17.2	2020-04-20 18:09:24 -07:00
Dalton Hubble	2b5dfece93	Update Grafana from v6.7.1 to v6.7.2 * https://github.com/grafana/grafana/releases/tag/v6.7.2	2020-04-04 13:13:19 -07:00
Dalton Hubble	d47d40b517	Refresh Prometheus rules/alerts and Grafana dashboards * Refresh upstream Prometheus rules and alerts and Grafana dashboards * All Loki recording rules for convenience	2020-03-31 21:53:01 -07:00
Dalton Hubble	076b8e3c42	Update Prometheus from v2.17.0 to v2.17.1 * https://github.com/prometheus/prometheus/releases/tag/v2.17.1	2020-03-26 22:17:13 -07:00
Dalton Hubble	e556bc2167	Update Prometheus from v2.17.0-rc.3 to v2.17.0 * https://github.com/prometheus/prometheus/releases/tag/v2.17.0	2020-03-24 23:15:49 -07:00
Dalton Hubble	ddc1ff5348	Update Grafana from v6.6.2 to v6.7.1 * https://github.com/grafana/grafana/releases/tag/v6.7.1	2020-03-21 15:27:55 -07:00
Dalton Hubble	61557e89a6	Update Prometheus from v2.16.0 to v2.17.0-rc.3 * https://github.com/prometheus/prometheus/releases/tag/v2.17.0-rc.3	2020-03-19 22:38:05 -07:00
Dalton Hubble	75fb4e5d11	Remove Container Linux Update Operator (CLUO) addon * Stop providing example manifests for the Container Linux Update Operator (CLUO) * CLUO requires patches to support Kubernetes v1.16+, but the project and push access is rather unowned * CLUO hasn't been in active use in our clusters and won't be relevant beyond Container Linux. Not to say folks can't patch it and run it on their own. Examples just aren't provided here Related: https://github.com/coreos/container-linux-update-operator/pull/197	2020-03-16 22:05:17 -07:00
Dalton Hubble	c4683c5bad	Refresh Prometheus alerts and Grafana dashboards * Add 2 min wait before KubeNodeUnreachable to be less noisy on premeptible clusters * Add a BlackboxProbeFailure alert for any failing probes for services annotated `prometheus.io/probe: true`	2020-03-02 20:08:37 -08:00
Dalton Hubble	f4d260645c	Update node-exporter from v0.18.1 to v1.0.0-rc.0 * Update mdadm alert rule; node-exporter adds `state` label to `node_md_disks` and removes `node_md_disks_active` * https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0	2020-02-25 22:29:52 -08:00
Dalton Hubble	d9219a6722	Update nginx-ingress from v0.29.0 to v0.30.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0	2020-02-25 22:11:59 -08:00
Dalton Hubble	60c7eb85ee	Update nginx-ingress from v0.28.0 to v0.29.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0	2020-02-22 15:57:59 -08:00
Dalton Hubble	4c964b56a0	Update kube-state-metrics from v1.9.4 to v1.9.5 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5	2020-02-22 15:21:10 -08:00
Dalton Hubble	1fbd6835f2	Update Grafana from v6.6.1 to v6.6.2 * https://github.com/grafana/grafana/releases/tag/v6.6.2	2020-02-22 15:19:24 -08:00
Dalton Hubble	7ca03e5219	Update Prometheus from v1.15.2 to v1.16.0 * https://github.com/prometheus/prometheus/releases/tag/v2.16.0	2020-02-14 12:10:56 -08:00
Dalton Hubble	34c3d7cc39	Update Grafana from v6.6.0 to v6.6.1 * https://github.com/grafana/grafana/releases/tag/v6.6.1	2020-02-08 14:50:33 -08:00
Dalton Hubble	e339fbd2b6	Update kube-state-metrics from v1.9.3 to v1.9.4 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4	2020-02-04 21:33:34 -08:00
Dalton Hubble	b19ba16afa	Update nginx-ingress from v0.27.1 to v0.28.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0	2020-01-30 18:00:23 -08:00
Dalton Hubble	d127a7345c	Update Grafana from v6.5.3 to v6.6.0 * https://github.com/grafana/grafana/releases/tag/v6.6.0	2020-01-27 20:46:32 -08:00
Dalton Hubble	d5b7ce8f27	Update kube-state-metrics from v1.9.2 to v1.9.3 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3	2020-01-23 00:03:16 -08:00
Dalton Hubble	bda73264f7	Update nginx-ingress from v0.26.1 to v0.27.1 * Change runAsUser from 33 to 101 for new alpine-based image * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1	2020-01-20 15:22:16 -08:00
Dalton Hubble	03ff3a9cf3	Update kube-state-metrics from v1.9.1 to v1.9.2 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.2	2020-01-18 15:32:10 -08:00
Dalton Hubble	48703f9906	Update Grafana from v6.5.2 to v6.5.3 * https://github.com/grafana/grafana/releases/tag/v6.5.3	2020-01-18 15:30:39 -08:00
Dalton Hubble	0e2fc89f78	Update kube-state-metrics from v1.9.0 to v1.9.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.1	2020-01-11 14:15:55 -08:00
Dalton Hubble	73588cfad3	Update Prometheus from v2.15.1 to v2.15.2 * https://github.com/prometheus/prometheus/releases/tag/v2.15.2	2020-01-06 22:08:34 -08:00
Dalton Hubble	bb586b60da	Reduce Prometheus addon's node-exporter tolerations * Change node-exporter DaemonSet tolerations from tolerating all possible NoSchedule taints to tolerating the master taint and the not ready taint (we'd like metrics regardless) * Users who add custom node taints must add their custom taints to the addon node-exporter DaemonSet. As an addon, its expected users copy and manipulate manifests out-of-band in their own systems	2020-01-06 21:24:24 -08:00
Dalton Hubble	43e05b9131	Enable kube-proxy metrics and allow Prometheus scrapes * Configure kube-proxy --metrics-bind-address=0.0.0.0 (default 127.0.0.1) to serve metrics on 0.0.0.0:10249 * Add firewall rules to allow Prometheus (resides on a worker) to scrape kube-proxy service endpoints on controllers or workers * Add a clusterIP: None service for kube-proxy endpoint discovery	2020-01-06 21:11:18 -08:00
Dalton Hubble	a4e843693f	Update Prometheus from v2.15.0 to v2.15.1 * https://github.com/prometheus/prometheus/releases/tag/v2.15.1	2019-12-26 09:12:55 -05:00
Dalton Hubble	f48e43c0b1	Update Prometheus from v2.14.0 to v2.15.0 * https://github.com/prometheus/prometheus/releases/tag/v2.15.0	2019-12-24 10:52:19 -05:00
Dalton Hubble	52d11096dc	Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0	2019-12-20 13:53:37 -08:00
Dalton Hubble	0ecb995890	Update kube-state-metrics from v1.8.0 to v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0	2019-12-14 17:20:49 -08:00
Dalton Hubble	1b9fa2e688	Update Grafana from v6.5.1 to v6.5.2 * https://github.com/grafana/grafana/releases/tag/v6.5.2	2019-12-14 15:25:48 -08:00
Dalton Hubble	178afe4a9b	Reduce apiserver metrics cardinality and extraneous labels * Stop mapping node labels to targets discovered via Kubernetes nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to store node labels (e.g. kubernetes.io/os=linux) on these metrics * kube-apiserver's apiserver_request_duration_seconds_bucket metric has a high cardinality that includes labels for the API group, verb, scope, resource, and component for each object type, including for each CRD. This one metric has ~10k time series in a typical cluster (btw 10-40% of total) * Removing the apiserver request duration outright would make latency alerts a NoOp and break a Grafana apiserver panel. Instead, drop series that have a "group" label. Effectively, only request durations for core Kubernetes APIs will be kept (e.g. cardinality won't grow with each CRD added). This reduces the metric to ~2k unique series	2019-12-08 22:48:25 -08:00
Dalton Hubble	26674083b6	Update Grafana from v6.5.0 to v6.5.1 * https://github.com/grafana/grafana/releases/tag/v6.5.1	2019-11-28 14:11:25 -08:00
Dalton Hubble	030a4cec19	Update Grafana from v6.4.4 to v6.5.0 * https://grafana.com/docs/guides/whats-new-in-v6-5/	2019-11-25 22:45:58 -08:00
Dalton Hubble	ddea7dc452	Use new resource dashboards in Grafana deployment * kubernetes-mixin pod resource dashboards were split into two ConfigMap parts because they provide richer networking details * New dashboards have been used by the author at the global level, but were missing in the per-cluster Grafana tracked here	2019-11-25 22:27:11 -08:00
Dalton Hubble	525ae23305	Add node-exporter alerts and Grafana dashboard * Add Prometheus alerts from node-exporter * Add Grafana dashboard nodes.json, from node-exporter * Not adding recording rules, since those are only used by some node-exporter USE dashboards not being included	2019-11-16 13:47:20 -08:00
Dalton Hubble	42b6df89c8	Update Prometheus from v2.14.0-rc.0 to v2.14.0 * https://github.com/prometheus/prometheus/releases/tag/v2.14.0	2019-11-13 13:41:11 -08:00
Dalton Hubble	a8b7792338	Update Grafana from v6.4.3 to v6.4.4 * https://github.com/grafana/grafana/releases/tag/v6.4.4	2019-11-07 12:00:25 -08:00

1 2 3 4 5

248 Commits