typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-04 18:14:38 +02:00

Author	SHA1	Message	Date
Dalton Hubble	41a9d86bc3	Add NetworkPolicy to limit traffic into Prometheus * Allow traffic from Grafana to Prometheus in monitoring * Allow traffic from Prometheus to Prometheus in monitoring * NetworkPolicy denies non-whitelisted traffic. Define policy to allow other access	2019-03-23 21:38:34 -07:00
Dalton Hubble	36e31fc9fa	Add liveness and readiness probes to Grafana * https://github.com/grafana/grafana/issues/3302	2019-03-23 17:55:37 -07:00
Dalton Hubble	6dd2731046	Set cpu/memory resources requests/limits for some addons * Set resource requests and limits for Grafana and CLUO * Set resource requests for Prometheus, but allow usage to grow since needs vary widely * Leave nginx without resource requests/limits for now, it's typically well behaved	2019-03-20 00:15:08 -07:00
Dalton Hubble	aa630003a4	Refresh Prometheus rules and Grafana dashboards * Refresh rules and dashboards from upstreams * Organize dashboards and stay below the ConfigMap size limit	2019-03-17 13:23:04 -07:00
Dalton Hubble	e0bee2e417	Update Prometheus from v2.7.2 to v2.8.0 * https://github.com/prometheus/prometheus/releases/tag/v2.8.0	2019-03-13 22:11:38 -07:00
Dalton Hubble	4d9a692424	Update Prometheus from v2.7.1 to v2.7.2 * https://github.com/prometheus/prometheus/releases/tag/v2.7.2	2019-03-04 23:08:12 -08:00
Dalton Hubble	e483c81ce9	Improve Prometheus rules and alerts and Grafana dashboards * Collate upstream rules, alerts, and dashboards and tune for use in Typhoon * Previously, a well-chosen (but older) set of rules, alerts, and dashboards were maintained to reflect metric name changes	2019-02-18 12:19:23 -08:00
Dalton Hubble	b13a651cfe	Drop metrics that are unset, high cardinality, or extraneous * https://github.com/coreos/prometheus-operator/pull/2387 * https://github.com/coreos/prometheus-operator/pull/1959	2019-02-10 23:56:11 -08:00
Dalton Hubble	9c59f393a5	Add Kubernetes pod name to metrics discovered from service endpoints * Prometheus queries from some upstreams use joins of node-exporter and kube-state-metrics metrics by (namespace,pod). Add the Kubernetes pod name to service endpoint metrics * Rename the kubernetes_namespace field to namespace * Honor labels since kube-state-metrics already include a `pod` field that should not be overridden	2019-02-10 23:54:30 -08:00
Dalton Hubble	949ce21fb2	Update Prometheus from v2.7.0 to v2.7.1 * https://github.com/prometheus/prometheus/releases/tag/v2.7.1	2019-02-02 00:13:24 -08:00
Dalton Hubble	130daeac26	Update Prometheus from v2.6.1 to v2.7.0	2019-01-29 22:31:20 -08:00
Dalton Hubble	f5ff003d0e	Update node-exporter from v0.15.2 to v0.17.0 * node-exporter renamed multiple metrics that are reflected in changes to Prometheus rules and Grafana dashboard expressions	2019-01-22 01:14:00 -08:00
Dalton Hubble	d697dd46dc	Allow kube-state-metrics PodDisruptionBudget metrics * Update kube-state-metrics ClusterRole to allow collecting poddisruptionbudget metrics (exported as kube_poddisruptionbudget_) https://github.com/kubernetes/kube-state-metrics/pull/551 * Bump addon-resizer from v1.7 to v1.8.4	2019-01-22 01:12:32 -08:00
Dalton Hubble	67fb9602e7	Update Prometheus from v2.6.0 to v2.6.1 * https://github.com/prometheus/prometheus/releases/tag/v2.6.1	2019-01-15 21:13:40 -08:00
Dalton Hubble	1d27dc6528	Update kube-state-metrics exporter from v1.4.0 to v1.5.0 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.5.0	2019-01-12 14:24:57 -08:00
Dalton Hubble	ea8b0d1c84	Update Prometheus addon from v2.5.0 to v2.6.0 * https://github.com/prometheus/prometheus/releases/tag/v2.6.0	2018-12-27 07:35:12 -08:00
Dalton Hubble	7de03a1279	Fix Prometheus etcd scrape config for DigitalOcean * Kubelet uses a node's hostname as the node name, which isn't resolvable on DigitalOcean. On DigitalOcean, the node name was set to the internal IP until #337 switched to instead configuring kube-apiserver to prefer the InternalIP for communication * Explicitly configure etcd scrapes to target each controller by internal IP and port 2381 (replace __address__)	2018-11-06 23:02:45 -08:00
Dalton Hubble	be9f7b87d6	Update Prometheus from v2.4.3 to v2.5.0 * https://github.com/prometheus/prometheus/releases/tag/v2.5.0	2018-11-06 22:16:12 -08:00
Dalton Hubble	a10d6977b8	Update Prometheus from v2.4.2 to v2.4.3 * https://github.com/prometheus/prometheus/releases/tag/v2.4.3	2018-10-16 21:29:41 -07:00
Dalton Hubble	5eb4078d68	Add docker/default seccomp to control plane and addons * Annotate pods, deployments, and daemonsets to start containers with the Docker runtime's default seccomp profile * Overrides Kubernetes default behavior which started containers with seccomp=unconfined * https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container	2018-10-16 20:07:29 -07:00
Dalton Hubble	032a24133b	Update Prometheus from v2.3.2 to v2.4.2 * https://github.com/prometheus/prometheus/releases/tag/v2.4.0 * https://github.com/prometheus/prometheus/releases/tag/v2.4.1 * https://github.com/prometheus/prometheus/releases/tag/v2.4.2	2018-09-21 22:27:11 -07:00
Dalton Hubble	4ba090feb0	Update kube-state-metrics from v1.3.1 to v1.4.0	2018-08-29 09:37:50 -07:00
Becca Powell	49a9dc9b8b	Fix typo in Prometheus alerting rules	2018-08-21 16:55:49 -07:00
Dalton Hubble	02cd8eb8d3	Update Prometheus from v2.3.1 to v2.3.2 * https://github.com/prometheus/prometheus/releases/tag/v2.3.2	2018-07-14 14:25:49 -07:00
Dalton Hubble	84d6cfe7b3	Add Prometheus alert rule for inactive md devices * node-exporter exposes metrics to Prometheus about total and active md devices (e.g. disks in mdadm RAID arrays) * Add alert that fires when a RAID disk fails or becomes inactive for another reason	2018-07-10 00:20:30 -07:00
Dalton Hubble	05b99178ae	Update prometheus from v2.3.0 to v2.3.1 * https://github.com/prometheus/prometheus/releases/tag/v2.3.1	2018-06-19 21:43:50 -07:00
Dalton Hubble	cbe646fba6	Label namespaces to ease writing Network Policies	2018-06-09 11:45:11 -07:00
Dalton Hubble	c166b2ba33	Update prometheus from v2.2.1 to v2.3.0	2018-06-09 11:43:10 -07:00
Dalton Hubble	32a9a83190	Add Prometheus liveness and readiness probes	2018-05-30 22:34:07 -07:00
Dalton Hubble	c2b719dc75	Configure Prometheus to scrape Kubelets directly * Use Kubelet bearer token authn/authz to scrape metrics * Drop RBAC permission from nodes/proxy to nodes/metrics * Stop proxying kubelet scrapes through the apiserver, since this required higher privilege (nodes/proxy) and can add load to the apiserver on large clusters	2018-05-14 23:06:50 -07:00
Dalton Hubble	a54e3c0da1	Fix Prometheus data dir to /var/lib/prometheus * A data volume (emptyDir) is mounted to /var/lib/prometheus * Users could swap emptyDir for any desired volume if data persistence is desired. Prometheus previously defaulted to keeping its data in ./data relative to /prometheus. Override this behavior to store data in /var/lib/prometheus	2018-05-01 22:05:27 -07:00
Dalton Hubble	9789881243	Update kube-state-metrics from v1.3.0 to v1.3.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.3.1	2018-04-15 17:10:02 -07:00
Dalton Hubble	6b08bde479	Use k8s.gcr.io instead of gcr.io/google_containers * Kubernetes recommends using the alias to fetch images from the nearest GCR regional mirror, to abstract the use of GCR, and to drop names containing 'google' * https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ	2018-04-08 12:57:52 -07:00
Dalton Hubble	f4b2396718	Return Prometheus deployment to be a worker workload * Expose etcd metrics to workers so Prometheus can run on a worker, rather than a controller * Drop temporary firewall rules allowing Prometheus to run on a controller and scrape targets * Related to https://github.com/poseidon/typhoon/pull/175	2018-04-08 12:20:00 -07:00
Dalton Hubble	7186aa46da	Update kube-state-metrics from v1.2.0 to v1.3.0 * https://github.com/kubernetes/kube-state-metrics/pull/412 * https://github.com/kubernetes/kube-state-metrics/pull/413	2018-04-04 21:04:13 -07:00
Dalton Hubble	d770393dbc	Add etcd metrics, Prometheus scrapes, and Grafana dash * Use etcd v3.3 --listen-metrics-urls to expose only metrics data via http://0.0.0.0:2381 on controllers * Add Prometheus discovery for etcd peers on controller nodes * Temporarily drop two noisy Prometheus alerts	2018-04-03 20:31:00 -07:00
Dalton Hubble	46226a8015	Update Prometheus from 2.2.0 to 2.2.1	2018-03-18 15:56:44 -07:00
Dalton Hubble	42708f9a70	Update Prometheus from v2.2.0-rc.1 to v2.2.0 * https://github.com/prometheus/prometheus/releases/tag/v2.2.0	2018-03-09 00:20:40 -08:00
Dalton Hubble	9307e97c46	addons: Update Prometheus from v2.1.0 to v2.2.0 * Annotate Prometheus service to scrape metrics from Prometheus itself (enables Prometheus* alerts) * Update kube-state-metrics addon-resizer to 1.7 * Use port 8080 for kube-state-metrics * Add PrometheusNotIngestingSamples alert rule * Change K8SKubeletDown alert rule to fire when 10% of kubelets are down, not 1% * https://github.com/coreos/prometheus-operator/pull/1032	2018-03-09 00:20:40 -08:00
Paul Saunders	86420fd507	Rename namespace manifests to be applied first * Ensure kubectl apply -R creates manifests in the right order	2018-02-22 01:04:30 -08:00
Dalton Hubble	2c10d24113	addons: Switch to apps/v1 workload APIs * Deployments now belong to the apps/v1 API group * DaemonSets now belong to the apps/v1 API group * RBAC types now belong to the rbac.authorization.k8s.io/v1 API group	2018-02-10 23:56:31 -08:00
Dalton Hubble	064ce83f25	addons: Update Prometheus to v2.1.0 * Change service discovery to relabel jobs to align with rule expressions in upstream examples * Use a separate service account for prometheus instead of granting roles to the namespace's default * Use a separate service account for node-exporter * Update node-exporter and kube-state-metrics exporters	2018-01-27 21:00:15 -08:00
Dalton Hubble	996651c605	Update kube-state-metrics version and RBAC cluster role * https://github.com/kubernetes/kube-state-metrics/pull/345 * https://github.com/kubernetes/kube-state-metrics/pull/334	2018-01-15 08:33:44 -08:00
Dalton Hubble	65f006e6cc	addons: Sync prometheus alerts to upstream * https://github.com/coreos/prometheus-operator/pull/774	2017-12-01 23:24:08 -08:00
Dalton Hubble	63ab117205	addons: Add prometheus rules for DaemonSets * https://github.com/coreos/prometheus-operator/pull/755	2017-11-16 23:51:21 -08:00
Dalton Hubble	1cd262e712	addons: Fix prometheus K8SApiServerLatency alert rule * https://github.com/coreos/prometheus-operator/issues/751	2017-11-16 23:37:15 -08:00
Dalton Hubble	159443bae7	addons: Add better alerting rules to Prometheus manifests * Adapt the coreos/prometheus-operator alerting rules for Typhoon, https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests * Add controller manager and scheduler shim services to let prometheus discover them via service endpoints * Fix several alert rules to use service endpoint discovery * A few rules still don't do much, but they default to green	2017-11-10 20:57:47 -08:00
Dalton Hubble	f570af9418	addons: Update from Prometheus v1.8.2 to v2.0.0	2017-11-08 22:48:23 -08:00
Dalton Hubble	10b977d54a	addons: Set kube-state-metrics to have clusterIP None * kube-state-metrics service exists to facilitate prometheus discovery	2017-11-05 17:54:09 -08:00
Dalton Hubble	b7a268fc45	addons: Add prometheus alertmanager flag * Pass -alertmanager.url to work with a user's in-cluster alertmanager deployment, if any	2017-11-05 15:50:46 -08:00

106 Commits