Commit Graph

380 Commits

Author SHA1 Message Date
Dalton Hubble
e838d4dc3d Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules/alerts and Grafana dashboards
2020-09-13 15:03:27 -07:00
Dalton Hubble
979c092ef6 Reduce apiserver metrics cardinality of non-core APIs
* Reduce `apiserver_request_duration_seconds_count` cardinality
by dropping series for non-core Kubernetes APIs. This is done
to match `apiserver_request_duration_seconds_bucket` relabeling
* These two relabels must be performed the same way to avoid
affecting new SLO calculations (upcoming)
* See https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/498

Related: https://github.com/poseidon/typhoon/pull/596
2020-09-13 14:47:49 -07:00
Dalton Hubble
eb093af9ed Drop Kubelet labelmap relabel for node_name
* Originally, Kubelet and CAdvisor metrics used a labelmap
relabel to add Kubernetes SD node labels onto timeseries
* With https://github.com/poseidon/typhoon/pull/596 that
relabel was dropped since node labels aren't usually that
valuable. `__meta_kubernetes_node_name` was retained but
the field name is empty
* Favor just using Prometheus server-side `instance` in
queries that require some node identifier for aggregation
or debugging
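The effect of dropping the `labelmap` relabel can be illustrated with a simplified Python model (not Prometheus code). It assumes three things: a `labelmap` rule with regex `__meta_kubernetes_node_label_(.+)`, Prometheus' default of setting `instance` to `__address__`, and the removal of `__`-prefixed target labels after relabeling. The target address and node label are hypothetical.

```python
import re
from typing import Dict

Labels = Dict[str, str]


def labelmap(labels: Labels, regex: str) -> Labels:
    """Copy every label whose name fully matches `regex` to a new label named
    after the first capture group (like Prometheus' default `$1` replacement)."""
    pattern = re.compile(regex)
    out = dict(labels)
    for name, value in labels.items():
        match = pattern.fullmatch(name)
        if match:
            out[match.group(1)] = value
    return out


def finalize_target(labels: Labels) -> Labels:
    """Mimic post-relabel cleanup: default `instance` to `__address__`,
    then drop internal labels that start with a double underscore."""
    out = dict(labels)
    out.setdefault("instance", out["__address__"])
    return {k: v for k, v in out.items() if not k.startswith("__")}


# Hypothetical Kubelet target as discovered by Kubernetes node SD.
target = {
    "__address__": "10.0.0.4:10250",
    "__meta_kubernetes_node_name": "node1",
    "__meta_kubernetes_node_label_kubernetes_io_os": "linux",
}

with_labelmap = finalize_target(labelmap(target, r"__meta_kubernetes_node_label_(.+)"))
without_labelmap = finalize_target(target)
```

Without the `labelmap` rule, `instance` is the only node identifier left on the series, which is why it is the label to aggregate or debug by.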

Fix https://github.com/poseidon/typhoon/issues/823
2020-09-12 19:40:00 -07:00
Dalton Hubble
d236628e53 Update Prometheus from v2.20.0 to v2.21.0
* https://github.com/prometheus/prometheus/releases/tag/v2.21.0
2020-09-12 19:20:54 -07:00
Dalton Hubble
000c11edf6 Update IngressClass resources to networking.k8s.io/v1
* Kubernetes v1.19 graduated Ingress and IngressClass from
networking.k8s.io/v1beta1 to networking.k8s.io/v1
2020-09-10 23:25:53 -07:00
Dalton Hubble
29b16c3fc0 Change seccomp annotations to seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support for
seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the profile,
you'd see `seccomp=unconfined`
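The annotation-to-field migration described above can be sketched in Python, operating on a PodTemplate as a plain dict. The mapping of both `docker/default` and `runtime/default` to `type: RuntimeDefault` follows the GA seccomp API. The helper name `migrate_seccomp` is hypothetical, and only the pod-level annotation is handled; per-container annotations are ignored.

```python
import copy
from typing import Any, Dict

# Pod-level alpha annotation, superseded by the GA seccompProfile field.
POD_SECCOMP_ANNOTATION = "seccomp.security.alpha.kubernetes.io/pod"

# Legacy annotation values mapped to GA seccompProfile.type values.
# docker/default and runtime/default both select the runtime's default profile.
PROFILE_TYPES = {
    "runtime/default": "RuntimeDefault",
    "docker/default": "RuntimeDefault",
    "unconfined": "Unconfined",
}


def migrate_seccomp(template: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a PodTemplate with the pod-level seccomp annotation
    replaced by spec.securityContext.seccompProfile."""
    out = copy.deepcopy(template)
    annotations = out.get("metadata", {}).get("annotations", {})
    value = annotations.pop(POD_SECCOMP_ANNOTATION, None)
    if value is None:
        return out  # nothing to migrate
    if value.startswith("localhost/"):
        profile = {"type": "Localhost", "localhostProfile": value[len("localhost/"):]}
    else:
        profile = {"type": PROFILE_TYPES[value]}
    spec = out.setdefault("spec", {})
    spec.setdefault("securityContext", {})["seccompProfile"] = profile
    return out
```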

Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215
2020-09-10 01:15:07 -07:00
Dalton Hubble
d45dfdbf91 Update nginx-ingress from v0.34.1 to v0.35.0
* Repo changed to k8s.gcr.io/ingress-nginx/controller
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.35.0
2020-08-29 13:38:28 -07:00
Dalton Hubble
a504264e24 Update Grafana from v7.1.4 to v7.1.5
* https://github.com/grafana/grafana/releases/tag/v7.1.5
2020-08-27 08:52:07 -07:00
Dalton Hubble
58def65a09 Update Grafana from v7.1.3 to v7.1.4
* https://github.com/grafana/grafana/releases/tag/v7.1.4
2020-08-22 15:40:09 -07:00
Dalton Hubble
e1d6ab2f24 Update Grafana from v7.1.1 to v7.1.3
* https://github.com/grafana/grafana/releases/tag/v7.1.3
* https://github.com/grafana/grafana/releases/tag/v7.1.2
2020-08-08 18:59:49 -07:00
Dalton Hubble
2aef42d4f6 Update Prometheus from v2.19.2 to v2.20.0
* https://github.com/prometheus/prometheus/releases/tag/v2.20.0
2020-07-25 16:37:28 -07:00
Dalton Hubble
b7d67757de Update Grafana from v7.1.0 to v7.1.1
* https://github.com/grafana/grafana/releases/tag/v7.1.1
2020-07-25 16:33:40 -07:00
Dalton Hubble
618f8b30fd Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
* Update Grafana dashboard with revised metrics names
2020-07-25 15:51:31 -07:00
Dalton Hubble
efd4a0319d Update Grafana from v7.0.6 to v7.1.0
* https://github.com/grafana/grafana/releases/tag/v7.1.0
2020-07-18 13:54:56 -07:00
Dalton Hubble
a8d3d3bb12 Update ingress-nginx from v0.33.0 to v0.34.1
* Switch to ingress-nginx controller images from us.gcr.io (eu, asia
can also be used if desired)
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0
2020-07-15 22:43:49 -07:00
Dalton Hubble
dfd2a0ec23 Update Grafana from v7.0.5 to v7.0.6
* https://github.com/grafana/grafana/releases/tag/v7.0.6
2020-07-09 21:10:48 -07:00
Dalton Hubble
e3bf7d8f9b Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-07-09 21:08:55 -07:00
Dalton Hubble
74e025c9e4 Update Grafana from v7.0.4 to v7.0.5
* https://github.com/grafana/grafana/releases/tag/v7.0.5
2020-07-05 15:49:34 -07:00
Dalton Hubble
21178868db Revert "Update Prometheus from v2.19.1 to v2.19.2"
* Prometheus has not yet published v2.19.2
* This reverts commit 81b6f54169.
2020-06-27 14:53:58 -07:00
Dalton Hubble
81b6f54169 Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-06-27 14:34:30 -07:00
Dalton Hubble
a79ad34ba3 Update Grafana from v7.0.3 to v7.0.4
* https://github.com/grafana/grafana/releases/tag/v7.0.4
2020-06-26 02:06:38 -07:00
Dalton Hubble
99a11442c7 Update Prometheus from v2.19.0 to v2.19.1
* https://github.com/prometheus/prometheus/releases/tag/v2.19.1
2020-06-26 02:01:58 -07:00
Dalton Hubble
bc9b808d44 Update nginx-ingress from v0.32.0 to v0.33.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-0.33.0
2020-06-16 18:44:40 -07:00
Dalton Hubble
04520e447c Update node-exporter from v1.0.0 to v1.0.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.1
2020-06-16 17:57:09 -07:00
Dalton Hubble
c9059d3fe9 Update Prometheus from v2.19.0-rc.0 to v2.19.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0
2020-06-09 23:05:03 -07:00
Dalton Hubble
31d02b0221 Update Prometheus from v2.18.1 to v2.19.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0-rc.0
2020-06-05 00:16:45 -07:00
Dalton Hubble
8f875f80f5 Update Grafana from v7.0.1 to v7.0.3
* https://github.com/grafana/grafana/releases/tag/v7.0.2
* https://github.com/grafana/grafana/releases/tag/v7.0.3
2020-06-03 12:31:58 -07:00
Dalton Hubble
16c0b9152b Update kube-state-metrics from v1.9.6 to v1.9.7
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.7
2020-06-03 11:35:10 -07:00
Dalton Hubble
187bb17d39 Update Grafana from v7.0.0 to v7.0.1
* https://github.com/grafana/grafana/releases/tag/v7.0.1
2020-05-27 21:35:24 -07:00
Dalton Hubble
abc31c3711 Update node-exporter from v1.0.0-rc.1 to v1.0.0
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0
2020-05-27 21:33:03 -07:00
Dalton Hubble
3bdddc452c Update Grafana from v7.0.0-beta3 to v7.0.0
* https://grafana.com/docs/grafana/latest/guides/whats-new-in-v7-0/
2020-05-18 23:42:32 -07:00
Dalton Hubble
2578be1f96 Rollback Grafana to v7.0.0-beta3, v7.0.0 image is missing
* Grafana hasn't published the v7.0.0 image yet
2020-05-16 12:32:10 -07:00
Dalton Hubble
90edcd3d77 Update node-exporter from v1.0.0-rc.0 to v1.0.0-rc.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.1
2020-05-15 18:03:19 -07:00
Dalton Hubble
a927c7c790 Update kube-state-metrics from v1.9.5 to v1.9.6
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.6
2020-05-15 17:42:24 -07:00
Dalton Hubble
d952576d2f Update Grafana from v7.0.0-beta3 to v7.0.0
* https://github.com/grafana/grafana/releases/tag/7.0.0
2020-05-15 17:38:59 -07:00
Dalton Hubble
f4194cd57a Update Grafana from v7.0.0-beta2 to v7.0.0-beta3
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta3
2020-05-09 17:50:40 -07:00
Dalton Hubble
3f0a5d2715 Update Grafana from v7.0.0-beta1 to v7.0.0-beta2
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta2
2020-05-07 23:04:44 -07:00
Dalton Hubble
33173c0206 Update Prometheus from v2.18.0 to v2.18.1
* https://github.com/prometheus/prometheus/releases/tag/v2.18.1
2020-05-07 22:59:11 -07:00
Dalton Hubble
70f30d9c07 Update Prometheus from v2.18.0-rc.1 to v2.18.0
* https://github.com/prometheus/prometheus/releases/tag/v2.18.0
2020-05-05 22:31:11 -07:00
Dalton Hubble
6afc1643d9 Update nginx-ingress from v0.30.0 to v0.32.0
* Add support for IngressClass and RBAC authorization
* Since our nginx ingress controller example uses the flag
`--ingress-class=public`, add an IngressClass to go along
with it

Rel: https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class
2020-05-03 23:24:19 -07:00
Dalton Hubble
e71e27e769 Update Prometheus from v2.17.2 to v2.18.0-rc.1
* https://github.com/prometheus/prometheus/releases/tag/v2.18.0-rc.1
2020-04-29 20:57:48 -07:00
Dalton Hubble
64035005d4 Update Grafana from v6.7.2 to v7.0.0-beta1
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta1
2020-04-29 20:53:30 -07:00
Dalton Hubble
fd044ee117 Enable Kubelet TLS bootstrap and NodeRestriction
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR for
kube-controller-manager to automatically approve to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* Ability for a Kubelet to delete its Node object is retained
as preemptible nodes or those in auto-scaling instance groups
need to be able to remove themselves on shutdown. This need
continues to have precedence over any risk of a node deleting
itself maliciously
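For illustration, here is a hypothetical Python helper that generates a bootstrap token and the `bootstrap.kubernetes.io/token` Secret that holds it. Tokens have the form `<token-id>.<token-secret>`, with 6 and 16 characters from `[a-z0-9]`. The Secret is named `bootstrap-token-<token-id>` in `kube-system`. This is a sketch of the format, not the code used in terraform-render-bootstrap.

```python
import re
import secrets
import string
from typing import Dict, Optional, Tuple

# Bootstrap tokens look like <token-id>.<token-secret>, using only [a-z0-9].
TOKEN_ALPHABET = string.ascii_lowercase + string.digits
TOKEN_RE = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")


def _random_part(length: int) -> str:
    """Cryptographically random string drawn from the token alphabet."""
    return "".join(secrets.choice(TOKEN_ALPHABET) for _ in range(length))


def bootstrap_token_secret(expiration: Optional[str] = None) -> Tuple[str, Dict]:
    """Generate a bootstrap token and a matching kube-system Secret manifest.

    `expiration` is an optional RFC 3339 timestamp after which the token
    is no longer valid.
    """
    token_id = _random_part(6)
    token_secret = _random_part(16)
    string_data = {
        "token-id": token_id,
        "token-secret": token_secret,
        # Allow the token to be used to authenticate to kube-apiserver.
        "usage-bootstrap-authentication": "true",
    }
    if expiration is not None:
        string_data["expiration"] = expiration
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"bootstrap-token-{token_id}",
            "namespace": "kube-system",
        },
        "type": "bootstrap.kubernetes.io/token",
        "stringData": string_data,
    }
    return f"{token_id}.{token_secret}", secret
```

The returned token string is what a bootstrap kubeconfig would carry as its user `token`. With bootstrap token authentication enabled, kube-apiserver authenticates it as user `system:bootstrap:<token-id>` in group `system:bootstrappers`, the subject targeted by the ClusterRoleBindings above.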

Security notes:

1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.

2. The bootstrap kubeconfig credential has the same sensitivity
as the previously generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig.

3. Bootstrapping Kubelet kubeconfigs with a limited lifetime offers
a slight security improvement.
  * An attacker who obtains the kubeconfig can likely obtain the
  bootstrap kubeconfig as well, to obtain the ability to renew
  their access
  * A compromised bootstrap kubeconfig could plausibly be handled
  by replacing the bootstrap token Secret, distributing the new token
  to nodes, and letting previously issued kubeconfigs expire. By
  contrast, a compromised TLS client-certificate kubeconfig can't be
  revoked (no CRL). However, replacing a bootstrap token can be
  impractical in real cluster environments, so the limited lifetime
  is mostly a theoretical benefit.
  * Cluster CSR objects are visible via kubectl, which is nice

4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features

Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/185
2020-04-28 19:35:33 -07:00
Dalton Hubble
84ed0a31c3 Update Prometheus from v2.17.1 to v2.17.2
* https://github.com/prometheus/prometheus/releases/tag/v2.17.2
2020-04-20 18:09:24 -07:00
Dalton Hubble
2b5dfece93 Update Grafana from v6.7.1 to v6.7.2
* https://github.com/grafana/grafana/releases/tag/v6.7.2
2020-04-04 13:13:19 -07:00
Dalton Hubble
d47d40b517 Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules and alerts and Grafana
dashboards
* Add Loki recording rules for convenience
2020-03-31 21:53:01 -07:00
Dalton Hubble
076b8e3c42 Update Prometheus from v2.17.0 to v2.17.1
* https://github.com/prometheus/prometheus/releases/tag/v2.17.1
2020-03-26 22:17:13 -07:00
Dalton Hubble
e556bc2167 Update Prometheus from v2.17.0-rc.3 to v2.17.0
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0
2020-03-24 23:15:49 -07:00
Dalton Hubble
ddc1ff5348 Update Grafana from v6.6.2 to v6.7.1
* https://github.com/grafana/grafana/releases/tag/v6.7.1
2020-03-21 15:27:55 -07:00
Dalton Hubble
61557e89a6 Update Prometheus from v2.16.0 to v2.17.0-rc.3
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0-rc.3
2020-03-19 22:38:05 -07:00
Dalton Hubble
75fb4e5d11 Remove Container Linux Update Operator (CLUO) addon
* Stop providing example manifests for the Container Linux
Update Operator (CLUO)
* CLUO requires patches to support Kubernetes v1.16+, but the
project and its push access are effectively unowned
* CLUO hasn't been in active use in our clusters and won't be
relevant beyond Container Linux. Not to say folks can't patch
it and run it on their own. Examples just aren't provided here

Related: https://github.com/coreos/container-linux-update-operator/pull/197
2020-03-16 22:05:17 -07:00
Dalton Hubble
c4683c5bad Refresh Prometheus alerts and Grafana dashboards
* Add 2 min wait before KubeNodeUnreachable to be less
noisy on preemptible clusters
* Add a BlackboxProbeFailure alert for any failing probes
for services annotated `prometheus.io/probe: true`
2020-03-02 20:08:37 -08:00
Dalton Hubble
f4d260645c Update node-exporter from v0.18.1 to v1.0.0-rc.0
* Update mdadm alert rule; node-exporter adds `state` label to
`node_md_disks` and removes `node_md_disks_active`
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0
2020-02-25 22:29:52 -08:00
Dalton Hubble
d9219a6722 Update nginx-ingress from v0.29.0 to v0.30.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0
2020-02-25 22:11:59 -08:00
Dalton Hubble
60c7eb85ee Update nginx-ingress from v0.28.0 to v0.29.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0
2020-02-22 15:57:59 -08:00
Dalton Hubble
4c964b56a0 Update kube-state-metrics from v1.9.4 to v1.9.5
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5
2020-02-22 15:21:10 -08:00
Dalton Hubble
1fbd6835f2 Update Grafana from v6.6.1 to v6.6.2
* https://github.com/grafana/grafana/releases/tag/v6.6.2
2020-02-22 15:19:24 -08:00
Dalton Hubble
7ca03e5219 Update Prometheus from v2.15.2 to v2.16.0
* https://github.com/prometheus/prometheus/releases/tag/v2.16.0
2020-02-14 12:10:56 -08:00
Dalton Hubble
34c3d7cc39 Update Grafana from v6.6.0 to v6.6.1
* https://github.com/grafana/grafana/releases/tag/v6.6.1
2020-02-08 14:50:33 -08:00
Dalton Hubble
e339fbd2b6 Update kube-state-metrics from v1.9.3 to v1.9.4
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4
2020-02-04 21:33:34 -08:00
Dalton Hubble
b19ba16afa Update nginx-ingress from v0.27.1 to v0.28.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0
2020-01-30 18:00:23 -08:00
Dalton Hubble
d127a7345c Update Grafana from v6.5.3 to v6.6.0
* https://github.com/grafana/grafana/releases/tag/v6.6.0
2020-01-27 20:46:32 -08:00
Dalton Hubble
d5b7ce8f27 Update kube-state-metrics from v1.9.2 to v1.9.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3
2020-01-23 00:03:16 -08:00
Dalton Hubble
bda73264f7 Update nginx-ingress from v0.26.1 to v0.27.1
* Change runAsUser from 33 to 101 for new alpine-based image
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1
2020-01-20 15:22:16 -08:00
Dalton Hubble
03ff3a9cf3 Update kube-state-metrics from v1.9.1 to v1.9.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.2
2020-01-18 15:32:10 -08:00
Dalton Hubble
48703f9906 Update Grafana from v6.5.2 to v6.5.3
* https://github.com/grafana/grafana/releases/tag/v6.5.3
2020-01-18 15:30:39 -08:00
Dalton Hubble
0e2fc89f78 Update kube-state-metrics from v1.9.0 to v1.9.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.1
2020-01-11 14:15:55 -08:00
Dalton Hubble
73588cfad3 Update Prometheus from v2.15.1 to v2.15.2
* https://github.com/prometheus/prometheus/releases/tag/v2.15.2
2020-01-06 22:08:34 -08:00
Dalton Hubble
bb586b60da Reduce Prometheus addon's node-exporter tolerations
* Change node-exporter DaemonSet tolerations from tolerating
all possible NoSchedule taints to tolerating the master taint
and the not ready taint (we'd like metrics regardless)
* Users who add custom node taints must add matching tolerations
to the addon node-exporter DaemonSet. As an addon, it's expected
that users copy and manipulate manifests out-of-band in their own
systems
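A simplified Python model of taint/toleration matching shows why narrowing the tolerations matters for custom taints. Two matching rules follow Kubernetes semantics: an empty effect matches any effect, and `Exists` with an empty key matches any key. The model is simplified in several ways: only NoSchedule and NoExecute taints are considered, `tolerationSeconds` is ignored, and the master taint key and the `dedicated=gpu` taint are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes matching rules (ignores tolerationSeconds)."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # An empty key with Exists tolerates every taint key.
        return toleration.key in ("", taint.key)
    return toleration.key == taint.key and toleration.value == taint.value


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """True if every NoSchedule/NoExecute taint is tolerated by some toleration."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# Before: tolerate any NoSchedule taint.
before = [Toleration(operator="Exists", effect="NoSchedule")]
# After (keys assumed): tolerate only the controller and not-ready taints.
after = [
    Toleration(key="node-role.kubernetes.io/master", operator="Exists", effect="NoSchedule"),
    Toleration(key="node.kubernetes.io/not-ready", operator="Exists", effect="NoSchedule"),
]
# A hypothetical custom taint added by a cluster operator.
custom = [Taint(key="dedicated", value="gpu", effect="NoSchedule")]
```

With `before`, node-exporter lands on the custom-tainted node. With `after`, it does not until a matching toleration is added to the DaemonSet.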
2020-01-06 21:24:24 -08:00
Dalton Hubble
43e05b9131 Enable kube-proxy metrics and allow Prometheus scrapes
* Configure kube-proxy --metrics-bind-address=0.0.0.0 (default
127.0.0.1) to serve metrics on 0.0.0.0:10249
* Add firewall rules to allow Prometheus (resides on a worker) to
scrape kube-proxy service endpoints on controllers or workers
* Add a clusterIP: None service for kube-proxy endpoint discovery
2020-01-06 21:11:18 -08:00
Dalton Hubble
a4e843693f Update Prometheus from v2.15.0 to v2.15.1
* https://github.com/prometheus/prometheus/releases/tag/v2.15.1
2019-12-26 09:12:55 -05:00
Dalton Hubble
f48e43c0b1 Update Prometheus from v2.14.0 to v2.15.0
* https://github.com/prometheus/prometheus/releases/tag/v2.15.0
2019-12-24 10:52:19 -05:00
Dalton Hubble
52d11096dc Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-20 13:53:37 -08:00
Dalton Hubble
0ecb995890 Update kube-state-metrics from v1.8.0 to v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-14 17:20:49 -08:00
Dalton Hubble
1b9fa2e688 Update Grafana from v6.5.1 to v6.5.2
* https://github.com/grafana/grafana/releases/tag/v6.5.2
2019-12-14 15:25:48 -08:00
Dalton Hubble
178afe4a9b Reduce apiserver metrics cardinality and extraneous labels
* Stop mapping node labels to targets discovered via Kubernetes
nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to
store node labels (e.g. kubernetes.io/os=linux) on these metrics
* kube-apiserver's apiserver_request_duration_seconds_bucket metric
has a high cardinality that includes labels for the API group, verb,
scope, resource, and component for each object type, including for
each CRD. This one metric has ~10k time series in a typical cluster
(between 10% and 40% of the total)
* Removing the apiserver request duration outright would make latency
alerts a no-op and break a Grafana apiserver panel. Instead, drop series
that have a "group" label. Effectively, only request durations for
core Kubernetes APIs will be kept (i.e. cardinality won't grow with
each CRD added). This reduces the metric to ~2k unique series
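The drop rule described above can be sketched in Python as a simplified model of a Prometheus `metric_relabel_configs` entry with `action: drop`, `source_labels: [__name__, group]`, and regex `apiserver_request_duration_seconds_bucket;.+`. The exact rule in the addon's Prometheus config may differ, and the sample series are made up.

```python
import re
from typing import Dict, List, Sequence

Labels = Dict[str, str]


def relabel_drop(series: List[Labels], source_labels: Sequence[str], regex: str,
                 separator: str = ";") -> List[Labels]:
    """Mimic a Prometheus `action: drop` rule: join the source label values
    (missing labels count as empty strings) and drop the series when the
    regex matches the whole joined string (Prometheus anchors relabel regexes)."""
    pattern = re.compile(regex)
    kept = []
    for labels in series:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value) is None:
            kept.append(labels)
    return kept


BUCKET = "apiserver_request_duration_seconds_bucket"
series = [
    {"__name__": BUCKET, "group": "", "resource": "pods"},                # core API: kept
    {"__name__": BUCKET, "group": "apps", "resource": "deployments"},     # non-core group: dropped
    {"__name__": BUCKET, "group": "example.com", "resource": "widgets"},  # CRD: dropped
    {"__name__": "up", "job": "apiserver"},                               # other metric: kept
]

kept = relabel_drop(series, ["__name__", "group"], BUCKET + ";.+")
```

Because `.+` requires a non-empty group, core API series (empty `group`) survive, and series for each newly added CRD are dropped instead of adding cardinality.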
2019-12-08 22:48:25 -08:00
Dalton Hubble
26674083b6 Update Grafana from v6.5.0 to v6.5.1
* https://github.com/grafana/grafana/releases/tag/v6.5.1
2019-11-28 14:11:25 -08:00
Dalton Hubble
030a4cec19 Update Grafana from v6.4.4 to v6.5.0
* https://grafana.com/docs/guides/whats-new-in-v6-5/
2019-11-25 22:45:58 -08:00
Dalton Hubble
ddea7dc452 Use new resource dashboards in Grafana deployment
* kubernetes-mixin pod resource dashboards were split into
two ConfigMap parts because they provide richer networking
details
* New dashboards have been used by the author at the global
level, but were missing in the per-cluster Grafana tracked
here
2019-11-25 22:27:11 -08:00
Dalton Hubble
525ae23305 Add node-exporter alerts and Grafana dashboard
* Add Prometheus alerts from node-exporter
* Add Grafana dashboard nodes.json, from node-exporter
* Not adding recording rules, since those are only used
by some node-exporter USE dashboards not being included
2019-11-16 13:47:20 -08:00
Dalton Hubble
42b6df89c8 Update Prometheus from v2.14.0-rc.0 to v2.14.0
* https://github.com/prometheus/prometheus/releases/tag/v2.14.0
2019-11-13 13:41:11 -08:00
Dalton Hubble
a8b7792338 Update Grafana from v6.4.3 to v6.4.4
* https://github.com/grafana/grafana/releases/tag/v6.4.4
2019-11-07 12:00:25 -08:00
Dalton Hubble
a3807086d4 Update Prometheus from v2.13.1 to v2.14.0-rc.0
* Happy PromCon 2019!
* https://github.com/prometheus/prometheus/releases/tag/v2.14.0-rc.0
2019-11-07 11:48:23 -08:00
Dalton Hubble
d4573092b5 Improve Kubelet and Compute Resource dashboards
* Add cluster filter to Kubelet dashboard
* Add network details in resource dashboards
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/275
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/284
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/285
2019-10-28 02:22:15 -07:00
Dalton Hubble
eb7b6d39f2 Improve minor aspects of CoreDNS and nginx-ingress dashboards
* Add default 10s refresh rate to custom dashboards to match
those from Kubernetes
* Show labels for "instance" as "pod" for clarity
* Add cluster filter for internal use
2019-10-20 23:16:55 -07:00
Dalton Hubble
33d4c2fd68 Add explicit annotation for Prometheus port to scrape
* Without the prometheus.io/port annotation, Prometheus
service discovery can scrape other Prometheus ports that
may be available.
* For example, Prometheus sidecars (not included) may
be scraped and that may be unintended
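The following sketch shows how a port annotation typically pins the scrape address. It is modeled in Python on the relabel rule from Prometheus' Kubernetes example configuration (`source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]`, regex `([^:]+)(?::\d+)?;(\d+)`, replacement `$1:$2`). Whether the addon keys off service, endpoint, or pod annotations here is an assumption, and the addresses are made up.

```python
import re

# Host, optional discovered port, then the annotated port.
ADDRESS_PORT_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Mimic a `replace` relabel onto __address__ using the annotated port.

    With no (or an empty) annotation the regex does not match, so the
    discovered address is returned unchanged.
    """
    match = ADDRESS_PORT_RE.fullmatch(f"{address};{port_annotation}")
    if match is None:
        return address
    return f"{match.group(1)}:{match.group(2)}"
```

Without the annotation every discovered port keeps its own address, so ports such as a sidecar's may also be scraped.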
2019-10-20 16:05:09 -07:00
Dalton Hubble
de90cb9246 Remove kube-state-metrics addon-resizer
* addon-resizer is outdated and has been dropped from
kube-state-metrics examples. Those using it should look
to the cluster-proportional-vertical-autoscaler.
* Eliminate addon-resizer log spew
* Remove associated Role and RoleBinding
* Also fix kube-state-metrics readinessProbe port
2019-10-20 16:03:29 -07:00
Dalton Hubble
68da420adc Refresh Prometheus rules/alerts and Grafana dashboards
* Update Prometheus rules/alerts and Grafana dashboards
* Remove dashboards that were moved to node-exporter, they
may be added back later if valuable
* Remove kube-prometheus based rules/alerts (ClockSkew alert)
2019-10-19 17:43:47 -07:00
Dalton Hubble
130c97f8eb Update Prometheus from v2.13.0 to v2.13.1
* https://github.com/prometheus/prometheus/releases/tag/v2.13.1
2019-10-18 00:10:25 -07:00
Dalton Hubble
271d2f6b52 Update Grafana from v6.4.2 to v6.4.3
* https://github.com/grafana/grafana/releases/tag/v6.4.3
2019-10-18 00:08:39 -07:00
Dalton Hubble
e4ac1027c8 Update Grafana from v6.4.1 to v6.4.2
* https://github.com/grafana/grafana/releases/tag/v6.4.2
2019-10-15 22:58:43 -07:00
Dalton Hubble
69188af565 Rename CLUO label from "app" to "name"
* Match the labeling pattern in other addons
2019-10-15 00:05:02 -07:00
Dalton Hubble
ab72f1ab2d Update Prometheus from v2.12.0 to v2.13.0
* https://github.com/prometheus/prometheus/releases/tag/v2.13.0
2019-10-06 18:22:20 -07:00
Dalton Hubble
19de38b30d Fix Prometheus etcd metrics scraping
* Prometheus was configured to use kubernetes discovery
of etcd targets based on nodes matching the node label
node-role.kubernetes.io/controller=true
* Kubernetes v1.16 stopped permitting node role labels
node-role.kubernetes.io/* so Typhoon renamed these labels
(no longer any association with roles) to
node.kubernetes.io/controller=true
* As a result, Prometheus didn't discover etcd targets,
etcd metrics were missing, etcd alerts were ineffective,
and the etcd Grafana dashboard was empty
* Introduced: https://github.com/poseidon/typhoon/pull/543
2019-10-03 19:07:05 -07:00
Dalton Hubble
ca7d62720e Update Grafana from v6.3.6 to v6.4.1
* https://github.com/grafana/grafana/releases/tag/v6.4.1
2019-10-02 20:36:05 -07:00
Dalton Hubble
26f8d76755 Update kube-state-metrics from v1.7.2 to v1.8.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.8.0
2019-10-01 20:50:33 -07:00
Dalton Hubble
7bcf2d7831 Update nginx-ingress from v0.25.1 to v0.26.1
* Add lifecycle hook to allow draining connections for
up to 5 minutes
2019-09-30 22:01:07 -07:00
Dalton Hubble
f453c54956 Update Grafana from v6.3.5 to v6.3.6
* https://github.com/grafana/grafana/releases/tag/v6.3.6
2019-09-28 15:13:46 -07:00
Dalton Hubble
9da3725738 Update Kubernetes from v1.15.3 to v1.16.0
* Drop `node-role.kubernetes.io/master` and
`node-role.kubernetes.io/node` node labels
* Kubelet (v1.16) now rejects the node labels used
in the kubectl get nodes ROLES output
* https://github.com/kubernetes/kubernetes/issues/75457
2019-09-18 22:53:06 -07:00
Dalton Hubble
dc436b8fe9 Update Grafana from v6.3.4 to v6.3.5
* https://github.com/grafana/grafana/releases/tag/v6.3.5
2019-09-07 14:21:59 -07:00
Dalton Hubble
45bc52d156 Update Grafana from v6.3.3 to v6.3.4
* https://github.com/grafana/grafana/releases/tag/v6.3.4
2019-08-31 15:59:13 -07:00
Dalton Hubble
4ef2eb7e6b Update Prometheus from v2.11.2 to v2.12.0
* https://github.com/prometheus/prometheus/releases/tag/v2.12.0
2019-08-18 20:59:44 -07:00
Dalton Hubble
99990e3cbb Use stable IDs for etcd, CoreDNS, and Nginx dashboards
* Use unique dashboard IDs so that multiple replicas of Grafana
serve dashboards with uniform paths
* Fix issue where refreshing a dashboard served by one replica
could show a 404 unless the request went to the same replica
2019-08-18 12:45:49 -07:00
Dalton Hubble
0c45cd0f06 Update Grafana from v6.3.2 to v6.3.3
* https://github.com/grafana/grafana/releases/tag/v6.3.3
2019-08-16 14:40:47 -07:00
Dalton Hubble
976452825e Update Prometheus from v2.11.0 to v2.11.2
* https://github.com/prometheus/prometheus/releases/tag/v2.11.2
2019-08-14 21:26:46 -07:00
Dalton Hubble
7bc5633c38 Update nginx-ingress from v0.25.0 to v0.25.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.1
2019-08-14 21:26:46 -07:00
Dalton Hubble
eaea4d37a2 Update Grafana from v6.2.5 to v6.3.2
* https://github.com/grafana/grafana/releases/tag/v6.3.2
* https://github.com/grafana/grafana/releases/tag/v6.3.1
* https://github.com/grafana/grafana/releases/tag/v6.3.0
2019-08-07 20:01:18 -07:00
Dalton Hubble
457ad18daa Update kube-state-metrics from v1.7.1 to v1.7.2
* Add a separate liveness and readiness probe
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.2
2019-08-07 20:00:24 -07:00
Dalton Hubble
10d4d9e565 Add Grafana dashboards for CoreDNS and Nginx Ingress Controller
* Add a CoreDNS dashboard originally based on an upstream dashboard,
but now customized according to preferences
* Add an Nginx Ingress Controller dashboard based on an upstream
dashboard, but customized according to preferences
2019-08-05 22:49:19 -07:00
Dalton Hubble
68d8717924 Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh rules, alerts, and dashboards from upstreams
2019-07-21 11:29:34 -07:00
Dalton Hubble
f543f08867 Compact nginx-ingress ClusterRole rules
* https://github.com/kubernetes/ingress-nginx/pull/4302
2019-07-20 20:31:06 -07:00
Dalton Hubble
e0be091acc Update kube-state-metrics from v1.7.0 to v1.7.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.1
2019-07-20 20:17:08 -07:00
Dalton Hubble
6cd3e65267 Update kube-state-metrics from v1.7.0-rc.1 to v1.7.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0
* Add storageclasses and verticalpodautoscalers to ClusterRole
2019-07-19 00:14:47 -07:00
Dalton Hubble
70f5cfd33e Update kube-state-metrics from v1.6.0 to v1.7.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.0
2019-07-13 13:13:57 -07:00
Dalton Hubble
eaf59bd33f Update Prometheus from v2.11.0-rc.0 to v2.11.0
* https://github.com/prometheus/prometheus/releases/tag/v2.11.0
2019-07-09 21:33:24 -07:00
Dalton Hubble
40640f3697 Upgrade nginx-ingress from v0.24.1 to v0.25.0
* Support networking.k8s.io/v1beta1 apiVersion
* Update RBAC cluster-role for networking.k8s.io/v1beta1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.0
2019-07-08 22:04:50 -07:00
Dalton Hubble
28ab746068 Update Prometheus from v2.10.0 to v2.11.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.11.0-rc.0
2019-07-08 21:32:50 -07:00
Dalton Hubble
9a395dbf88 Update Grafana from v6.2.4 to v6.2.5
* https://github.com/grafana/grafana/releases/tag/v6.2.5
2019-06-29 13:21:42 -07:00
Dalton Hubble
4ad69efc43 Update Grafana from v6.2.2 to v6.2.4
* https://github.com/grafana/grafana/releases/tag/v6.2.4
2019-06-19 21:51:54 -07:00
Dalton Hubble
cc4f7e09ab Update node-exporter from v0.18.0 to v0.18.1
* https://github.com/prometheus/node_exporter/releases/tag/v0.18.1
2019-06-07 02:09:44 -07:00
Dalton Hubble
f5960e227d Update addon-resizer base image to distroless
* Rel: https://github.com/kubernetes/kubernetes/pull/78397
2019-06-07 00:14:54 -07:00
Dalton Hubble
d449477272 Update Grafana from v6.2.1 to v6.2.2
* https://github.com/grafana/grafana/releases/tag/v6.2.2
2019-06-07 00:07:54 -07:00
Dalton Hubble
d9e7195477 Update Grafana from v6.2.0 to v6.2.1
2019-05-27 12:25:00 -07:00
Dalton Hubble
5d2684a04d Update Grafana from v6.1.6 to v6.2.0
* https://github.com/grafana/grafana/releases/tag/v6.2.0
2019-05-26 22:00:47 -07:00
Dalton Hubble
221889cc9b Update Prometheus from v2.9.2 to v2.10.0
* https://github.com/prometheus/prometheus/releases/tag/v2.10.0
2019-05-26 21:58:28 -07:00
Dalton Hubble
222a94247c Update node_exporter from v0.17.0 to v0.18.0
* https://github.com/prometheus/node_exporter/releases/tag/v0.18.0
2019-05-17 20:01:30 +02:00
Dalton Hubble
2d19ab8457 Update kube-state-metrics from v1.6.0-rc.2 to v1.6.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.6.0
2019-05-06 21:30:49 -07:00
Jordan Pittier
fd3c81d04d Remove create/update endpoints from nginx-ingress Role (#458)
* nginx-ingress no longer requires endpoints create/update RBAC Role permissions
* https://github.com/kubernetes/ingress-nginx/pull/1527
2019-05-04 11:36:02 -07:00
Dalton Hubble
6e9b2450fe Update Grafana from v6.1.4 to v6.1.6
* https://github.com/grafana/grafana/releases/tag/v6.1.6
2019-05-04 11:14:37 -07:00
Dalton Hubble
ec5aef5c92 Refresh Prometheus rules and Grafana dashboards
* Adds several network related alerts from upstream
2019-04-27 22:41:13 -07:00
Dalton Hubble
0e94708fd8 Update kube-state-metrics from v1.5.0 to v1.6.0-rc.2
* Collect metrics about Ingress resources
* Collect metrics about certificates.k8s.io certificatesigningrequests
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.6.0-rc.2
2019-04-27 20:54:40 -07:00
Dalton Hubble
2c11bad439 Update Prometheus from v2.9.1 to v2.9.2
* https://github.com/prometheus/prometheus/releases/tag/v2.9.2
2019-04-27 20:39:55 -07:00
Dalton Hubble
418597aa59 Update Grafana from v6.1.3 to v6.1.4
* https://github.com/grafana/grafana/releases/tag/v6.1.4
2019-04-18 23:30:43 -07:00
Dalton Hubble
f3174c2b7a Update Prometheus from v2.8.1 to v2.9.1
* https://github.com/prometheus/prometheus/releases/tag/v2.9.1
* https://github.com/prometheus/prometheus/releases/tag/v2.9.0
2019-04-18 23:26:32 -07:00
Dalton Hubble
a141c5fe9e Update nginx-ingress from v0.23.0 to v0.24.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.24.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.24.0
2019-04-15 21:08:22 -07:00
Dalton Hubble
1b157a2fa4 Revert "Update kube-state-metrics from v1.5.0 to v1.6.0-rc.0"
* This reverts commit 6e5d66cf66
* kube-state-metrics v1.6.0-rc.0 fires KubeDeploymentReplicasMismatch
alerts where its own Deployment doesn't have replicas available,
(kube_deployment_status_replicas_available) even though all replicas
are available according to kubectl inspection
* This problem was present even with the CSR ClusterRole fix
(https://github.com/kubernetes/kube-state-metrics/pull/717)
2019-04-13 12:37:53 -07:00
Dalton Hubble
6e5d66cf66 Update kube-state-metrics from v1.5.0 to v1.6.0-rc.0
* Adds a metrics collector for Ingress resources and other
improvements
* https://github.com/kubernetes/kube-state-metrics/pull/640
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.6.0-rc.0
2019-04-09 22:16:36 -07:00
Dalton Hubble
44c293888b Update Grafana from v6.1.1 to v6.1.3
* https://github.com/grafana/grafana/releases/tag/v6.1.3
2019-04-09 22:06:27 -07:00
Dalton Hubble
ce78d5988e Refresh Prometheus rules and Grafana dashboards
* Refresh rules and dashboards from upstreams
* Add new Kubernetes "workload" dashboards
  * View pods in a workload (deployment/daemonset/statefulset)
  * View workloads in a namespace
2019-04-06 23:31:44 -07:00
Dalton Hubble
29a3035245 Update Grafana from v6.1.0 to v6.1.1
2019-04-06 18:32:14 -07:00
Dalton Hubble
3e7a38cb13 Update Grafana from v6.0.2 to v6.1.0
* https://github.com/grafana/grafana/releases/tag/v6.1.0
2019-04-03 20:47:48 -07:00
Dalton Hubble
3e9dc28a00 Update Prometheus from v2.8.0 to v2.8.1
* https://github.com/prometheus/prometheus/releases/tag/v2.8.1
2019-03-31 17:40:20 -07:00
Dalton Hubble
41a9d86bc3 Add NetworkPolicy to limit traffic into Prometheus
* Allow traffic from Grafana to Prometheus in monitoring
* Allow traffic from Prometheus to Prometheus in monitoring
* NetworkPolicy denies non-whitelisted traffic. Define policy
to allow other access
2019-03-23 21:38:34 -07:00
Dalton Hubble
36e31fc9fa Add liveness and readiness probes to Grafana
* https://github.com/grafana/grafana/issues/3302
2019-03-23 17:55:37 -07:00
Dalton Hubble
619a0370dc Update Grafana from v6.0.1 to v6.0.2
* https://github.com/grafana/grafana/releases/tag/v6.0.2
2019-03-21 23:41:25 -07:00
Dalton Hubble
6dd2731046 Set cpu/memory resources requests/limits for some addons
* Set resource requests and limits for Grafana and CLUO
* Set resource requests for Prometheus, but allow usage
to grow since needs vary widely
* Leave nginx without resource requests/limits for now;
it's typically well-behaved
2019-03-20 00:15:08 -07:00
Dalton Hubble
aa630003a4 Refresh Prometheus rules and Grafana dashboards
* Refresh rules and dashboards from upstreams
* Organize dashboards and stay below the ConfigMap size
limit
2019-03-17 13:23:04 -07:00
Dalton Hubble
bf97a45b9d Remove heapster manifests from addons
* Heapster addon powers `kubectl top`
* In early Kubernetes, people legitimately used and expected
`kubectl top` to work, so the optional addon was provided
* Today the standards are different. Many better monitoring
tools exist that are also less coupled to Kubernetes. The reliance
of `kubectl top` on a non-core extension means it's not in scope
for minimal Kubernetes clusters. No more exceptionalism
* Finally, Heapster isn't that useful anymore. Its manifests
have no need for Typhoon-specific modification
* Look to prior releases if you still wish to apply heapster
2019-03-17 12:41:59 -07:00
Dalton Hubble
e0bee2e417 Update Prometheus from v2.7.2 to v2.8.0
* https://github.com/prometheus/prometheus/releases/tag/v2.8.0
2019-03-13 22:11:38 -07:00
Dalton Hubble
4201eb1efa Update Grafana from v6.0.0 to v6.0.1
* https://github.com/grafana/grafana/releases/tag/v6.0.1
2019-03-09 12:44:18 -08:00