Commit Graph

125 Commits

Author SHA1 Message Date
Dalton Hubble e483c81ce9 Improve Prometheus rules and alerts and Grafana dashboards
* Collate upstream rules, alerts, and dashboards and tune for use
in Typhoon
* Previously, a well-chosen (but older) set of rules, alerts, and
dashboards were maintained to reflect metric name changes
2019-02-18 12:19:23 -08:00
Dalton Hubble 6fa3b8a13f Upgrade Grafana to v6.0.0-beta2 and enable Explore UI
* Upgrade Grafana from v5.4.3 to v6.0.0-beta2
* Enable Grafana Explore UI while still using only the Viewer
role (inspect/edit without saving)
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-17 13:26:42 -08:00
Dalton Hubble 170ef74eea Remove Nginx Ingress default backend
* nginx-ingress no longer requires a configured default-backend,
it will respond with its own 404 page starting in v0.21.0
* https://github.com/kubernetes/ingress-nginx/pull/3196
2019-02-16 14:18:15 -08:00
Dalton Hubble b13a651cfe Drop metrics that are unset, high cardinality, or extraneous
* https://github.com/coreos/prometheus-operator/pull/2387
* https://github.com/coreos/prometheus-operator/pull/1959
2019-02-10 23:56:11 -08:00
Dalton Hubble 9c59f393a5 Add Kubernetes pod name to metrics discovered from service endpoints
* Prometheus queries from some upstreams use joins of node-exporter
and kube-state-metrics metrics by (namespace,pod). Add the Kubernetes
pod name to service endpoint metrics
* Rename the kubernetes_namespace field to namespace
* Honor labels since kube-state-metrics already include a `pod` field
that should not be overridden
2019-02-10 23:54:30 -08:00
Dalton Hubble 3e4b3bfb04 Raise nginx-ingress liveness/readiness timeout
* Under heavy load, avoid timeouts causing nginx-ingress
restarts https://github.com/kubernetes/ingress-nginx/pull/3737
2019-02-09 12:53:09 -08:00
Dalton Hubble 949ce21fb2 Update Prometheus from v2.7.0 to v2.7.1
* https://github.com/prometheus/prometheus/releases/tag/v2.7.1
2019-02-02 00:13:24 -08:00
Dalton Hubble 130daeac26 Update Prometheus from v2.6.1 to v2.7.0 2019-01-29 22:31:20 -08:00
Dalton Hubble f5ff003d0e Update node-exporter from v0.15.2 to v0.17.0
* node-exporter renamed multiple metrics that are reflected
in changes to Prometheus rules and Grafana dashboard expressions
2019-01-22 01:14:00 -08:00
Dalton Hubble d697dd46dc Allow kube-state-metrics PodDisruptionBudget metrics
* Update kube-state-metrics ClusterRole to allow collecting
poddisruptionbudget metrics (exported as kube_poddisruptionbudget_*)
* https://github.com/kubernetes/kube-state-metrics/pull/551
* Bump addon-resizer from v1.7 to v1.8.4
2019-01-22 01:12:32 -08:00
Dalton Hubble 2f3097ebea Update nginx-ingress from v0.21.0 to v0.22.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.22.0
2019-01-16 23:01:22 -08:00
Dalton Hubble 67fb9602e7 Update Prometheus from v2.6.0 to v2.6.1
* https://github.com/prometheus/prometheus/releases/tag/v2.6.1
2019-01-15 21:13:40 -08:00
Dalton Hubble c8a85fabe1 Update Grafana from v5.4.2 to v5.4.3
* https://github.com/grafana/grafana/releases/tag/v5.4.3
2019-01-15 21:13:16 -08:00
Dalton Hubble 1d27dc6528 Update kube-state-metrics exporter from v1.4.0 to v1.5.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.5.0
2019-01-12 14:24:57 -08:00
Dalton Hubble ea8b0d1c84 Update Prometheus addon from v2.5.0 to v2.6.0
* https://github.com/prometheus/prometheus/releases/tag/v2.6.0
2018-12-27 07:35:12 -08:00
Dalton Hubble b74bf11772 Update Grafana from v5.4.0 to v5.4.2
* https://github.com/grafana/grafana/releases/tag/v5.4.2
* https://github.com/grafana/grafana/releases/tag/v5.4.1
2018-12-15 12:39:03 -08:00
Dalton Hubble 991fb44c37 Update Grafana from v5.3.4 to v5.4.0
* https://github.com/grafana/grafana/releases/tag/v5.4.0
2018-12-06 01:33:50 -08:00
Dalton Hubble b6016d0a26 Disable Grafana login form, admin user can't be disabled
* Example manifests aim to provide a read-only dashboard visible
to any users with network access (i.e. kubectl port-forward, LAN)
* Problem: Grafana always has an admin user, even with the user
management system disabled
* Disable the login form to prevent admin login
2018-11-28 22:04:08 -08:00
Dalton Hubble 872b11b948 Update ngninx-ingress from v0.20.0 to v0.21.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.21.0
2018-11-26 21:57:34 -08:00
Dalton Hubble c8c43f3991 Update Grafana from v5.3.2 to v5.3.4
* https://github.com/grafana/grafana/releases/tag/v5.3.3
* https://github.com/grafana/grafana/releases/tag/v5.3.4
2018-11-18 16:42:50 -08:00
Dalton Hubble 7de03a1279 Fix Prometheus etcd scrape config for DigitalOcean
* Kubelet uses a node's hostname as the node name, which isn't
resolvable on DigitalOcean. On DigitalOcean, the node name was
set to the internal IP until #337 switched to instead configuring
kube-apiserver to prefer the InternalIP for communication
* Explicitly configure etcd scrapes to target each controller by
internal IP and port 2381 (replace __address__)
2018-11-06 23:02:45 -08:00
Dalton Hubble be9f7b87d6 Update Prometheus from v2.4.3 to v2.5.0
* https://github.com/prometheus/prometheus/releases/tag/v2.5.0
2018-11-06 22:16:12 -08:00
Dalton Hubble 884c8b39dc Update Grafana from v5.3.1 to v5.3.2
* https://github.com/grafana/grafana/releases/tag/v5.3.2
2018-10-28 19:44:22 -07:00
Dalton Hubble bc750aec33 Configure Heapster to source metrics from Kubelet authenticated API
* Heapster can now get nodes (i.e. kubelets) from the apiserver and
source metrics from the Kubelet authenticated API (10250) instead of
the Kubelet HTTP read-only API (10255)
* https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md
* Use the heapster service account token via Kubelet bearer token
authn/authz.
* Permit Heapster to skip CA verification. The CA cert does not contain
IP SANs and cannot since nodes get random IPs that aren't known upfront.
Heapster obtains the node list from the apiserver, so the risk of
spoofing a node is limited. For the same reason, Prometheus scrapes
must skip CA verification for scraping Kubelet's provided by the apiserver.
* https://github.com/poseidon/typhoon/blob/v1.12.1/addons/prometheus/config.yaml#L68
* Create a heapster ClusterRole to work around the default Kubernetes
`system:heapster` ClusterRole lacking the proper GET `nodes/stats`
access. See https://github.com/kubernetes/heapster/issues/1936
2018-10-18 21:03:01 -07:00
Dalton Hubble 0127ee82c1 Update nginx-ingress from v0.19.0 to v0.20.0 2018-10-16 21:35:29 -07:00
Dalton Hubble a10d6977b8 Update Prometheus from v2.4.2 to v2.4.3
* https://github.com/prometheus/prometheus/releases/tag/v2.4.3
2018-10-16 21:29:41 -07:00
Dalton Hubble 05fe923c14 Update Grafana from v5.3.0 to v5.3.1
* https://github.com/grafana/grafana/releases/tag/v5.3.1
2018-10-16 21:23:44 -07:00
Dalton Hubble 5eb4078d68 Add docker/default seccomp to control plane and addons
* Annotate pods, deployments, and daemonsets to start containers
with the Docker runtime's default seccomp profile
* Overrides Kubernetes default behavior which started containers
with seccomp=unconfined
* https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container
2018-10-16 20:07:29 -07:00
Dalton Hubble 8f0d2b5db4 Update Grafana from v5.2.4 to v5.3.0 2018-10-13 23:03:31 -07:00
Dalton Hubble 032a24133b Update Prometheus from v2.3.2 to v2.4.2
* https://github.com/prometheus/prometheus/releases/tag/v2.4.0
* https://github.com/prometheus/prometheus/releases/tag/v2.4.1
* https://github.com/prometheus/prometheus/releases/tag/v2.4.2
2018-09-21 22:27:11 -07:00
Dalton Hubble dc03f7a4a9 Update nginx-ingress from 0.17.1 to 0.19.0
* If using --enable-ssl-passthrough or exposing TCP/UDP services,
be aware of https://github.com/kubernetes/ingress-nginx/pull/3038
* Workarounds until the fix merges are to stay on 0.17.1, use the
suggested development image, or revert to securityContext
`runAsNonRoot: false` for a while (less secure)
2018-09-08 17:57:01 -07:00
Dalton Hubble 1b8234eb91 Update Grafana from v5.2.2 to v5.2.4
* https://github.com/grafana/grafana/releases/tag/v5.2.3
* https://github.com/grafana/grafana/releases/tag/v5.2.4
2018-09-08 15:41:20 -07:00
Dalton Hubble 4ba090feb0 Update kube-state-metrics from v1.3.1 to v1.4.0 2018-08-29 09:37:50 -07:00
Dalton Hubble 4882fe1053 Add docs for Azure Ingress and worker pools
* Azure worker pools must be in the same region as
the cluster itself unfortunately
2018-08-27 23:30:56 -07:00
Becca Powell 49a9dc9b8b Fix typo in Prometheus alerting rules 2018-08-21 16:55:49 -07:00
Dalton Hubble dbdc3fc850 Add nginx-ingress addon manifests for bare-metal 2018-08-11 12:14:23 -07:00
Dalton Hubble e00f97c578 Update nginx-ingress from 0.16.2 to 0.17.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.17.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.17.0
2018-08-08 00:45:20 -07:00
Dalton Hubble e6720cf738 Update heapster from v1.5.3 to v1.5.4
* https://github.com/kubernetes/heapster/releases/tag/v1.5.4
2018-07-29 11:19:57 -07:00
Dalton Hubble 844f380b4e Update Grafana from v5.2.1 to v5.2.2
* https://github.com/grafana/grafana/releases/tag/v5.2.2
2018-07-29 11:12:56 -07:00
Dalton Hubble 02cd8eb8d3 Update Prometheus from v2.3.1 to v2.3.2
* https://github.com/prometheus/prometheus/releases/tag/v2.3.2
2018-07-14 14:25:49 -07:00
Dalton Hubble 84d6cfe7b3 Add Prometheus alert rule for inactive md devices
* node-exporter exposes metrics to Prometheus about total and
active md devices (e.g. disks in mdadm RAID arrays)
* Add alert that fires when a RAID disk fails or becomes inactive
for another reason
2018-07-10 00:20:30 -07:00
Dalton Hubble f40f60b83c Update Nginx Ingress controller from 0.15.0 to 0.16.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.16.2
* https://github.com/kubernetes/ingress-nginx/blob/master/Changelog.md
2018-07-02 22:06:22 -07:00
Dalton Hubble a3349b5c68 Update heapster from v1.5.2 to v1.5.3 2018-07-01 21:07:52 -07:00
Dalton Hubble 74dc6b0bf9 Update Grafana from 5.1.4 to 5.2.1
* http://docs.grafana.org/guides/whats-new-in-v5-2/
* https://github.com/grafana/grafana/releases/tag/v5.2.0
* https://github.com/grafana/grafana/releases/tag/v5.2.1
2018-07-01 20:55:34 -07:00
Dalton Hubble 2eaf04c68b Drop hostNetwork from nginx-ingress addon
* Both flannel and Calico support host port via `portmap`
* Allows writing NetworkPolicies that reference ingress pods in `from`
or `to`. HostNetwork pods were difficult to write network policy for
since they could circumvent the CNI network to communicate with pods on
the same node.
2018-06-22 00:46:41 -07:00
Dalton Hubble d5de41e07a Update Grafana from 5.1.3 to 5.1.4
* https://github.com/grafana/grafana/releases/tag/v5.1.4
2018-06-19 21:45:15 -07:00
Dalton Hubble 05b99178ae Update prometheus from v2.3.0 to v2.3.1
* https://github.com/prometheus/prometheus/releases/tag/v2.3.1
2018-06-19 21:43:50 -07:00
Stephen Demos 18dd7ccc09 Update CLUO from v0.6.0 to v0.7.0 2018-06-14 22:32:36 -07:00
Dalton Hubble cbe646fba6 Label namespaces to ease writing Network Policies 2018-06-09 11:45:11 -07:00
Dalton Hubble c166b2ba33 Update prometheus from v2.2.1 to v2.3.0 2018-06-09 11:43:10 -07:00