Dalton Hubble
d4573092b5
Improve Kubelet and Compute Resource dashboards
...
* Add cluster filter to Kubelet dashboard
* Add network details in resource dashboards
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/275
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/284
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/285
2019-10-28 02:22:15 -07:00
Dalton Hubble
eb7b6d39f2
Improve minor aspects of CoreDNS and nginx-ingress dashboards
...
* Add default 10s refresh rate to custom dashboards to match
those from Kubernetes
* Show labels for "instance" as "pod" for clarity
* Add cluster filter for internal use
2019-10-20 23:16:55 -07:00
Dalton Hubble
33d4c2fd68
Add explicit annotation for Prometheus port to scrape
...
* Without the prometheus.io/port annotation, Prometheus
service discovery can scrape other Prometheus ports that
may be available.
* For example, Prometheus sidecars (not included) may
be scraped and that may be unintended
2019-10-20 16:05:09 -07:00
Dalton Hubble
de90cb9246
Remove kube-state-metrics addon-resizer
...
* addon-resizer is outdated and has been dropped from
kube-state-metrics examples. Those using it should look
to the cluster-proportional-vertical-autoscaler.
* Eliminate addon-resizer log spew
* Remove associated Role and RoleBinding
* Also fix kube-state-metrics readinessProbe port
2019-10-20 16:03:29 -07:00
Dalton Hubble
68da420adc
Refresh Prometheus rules/alerts and Grafana dashboards
...
* Update Prometheus rules/alerts and Grafana dashboards
* Remove dashboards that were moved to node-exporter, they
may be added back later if valuable
* Remove kube-prometheus based rules/alerts (ClockSkew alert)
2019-10-19 17:43:47 -07:00
Dalton Hubble
130c97f8eb
Update Prometheus from v2.13.0 to v2.13.1
...
* https://github.com/prometheus/prometheus/releases/tag/v2.13.1
2019-10-18 00:10:25 -07:00
Dalton Hubble
271d2f6b52
Update Grafana from v6.4.2 to v6.4.3
...
* https://github.com/grafana/grafana/releases/tag/v6.4.3
2019-10-18 00:08:39 -07:00
Dalton Hubble
e4ac1027c8
Update Grafana from v6.4.1 to v6.4.2
...
* https://github.com/grafana/grafana/releases/tag/v6.4.2
2019-10-15 22:58:43 -07:00
Dalton Hubble
69188af565
Rename CLUO label from "app" to "name"
...
* Match the labeling pattern in other addons
2019-10-15 00:05:02 -07:00
Dalton Hubble
ab72f1ab2d
Update Prometheus from v2.12.0 to v2.13.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.13.0
2019-10-06 18:22:20 -07:00
Dalton Hubble
19de38b30d
Fix Prometheus etcd metrics scraping
...
* Prometheus was configured to use kubernetes discovery
of etcd targets based on nodes matching the node label
node-role.kubernetes.io/controller=true
* Kubernetes v1.16 stopped permitting node role labels
node-role.kubernetes.io/* so Typhoon renamed these labels
(no longer any association with roles) to
node.kubermetes.io/controller=true
* As a result, Prometheus didn't discover etcd targets,
etcd metrics were missing, etcd alerts were ineffective,
and the etcd Grafana dashboard was empty
* Introduced: https://github.com/poseidon/typhoon/pull/543
2019-10-03 19:07:05 -07:00
Dalton Hubble
ca7d62720e
Update Grafana from v6.3.6 to v6.4.1
...
* https://github.com/grafana/grafana/releases/tag/v6.4.1
2019-10-02 20:36:05 -07:00
Dalton Hubble
26f8d76755
Update kube-state-metrics from v1.7.2 to v1.8.0
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.8.0
2019-10-01 20:50:33 -07:00
Dalton Hubble
7bcf2d7831
Update nginx-ingress from v0.25.1 to v0.26.1
...
* Add lifecycle hook to allow draining connections for
up to 5 minutes
2019-09-30 22:01:07 -07:00
Dalton Hubble
f453c54956
Update Grafana from v6.3.5 to v6.3.6
...
* https://github.com/grafana/grafana/releases/tag/v6.3.6
2019-09-28 15:13:46 -07:00
Dalton Hubble
9da3725738
Update Kubernetes from v1.15.3 to v1.16.0
...
* Drop `node-role.kubernetes.io/master` and
`node-role.kubernetes.io/node` node labels
* Kubelet (v1.16) now rejects the node labels used
in the kubectl get nodes ROLES output
* https://github.com/kubernetes/kubernetes/issues/75457
2019-09-18 22:53:06 -07:00
Dalton Hubble
dc436b8fe9
Update Grafana from v6.3.4 to v6.3.5
...
* https://github.com/grafana/grafana/releases/tag/v6.3.5
2019-09-07 14:21:59 -07:00
Dalton Hubble
45bc52d156
Update Grafana from v6.3.3 to v6.3.4
...
* https://github.com/grafana/grafana/releases/tag/v6.3.4
2019-08-31 15:59:13 -07:00
Dalton Hubble
4ef2eb7e6b
Update Prometheus from v2.11.2 to v2.12.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.12.0
2019-08-18 20:59:44 -07:00
Dalton Hubble
99990e3cbb
Use stable IDs for etcd, CoreDNS, and Ngnix dashboards
...
* Use unique dashboard ID so that multiple replicas of Grafana
serve dashboards with uniform paths
* Fix issue where refreshing a dashboard served by one replica
could show a 404 unless the request went to the same replica
2019-08-18 12:45:49 -07:00
Dalton Hubble
0c45cd0f06
Update Grafana from v6.3.2 to v6.3.3
...
* https://github.com/grafana/grafana/releases/tag/v6.3.3
2019-08-16 14:40:47 -07:00
Dalton Hubble
976452825e
Update Prometheus from v2.11.0 to v2.11.2
...
* https://github.com/prometheus/prometheus/releases/tag/v2.11.2
2019-08-14 21:26:46 -07:00
Dalton Hubble
7bc5633c38
Update nginx-ingress from v0.25.0 to v0.25.1
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.1
2019-08-14 21:26:46 -07:00
Dalton Hubble
eaea4d37a2
Update Grafana from v6.2.5 to v6.3.2
...
* https://github.com/grafana/grafana/releases/tag/v6.3.2
* https://github.com/grafana/grafana/releases/tag/v6.3.1
* https://github.com/grafana/grafana/releases/tag/v6.3.0
2019-08-07 20:01:18 -07:00
Dalton Hubble
457ad18daa
Update kube-state-metrics from v1.7.1 to v1.7.2
...
* Add a separate liveness and readiness probe
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.2
2019-08-07 20:00:24 -07:00
Dalton Hubble
10d4d9e565
Add Grafana dashboards for CoreDNS and Nginx Ingress Controller
...
* Add a CoreDNS dashboard originally based on an upstream dashboard,
but now customized according to preferences
* Add an Nginx Ingress Controller based on an upstream dashboard,
but customized according to preferences
2019-08-05 22:49:19 -07:00
Dalton Hubble
68d8717924
Refresh Prometheus rules/alerts and Grafana dashboards
...
* Refresh rules, alerts, and dashboards from upstreams
2019-07-21 11:29:34 -07:00
Dalton Hubble
f543f08867
Compact nginx-ingress ClusterRole rules
...
* https://github.com/kubernetes/ingress-nginx/pull/4302
2019-07-20 20:31:06 -07:00
Dalton Hubble
e0be091acc
Update kube-state-metrics from v1.7.0 to v1.7.1
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.1
2019-07-20 20:17:08 -07:00
Dalton Hubble
6cd3e65267
Update kube-state-metrics from v1.7.0-rc.1 to v1.7.0
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0
* Add storageclasses and verticalpodautoscalers to ClusterRole
2019-07-19 00:14:47 -07:00
Dalton Hubble
70f5cfd33e
Update kube-state-metrics from v1.6.0 to v1.7.0-rc.1
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.0
2019-07-13 13:13:57 -07:00
Dalton Hubble
eaf59bd33f
Update Prometheus from v2.11.0-rc.0 to v2.11.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.11.0
2019-07-09 21:33:24 -07:00
Dalton Hubble
40640f3697
Upgrade nginx-ingress from v0.24.1 to v0.25.0
...
* Support networking.k8s.io/v1beta1 apiVersion
* Update RBAC cluster-role for networking.k8s.io/v1beta1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.0
2019-07-08 22:04:50 -07:00
Dalton Hubble
28ab746068
Update Prometheus from v2.10.0 to v2.11.0-rc.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.11.0-rc.0
2019-07-08 21:32:50 -07:00
Dalton Hubble
9a395dbf88
Update Grafana from v6.2.4 to v6.2.5
...
* https://github.com/grafana/grafana/releases/tag/v6.2.5
2019-06-29 13:21:42 -07:00
Dalton Hubble
4ad69efc43
Update Grafana from v6.2.2 to v6.2.4
...
* https://github.com/grafana/grafana/releases/tag/v6.2.4
2019-06-19 21:51:54 -07:00
Dalton Hubble
cc4f7e09ab
Update node-exporter from v0.18.0 to v0.18.1
...
* https://github.com/prometheus/node_exporter/releases/tag/v0.18.1
2019-06-07 02:09:44 -07:00
Dalton Hubble
f5960e227d
Update addon-resizer base image to distroless
...
* Rel: https://github.com/kubernetes/kubernetes/pull/78397
2019-06-07 00:14:54 -07:00
Dalton Hubble
d449477272
Update Grafana from v6.2.1 to v6.2.2
...
* https://github.com/grafana/grafana/releases/tag/v6.2.2
2019-06-07 00:07:54 -07:00
Dalton Hubble
d9e7195477
Update Grafana from v2.6.0 to v2.6.1
2019-05-27 12:25:00 -07:00
Dalton Hubble
5d2684a04d
Update Grafana from v6.1.6 to v6.2.0
...
* https://github.com/grafana/grafana/releases/tag/v6.2.0
2019-05-26 22:00:47 -07:00
Dalton Hubble
221889cc9b
Update Prometheus from v2.9.2 to v2.10.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.10.0
2019-05-26 21:58:28 -07:00
Dalton Hubble
222a94247c
Update node_exporter from v0.17.0 to v0.18.0
...
* https://github.com/prometheus/node_exporter/releases/tag/v0.18.0
2019-05-17 20:01:30 +02:00
Dalton Hubble
2d19ab8457
Update kube-state-metrics from v1.6.0-rc.2 to v1.6.0
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.6.0
2019-05-06 21:30:49 -07:00
Jordan Pittier
fd3c81d04d
Remove create/update endpoints from nginx-ingress Role ( #458 )
...
* nginx-ingress no longer requires endpoints create/update RBAC Role permissions
* https://github.com/kubernetes/ingress-nginx/pull/1527
2019-05-04 11:36:02 -07:00
Dalton Hubble
6e9b2450fe
Update Grafana from v6.1.4 to v6.1.6
...
* https://github.com/grafana/grafana/releases/tag/v6.1.6
2019-05-04 11:14:37 -07:00
Dalton Hubble
ec5aef5c92
Refresh Prometheus rules and Grafana dashboards
...
* Adds several network related alerts from upstream
2019-04-27 22:41:13 -07:00
Dalton Hubble
0e94708fd8
Update kube-state-metrics from v1.5.0 to v1.6.0-rc.2
...
* Collect metrics Ingress resources
* Collects metrics about certificates.k8s.io certificatesigningrequests
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.6.0-rc.2
2019-04-27 20:54:40 -07:00
Dalton Hubble
2c11bad439
Update Prometheus from v2.9.1 to v2.9.2
...
* https://github.com/prometheus/prometheus/releases/tag/v2.9.2
2019-04-27 20:39:55 -07:00
Dalton Hubble
418597aa59
Update Grafana from v6.1.3 to v6.1.4
...
* https://github.com/grafana/grafana/releases/tag/v6.1.4
2019-04-18 23:30:43 -07:00
Dalton Hubble
f3174c2b7a
Update Prometheus from v2.8.1 to v2.9.1
...
* https://github.com/prometheus/prometheus/releases/tag/v2.9.1
* https://github.com/prometheus/prometheus/releases/tag/v2.9.0
2019-04-18 23:26:32 -07:00
Dalton Hubble
a141c5fe9e
Update nginx-ingress from v0.23.0 to v0.24.1
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.24.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.24.0
2019-04-15 21:08:22 -07:00
Dalton Hubble
1b157a2fa4
Revert "Update kube-state-metrics from v1.5.0 to v1.6.0-rc.0"
...
* This reverts commit 6e5d66cf66
* kube-state-metrics v1.6.0-rc.0 fires KubeDeploymentReplicasMismatch
alerts where its own Deployment doesn't have replicas available,
(kube_deployment_status_replicas_available) even though all replicas
are available according to kubectl inspection
* This problem was present even with the CSR ClusterRole fix
(https://github.com/kubernetes/kube-state-metrics/pull/717 )
2019-04-13 12:37:53 -07:00
Dalton Hubble
6e5d66cf66
Update kube-state-metrics from v1.5.0 to v1.6.0-rc.0
...
* Adds a metrics collector for Ingress resources and other
improvements
* https://github.com/kubernetes/kube-state-metrics/pull/640
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.6.0-rc.0
2019-04-09 22:16:36 -07:00
Dalton Hubble
44c293888b
Update Grafana from v6.1.1 to v6.1.3
...
* https://github.com/grafana/grafana/releases/tag/v6.1.3
2019-04-09 22:06:27 -07:00
Dalton Hubble
ce78d5988e
Refresh Prometheus rules and Grafana dashboards
...
* Refresh rules and dashboards from upstreams
* Add new Kubernetes "workload" dashboards
* View pods in a workload (deployment/daemonset/statefulset)
* View workloads in a namespace
2019-04-06 23:31:44 -07:00
Dalton Hubble
29a3035245
Update Grafana from v6.1.0 to v6.1.1
2019-04-06 18:32:14 -07:00
Dalton Hubble
3e7a38cb13
Update Grafana from v6.0.2 to v6.1.0
...
* https://github.com/grafana/grafana/releases/tag/v6.1.0
2019-04-03 20:47:48 -07:00
Dalton Hubble
3e9dc28a00
Update Prometheus from v2.8.0 to v2.8.1
...
* https://github.com/prometheus/prometheus/releases/tag/v2.8.1
2019-03-31 17:40:20 -07:00
Dalton Hubble
41a9d86bc3
Add NetworkPolicy to limit traffic into Prometheus
...
* Allow traffic from Grafana to Prometheus in monitoring
* Allow traffic from Prometheus to Prometheus in monitoring
* NetworkPolicy denies non-whitelisted traffic. Define policy
to allow other access
2019-03-23 21:38:34 -07:00
Dalton Hubble
36e31fc9fa
Add liveness and readiness probes to Grafana
...
* https://github.com/grafana/grafana/issues/3302
2019-03-23 17:55:37 -07:00
Dalton Hubble
619a0370dc
Update Grafana from v6.0.1 to v6.0.2
...
* https://github.com/grafana/grafana/releases/tag/v6.0.2
2019-03-21 23:41:25 -07:00
Dalton Hubble
6dd2731046
Set cpu/memory resources requests/limits for some addons
...
* Set resource requests and limits for Grafana and CLUO
* Set resource requests for Prometheus, but allow usage
to grow since needs vary widely
* Leave nginx without resource requests/limits for now,
its typically well behaved
2019-03-20 00:15:08 -07:00
Dalton Hubble
aa630003a4
Refresh Prometheus rules and Grafana dashboards
...
* Refresh rules and dashboards from upstreams
* Organize dashboards and stay below the ConfigMap size
limit
2019-03-17 13:23:04 -07:00
Dalton Hubble
bf97a45b9d
Remove heapster manifests from addons
...
* Heapster addon powers `kubectl top`
* In early Kubernetes, people legitimately used and expected
`kubectl top` to work, so the optional addon was provided
* Today the standards are different. Many better monitoring
tools exist, that are also less coupled to Kubernetes "kubectl
top" reliance on a non-core extensions means its not in-scope
for minimal Kubernetes clusters. No more exceptionalism
* Finally, Heapster isn't that useful anymore. Its manifests
have no need for Typhoon-specific modification
* Look to prior releases if you still wish to apply heapster
2019-03-17 12:41:59 -07:00
Dalton Hubble
e0bee2e417
Update Prometheus from v2.7.2 to v2.8.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.8.0
2019-03-13 22:11:38 -07:00
Dalton Hubble
4201eb1efa
Update Grafana from v6.0.0 to v6.0.1
...
* https://github.com/grafana/grafana/releases/tag/v6.0.1
2019-03-09 12:44:18 -08:00
Dalton Hubble
4d9a692424
Update Prometheus from v2.7.1 to v2.7.2
...
* https://github.com/prometheus/prometheus/releases/tag/v2.7.2
2019-03-04 23:08:12 -08:00
Dalton Hubble
a08adc92b5
Update nginx-ingress from v0.22.0 to v0.23.0
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.23.0
2019-03-01 01:18:54 -08:00
Dalton Hubble
4ff7fe2c29
Update Grafana dashboards from upstreams
2019-02-28 23:22:07 -08:00
Dalton Hubble
daee5a9d60
Update Grafana from v6.0.0-beta3 to v6.0.0
...
* https://github.com/grafana/grafana/releases/tag/v6.0.0
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-25 21:43:43 -08:00
Dalton Hubble
d10c2b4cb9
Update Grafana from v6.0.0-beta2 to v6.0.0-beta3
...
* Update Grafana dashboards
2019-02-23 13:03:25 -08:00
Dalton Hubble
e483c81ce9
Improve Prometheus rules and alerts and Grafana dashboards
...
* Collate upstream rules, alerts, and dashboards and tune for use
in Typhoon
* Previously, a well-chosen (but older) set of rules, alerts, and
dashboards were maintained to reflect metric name changes
2019-02-18 12:19:23 -08:00
Dalton Hubble
6fa3b8a13f
Upgrade Grafana to v6.0.0-beta2 and enable Explore UI
...
* Upgrade Grafana from v5.4.3 to v6.0.0-beta2
* Enable Grafana Explore UI while still using only the Viewer
role (inspect/edit without saving)
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-17 13:26:42 -08:00
Dalton Hubble
170ef74eea
Remove Nginx Ingress default backend
...
* nginx-ingress no longer requires a configured default-backend,
it will respond with its own 404 page starting in v0.21.0
* https://github.com/kubernetes/ingress-nginx/pull/3196
2019-02-16 14:18:15 -08:00
Dalton Hubble
b13a651cfe
Drop metrics that are unset, high cardinality, or extraneous
...
* https://github.com/coreos/prometheus-operator/pull/2387
* https://github.com/coreos/prometheus-operator/pull/1959
2019-02-10 23:56:11 -08:00
Dalton Hubble
9c59f393a5
Add Kubernetes pod name to metrics discovered from service endpoints
...
* Prometheus queries from some upstreams use joins of node-exporter
and kube-state-metrics metrics by (namespace,pod). Add the Kubernetes
pod name to service endpoint metrics
* Rename the kubernetes_namespace field to namespace
* Honor labels since kube-state-metrics already include a `pod` field
that should not be overridden
2019-02-10 23:54:30 -08:00
Dalton Hubble
3e4b3bfb04
Raise nginx-ingress liveness/readiness timeout
...
* Under heavy load, avoid timeouts causing nginx-ingress
restarts https://github.com/kubernetes/ingress-nginx/pull/3737
2019-02-09 12:53:09 -08:00
Dalton Hubble
949ce21fb2
Update Prometheus from v2.7.0 to v2.7.1
...
* https://github.com/prometheus/prometheus/releases/tag/v2.7.1
2019-02-02 00:13:24 -08:00
Dalton Hubble
130daeac26
Update Prometheus from v2.6.1 to v2.7.0
2019-01-29 22:31:20 -08:00
Dalton Hubble
f5ff003d0e
Update node-exporter from v0.15.2 to v0.17.0
...
* node-exporter renamed multiple metrics that are reflected
in changes to Prometheus rules and Grafana dashboard expressions
2019-01-22 01:14:00 -08:00
Dalton Hubble
d697dd46dc
Allow kube-state-metrics PodDisruptionBudget metrics
...
* Update kube-state-metrics ClusterRole to allow collecting
poddisruptionbudget metrics (exported as kube_poddisruptionbudget_*)
* https://github.com/kubernetes/kube-state-metrics/pull/551
* Bump addon-resizer from v1.7 to v1.8.4
2019-01-22 01:12:32 -08:00
Dalton Hubble
2f3097ebea
Update nginx-ingress from v0.21.0 to v0.22.0
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.22.0
2019-01-16 23:01:22 -08:00
Dalton Hubble
67fb9602e7
Update Prometheus from v2.6.0 to v2.6.1
...
* https://github.com/prometheus/prometheus/releases/tag/v2.6.1
2019-01-15 21:13:40 -08:00
Dalton Hubble
c8a85fabe1
Update Grafana from v5.4.2 to v5.4.3
...
* https://github.com/grafana/grafana/releases/tag/v5.4.3
2019-01-15 21:13:16 -08:00
Dalton Hubble
1d27dc6528
Update kube-state-metrics exporter from v1.4.0 to v1.5.0
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.5.0
2019-01-12 14:24:57 -08:00
Dalton Hubble
ea8b0d1c84
Update Prometheus addon from v2.5.0 to v2.6.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.6.0
2018-12-27 07:35:12 -08:00
Dalton Hubble
b74bf11772
Update Grafana from v5.4.0 to v5.4.2
...
* https://github.com/grafana/grafana/releases/tag/v5.4.2
* https://github.com/grafana/grafana/releases/tag/v5.4.1
2018-12-15 12:39:03 -08:00
Dalton Hubble
991fb44c37
Update Grafana from v5.3.4 to v5.4.0
...
* https://github.com/grafana/grafana/releases/tag/v5.4.0
2018-12-06 01:33:50 -08:00
Dalton Hubble
b6016d0a26
Disable Grafana login form, admin user can't be disabled
...
* Example manifests aim to provide a read-only dashboard visible
to any users with network access (i.e. kubectl port-forward, LAN)
* Problem: Grafana always has an admin user, even with the user
management system disabled
* Disable the login form to prevent admin login
2018-11-28 22:04:08 -08:00
Dalton Hubble
872b11b948
Update ngninx-ingress from v0.20.0 to v0.21.0
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.21.0
2018-11-26 21:57:34 -08:00
Dalton Hubble
c8c43f3991
Update Grafana from v5.3.2 to v5.3.4
...
* https://github.com/grafana/grafana/releases/tag/v5.3.3
* https://github.com/grafana/grafana/releases/tag/v5.3.4
2018-11-18 16:42:50 -08:00
Dalton Hubble
7de03a1279
Fix Prometheus etcd scrape config for DigitalOcean
...
* Kubelet uses a node's hostname as the node name, which isn't
resolvable on DigitalOcean. On DigitalOcean, the node name was
set to the internal IP until #337 switched to instead configuring
kube-apiserver to prefer the InternalIP for communication
* Explicitly configure etcd scrapes to target each controller by
internal IP and port 2381 (replace __address__)
2018-11-06 23:02:45 -08:00
Dalton Hubble
be9f7b87d6
Update Prometheus from v2.4.3 to v2.5.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.5.0
2018-11-06 22:16:12 -08:00
Dalton Hubble
884c8b39dc
Update Grafana from v5.3.1 to v5.3.2
...
* https://github.com/grafana/grafana/releases/tag/v5.3.2
2018-10-28 19:44:22 -07:00
Dalton Hubble
bc750aec33
Configure Heapster to source metrics from Kubelet authenticated API
...
* Heapster can now get nodes (i.e. kubelets) from the apiserver and
source metrics from the Kubelet authenticated API (10250) instead of
the Kubelet HTTP read-only API (10255)
* https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md
* Use the heapster service account token via Kubelet bearer token
authn/authz.
* Permit Heapster to skip CA verification. The CA cert does not contain
IP SANs and cannot since nodes get random IPs that aren't known upfront.
Heapster obtains the node list from the apiserver, so the risk of
spoofing a node is limited. For the same reason, Prometheus scrapes
must skip CA verification for scraping Kubelet's provided by the apiserver.
* https://github.com/poseidon/typhoon/blob/v1.12.1/addons/prometheus/config.yaml#L68
* Create a heapster ClusterRole to work around the default Kubernetes
`system:heapster` ClusterRole lacking the proper GET `nodes/stats`
access. See https://github.com/kubernetes/heapster/issues/1936
2018-10-18 21:03:01 -07:00
Dalton Hubble
0127ee82c1
Update nginx-ingress from v0.19.0 to v0.20.0
2018-10-16 21:35:29 -07:00
Dalton Hubble
a10d6977b8
Update Prometheus from v2.4.2 to v2.4.3
...
* https://github.com/prometheus/prometheus/releases/tag/v2.4.3
2018-10-16 21:29:41 -07:00
Dalton Hubble
05fe923c14
Update Grafana from v5.3.0 to v5.3.1
...
* https://github.com/grafana/grafana/releases/tag/v5.3.1
2018-10-16 21:23:44 -07:00
Dalton Hubble
5eb4078d68
Add docker/default seccomp to control plane and addons
...
* Annotate pods, deployments, and daemonsets to start containers
with the Docker runtime's default seccomp profile
* Overrides Kubernetes default behavior which started containers
with seccomp=unconfined
* https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container
2018-10-16 20:07:29 -07:00
Dalton Hubble
8f0d2b5db4
Update Grafana from v5.2.4 to v5.3.0
2018-10-13 23:03:31 -07:00
Dalton Hubble
032a24133b
Update Prometheus from v2.3.2 to v2.4.2
...
* https://github.com/prometheus/prometheus/releases/tag/v2.4.0
* https://github.com/prometheus/prometheus/releases/tag/v2.4.1
* https://github.com/prometheus/prometheus/releases/tag/v2.4.2
2018-09-21 22:27:11 -07:00
Dalton Hubble
dc03f7a4a9
Update nginx-ingress from 0.17.1 to 0.19.0
...
* If using --enable-ssl-passthrough or exposing TCP/UDP services,
be aware of https://github.com/kubernetes/ingress-nginx/pull/3038
* Workarounds until the fix merges are to stay on 0.17.1, use the
suggested development image, or revert to securityContext
`runAsNonRoot: false` for a while (less secure)
2018-09-08 17:57:01 -07:00
Dalton Hubble
1b8234eb91
Update Grafana from v5.2.2 to v5.2.4
...
* https://github.com/grafana/grafana/releases/tag/v5.2.3
* https://github.com/grafana/grafana/releases/tag/v5.2.4
2018-09-08 15:41:20 -07:00
Dalton Hubble
4ba090feb0
Update kube-state-metrics from v1.3.1 to v1.4.0
2018-08-29 09:37:50 -07:00
Dalton Hubble
4882fe1053
Add docs for Azure Ingress and worker pools
...
* Azure worker pools must be in the same region as
the cluster itself unfortunately
2018-08-27 23:30:56 -07:00
Becca Powell
49a9dc9b8b
Fix typo in Prometheus alerting rules
2018-08-21 16:55:49 -07:00
Dalton Hubble
dbdc3fc850
Add nginx-ingress addon manifests for bare-metal
2018-08-11 12:14:23 -07:00
Dalton Hubble
e00f97c578
Update nginx-ingress from 0.16.2 to 0.17.1
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.17.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.17.0
2018-08-08 00:45:20 -07:00
Dalton Hubble
e6720cf738
Update heapster from v1.5.3 to v1.5.4
...
* https://github.com/kubernetes/heapster/releases/tag/v1.5.4
2018-07-29 11:19:57 -07:00
Dalton Hubble
844f380b4e
Update Grafana from v5.2.1 to v5.2.2
...
* https://github.com/grafana/grafana/releases/tag/v5.2.2
2018-07-29 11:12:56 -07:00
Dalton Hubble
02cd8eb8d3
Update Prometheus from v2.3.1 to v2.3.2
...
* https://github.com/prometheus/prometheus/releases/tag/v2.3.2
2018-07-14 14:25:49 -07:00
Dalton Hubble
84d6cfe7b3
Add Prometheus alert rule for inactive md devices
...
* node-exporter exposes metrics to Prometheus about total and
active md devices (e.g. disks in mdadm RAID arrays)
* Add alert that fires when a RAID disk fails or becomes inactive
for another reason
2018-07-10 00:20:30 -07:00
Dalton Hubble
f40f60b83c
Update Nginx Ingress controller from 0.15.0 to 0.16.2
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.16.2
* https://github.com/kubernetes/ingress-nginx/blob/master/Changelog.md
2018-07-02 22:06:22 -07:00
Dalton Hubble
a3349b5c68
Update heapster from v1.5.2 to v1.5.3
2018-07-01 21:07:52 -07:00
Dalton Hubble
74dc6b0bf9
Update Grafana from 5.1.4 to 5.2.1
...
* http://docs.grafana.org/guides/whats-new-in-v5-2/
* https://github.com/grafana/grafana/releases/tag/v5.2.0
* https://github.com/grafana/grafana/releases/tag/v5.2.1
2018-07-01 20:55:34 -07:00
Dalton Hubble
2eaf04c68b
Drop hostNetwork from nginx-ingress addon
...
* Both flannel and Calico support host port via `portmap`
* Allows writing NetworkPolicies that reference ingress pods in `from`
or `to`. HostNetwork pods were difficult to write network policy for
since they could circumvent the CNI network to communicate with pods on
the same node.
2018-06-22 00:46:41 -07:00
Dalton Hubble
d5de41e07a
Update Grafana from 5.1.3 to 5.1.4
...
* https://github.com/grafana/grafana/releases/tag/v5.1.4
2018-06-19 21:45:15 -07:00
Dalton Hubble
05b99178ae
Update prometheus from v2.3.0 to v2.3.1
...
* https://github.com/prometheus/prometheus/releases/tag/v2.3.1
2018-06-19 21:43:50 -07:00
Stephen Demos
18dd7ccc09
Update CLUO from v0.6.0 to v0.7.0
2018-06-14 22:32:36 -07:00
Dalton Hubble
cbe646fba6
Label namespaces to ease writing Network Policies
2018-06-09 11:45:11 -07:00
Dalton Hubble
c166b2ba33
Update prometheus from v2.2.1 to v2.3.0
2018-06-09 11:43:10 -07:00
Dalton Hubble
d32e6797ae
Annotate Grafana so Prometheus scrapes metrics
2018-05-30 22:37:47 -07:00
Dalton Hubble
32a9a83190
Add Prometheus liveness and readiness probes
2018-05-30 22:34:07 -07:00
Dalton Hubble
28d0891729
Annotate nginx-ingress addon for Prometheus auto-discovery
...
* Add Google Cloud firewall rule to allow worker to worker access
to health and metrics
2018-05-19 13:13:14 -07:00
Dalton Hubble
714419342e
Update nginx-ingress from 0.14.0 to 0.15.0
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.15.0
2018-05-17 21:42:55 -07:00
Dalton Hubble
3701c0b1fe
Update Grafana from v5.1.2 to v5.1.3
...
* https://github.com/grafana/grafana/releases/tag/v5.1.3
2018-05-17 21:36:09 -07:00
Dalton Hubble
c2b719dc75
Configure Prometheus to scrape Kubelets directly
...
* Use Kubelet bearer token authn/authz to scrape metrics
* Drop RBAC permission from nodes/proxy to nodes/metrics
* Stop proxying kubelet scrapes through the apiserver, since
this required higher privilege (nodes/proxy) and can add
load to the apiserver on large clusters
2018-05-14 23:06:50 -07:00
Dalton Hubble
fb88113523
Disable default Google Analytics in Grafana addon
...
* Its come to my attention Grafana reports analytics data
by default. Typhoon's philosophy requires user permission
for data collection so the addon should have this disabled
* http://docs.grafana.org/installation/configuration/#analytics
2018-05-10 01:18:47 -07:00
Dalton Hubble
1854f5c104
Update Grafana from v5.1.1 to v5.1.2
...
* https://github.com/grafana/grafana/releases/tag/v5.1.2
2018-05-10 01:09:08 -07:00
Dalton Hubble
726b58b697
Update Grafana from v5.0.4 to v5.1.1
...
* https://github.com/grafana/grafana/releases/tag/v5.1.1
* https://github.com/grafana/grafana/releases/tag/v5.1.0
2018-05-07 22:05:19 -07:00
Dalton Hubble
a54e3c0da1
Fix Prometheus data dir to /var/lib/prometheus
...
* A data volume (emptyDir) is mounted to /var/lib/prometheus
* Users could swap emptyDir for any desired volume if data
persistence is desired. Prometheus previously defaulted to
keeping its data in ./data relative to /prometheus. Override
this behavior to store data in /var/lib/prometheus
2018-05-01 22:05:27 -07:00
Dalton Hubble
731a6ec23a
Update nginx-ingress from 0.13.0 to 0.14.0
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.14.0
2018-04-28 13:10:03 -07:00
Dalton Hubble
e0d9e9979c
Update nginx-ingress from 0.12.0 to 0.13.0
...
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.13.0
2018-04-18 21:12:09 -07:00
Dalton Hubble
9789881243
Update kube-state-metrics from v1.3.0 to v1.3.1
...
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.3.1
2018-04-15 17:10:02 -07:00
Dalton Hubble
6b08bde479
Use k8s.gcr.io instead of gcr.io/google_containers
...
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
2018-04-08 12:57:52 -07:00
Dalton Hubble
f4b2396718
Return Prometheus deployment to be a worker workload
...
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targes
* Related to https://github.com/poseidon/typhoon/pull/175
2018-04-08 12:20:00 -07:00
Dalton Hubble
7186aa46da
Update kube-state-metrics from v1.2.0 to v1.3.0
...
* https://github.com/kubernetes/kube-state-metrics/pull/412
* https://github.com/kubernetes/kube-state-metrics/pull/413
2018-04-04 21:04:13 -07:00
Dalton Hubble
d770393dbc
Add etcd metrics, Prometheus scrapes, and Grafana dash
...
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble
b1e41dcb99
addons: Update from Grafana v4.6.3 to v5.0.4
...
This reverts commit c59a9c66b1
.
2018-03-28 19:45:19 -07:00
Dalton Hubble
65a2751f77
addons: Update heapster from v1.5.1 to v1.5.2
...
* https://github.com/kubernetes/heapster/releases/tag/v1.5.2
2018-03-21 20:32:01 -07:00
Dalton Hubble
851bc1a3f8
Update nginx-ingress from 0.11.0 to 0.12.0
2018-03-19 23:17:17 -07:00
Dalton Hubble
46226a8015
Update Prometheus from 2.2.0 to 2.2.1
2018-03-18 15:56:44 -07:00
Dalton Hubble
c59a9c66b1
Revert "addons: Update from Grafana v4.6.3 to v5.0.0"
...
* Revert commit 9dcc255f8e
.
* Grafana v5.0 is not compatible with Kubernetes v1.9.4. See
https://github.com/poseidon/typhoon/pull/162
2018-03-12 21:01:14 -07:00
Dalton Hubble
42708f9a70
Update Prometheus from v2.2.0-rc.1 to v2.2.0
...
* https://github.com/prometheus/prometheus/releases/tag/v2.2.0
2018-03-09 00:20:40 -08:00
Dalton Hubble
d54709f89c
Update Grafana from v5.0.0 to 5.0.1
...
* https://github.com/grafana/grafana/releases/tag/v5.0.1
2018-03-09 00:20:40 -08:00
Dalton Hubble
9dcc255f8e
addons: Update from Grafana v4.6.3 to v5.0.0
2018-03-09 00:20:40 -08:00
Dalton Hubble
9307e97c46
addons: Update Prometheus from v2.1.0 to v2.2.0
...
* Annotate Prometheus service to scrape metrics from
Prometheus itself (enables Prometheus* alerts)
* Update kube-state-metrics addon-resizer to 1.7
* Use port 8080 for kube-state-metrics
* Add PrometheusNotIngestingSamples alert rule
* Change K8SKubeletDown alert rule to fire when 10%
of kubelets are down, not 1%
* https://github.com/coreos/prometheus-operator/pull/1032
2018-03-09 00:20:40 -08:00
Paul Saunders
86420fd507
Rename namespace manifests to be applied first
...
* Ensure kubectl apply -R creates manifests in the right order
2018-02-22 01:04:30 -08:00
Dalton Hubble
5c383f4184
addons: Update nginx-ingress from 0.10.2 to 0.11.0
2018-02-21 23:54:12 -08:00