Reduce apiserver metrics cardinality and extraneous labels

* Stop mapping node labels to targets discovered via Kubernetes nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to store node labels (e.g. kubernetes.io/os=linux) on these metrics.
* kube-apiserver's apiserver_request_duration_seconds_bucket metric has high cardinality because it carries labels for the API group, verb, scope, resource, and component of each object type, including each CRD. This one metric has ~10k time series in a typical cluster (between 10% and 40% of the total).
* Removing the apiserver request duration metric outright would turn latency alerts into no-ops and break a Grafana apiserver panel. Instead, drop series that have a "group" label. Effectively, only request durations for core Kubernetes APIs are kept, so cardinality no longer grows with each CRD added. This reduces the metric to ~2k unique series.
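
As an illustration of the "drop series that have a group label" approach described above, a Prometheus scrape job could use a `metric_relabel_configs` rule roughly like the one below. This is a sketch under stated assumptions, not the diff from this commit: the job name is hypothetical and the rest of the scrape config is omitted.

```yaml
scrape_configs:
  # Hypothetical job name, used here only for illustration.
  - job_name: example-apiserver
    metric_relabel_configs:
      # Prometheus joins the source label values with ";" and matches the
      # fully anchored regex against the result. A series without a "group"
      # label yields an empty value after the ";", which ".+" does not
      # match, so core API series are kept.
      - source_labels: [__name__, group]
        regex: apiserver_request_duration_seconds_bucket;.+
        action: drop
```

Because the rule runs at scrape time, dropped series are never stored, while alerts and panels that query the remaining core API series keep working.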
2025-09-16 16:59:44 +02:00 · 2019-12-08 22:05:15 -08:00
parent d9c7a9e049
commit 178afe4a9b
2 changed files with 15 additions and 11 deletions
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -28,6 +28,7 @@ Notable changes between versions.
 * Update Grafana from v6.4.4 to [v6.5.1](https://grafana.com/docs/guides/whats-new-in-v6-5/)
 * Add pod networking details in dashboards ([#593](https://github.com/poseidon/typhoon/pull/593))
 * Add node alerts and Grafana dashboard from node-exporter ([#591](https://github.com/poseidon/typhoon/pull/591))
+* Reduce Prometheus time series of high cardinality metrics ([#596](https://github.com/poseidon/typhoon/pull/596))

 ## v1.16.3