Reduce apiserver metrics cardinality and extraneous labels
* Stop mapping node labels to targets discovered via Kubernetes nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to store node labels (e.g. kubernetes.io/os=linux) on these metrics
* kube-apiserver's apiserver_request_duration_seconds_bucket metric has high cardinality: it carries labels for the API group, verb, scope, resource, and component for each object type, including each CRD. This one metric has ~10k time series in a typical cluster (between 10-40% of the total)
* Removing the apiserver request duration metric outright would make latency alerts a no-op and break a Grafana apiserver panel. Instead, drop series that have a "group" label. Effectively, only request durations for core Kubernetes APIs are kept, so cardinality won't grow with each CRD added. This reduces the metric to ~2k unique series
parent d9c7a9e049
commit 178afe4a9b
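The effect of the new drop rule can be sketched in a few lines of Python. This is only an approximation of Prometheus relabeling semantics (source label values joined with `;`, a missing label treated as an empty string, and the regex anchored at both ends), not Prometheus code. The helper `should_drop` and the sample series are hypothetical and exist only for illustration.

```python
import re


def should_drop(labels, source_labels, regex, separator=";"):
    """Return True if a 'drop' relabel rule would discard this series.

    Approximates Prometheus relabeling: the values of source_labels are
    joined with the separator (a missing label counts as an empty string)
    and the regex must match the whole joined string.
    """
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


# The rule added in this commit.
SOURCE_LABELS = ["__name__", "group"]
REGEX = r"apiserver_request_duration_seconds_bucket;.+"

# Hypothetical sample series, for illustration only.
core = {
    "__name__": "apiserver_request_duration_seconds_bucket",
    "resource": "pods",
    "verb": "GET",
}
crd = {
    "__name__": "apiserver_request_duration_seconds_bucket",
    "group": "example.com",
    "resource": "widgets",
    "verb": "GET",
}
other_metric = {"__name__": "apiserver_request_total", "group": "example.com"}

for series in (core, crd, other_metric):
    dropped = should_drop(series, SOURCE_LABELS, REGEX)
    print(series["__name__"], repr(series.get("group", "")), dropped)
```

Because `.+` needs at least one character, a series without a `group` label joins to a string ending in `;` and is kept, while any series of this metric with a non-empty `group` value is dropped. Other metrics are untouched even when they carry a `group` label.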
@@ -28,6 +28,7 @@ Notable changes between versions.
 * Update Grafana from v6.4.4 to [v6.5.1](https://grafana.com/docs/guides/whats-new-in-v6-5/)
 * Add pod networking details in dashboards ([#593](https://github.com/poseidon/typhoon/pull/593))
 * Add node alerts and Grafana dashboard from node-exporter ([#591](https://github.com/poseidon/typhoon/pull/591))
+* Reduce Prometheus time series of high cardinality metrics ([#596](https://github.com/poseidon/typhoon/pull/596))
 
 ## v1.16.3
 
@@ -65,6 +65,9 @@ data:
 - source_labels: [__name__]
   action: drop
   regex: apiserver_admission_step_admission_latencies_seconds_.*
+- source_labels: [__name__, group]
+  regex: apiserver_request_duration_seconds_bucket;.+
+  action: drop
 
 # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
 # metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
@@ -81,7 +84,7 @@ data:
 
 relabel_configs:
 - action: labelmap
-  regex: __meta_kubernetes_node_label_(.+)
+  regex: __meta_kubernetes_node_name
 
 # Scrape config for Kubelet cAdvisor. Explore metrics from a node by
 # scraping kubelet (127.0.0.1:10250/metrics/cadvisor).
@@ -99,7 +102,7 @@ data:
 
 relabel_configs:
 - action: labelmap
-  regex: __meta_kubernetes_node_label_(.+)
+  regex: __meta_kubernetes_node_name
 metric_relabel_configs:
 - source_labels: [__name__, image]
   action: drop
@@ -119,7 +122,7 @@ data:
   action: keep
   regex: 'true'
 - action: labelmap
-  regex: __meta_kubernetes_node_label_(.+)
+  regex: __meta_kubernetes_node_name
 - source_labels: [__meta_kubernetes_node_address_InternalIP]
   action: replace
   target_label: __address__
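To illustrate what the removed `labelmap` rule used to do, here is a rough Python sketch of that action. It is an approximation for illustration, not Prometheus code: it covers only the default `$1` replacement with a single capture group, and the helper names and sample discovery labels are hypothetical. A real target would also carry labels such as `job` and `instance`, which are omitted here.

```python
import re


def labelmap(labels, regex):
    """Copy each label whose name fully matches regex to a new label
    named after the first capture group (the default '$1' replacement)."""
    result = dict(labels)
    pattern = re.compile(regex)
    for name, value in labels.items():
        match = pattern.fullmatch(name)
        if match:
            result[match.group(1)] = value
    return result


def drop_internal(labels):
    """Labels starting with '__' are removed once target relabeling is done."""
    return {k: v for k, v in labels.items() if not k.startswith("__")}


# Hypothetical labels as node discovery might present them.
discovered = {
    "__meta_kubernetes_node_name": "node-1",
    "__meta_kubernetes_node_label_kubernetes_io_os": "linux",
    "__meta_kubernetes_node_label_kubernetes_io_arch": "amd64",
}

# Old behavior: every node label is copied onto the target.
with_node_labels = drop_internal(
    labelmap(discovered, r"__meta_kubernetes_node_label_(.+)")
)
# Without that mapping, none of the node labels reach the stored series.
without_mapping = drop_internal(discovered)

print(with_node_labels)
print(without_mapping)
```

Under the old regex, each node label ends up on every series scraped from that node, which is the extra label load the commit message describes as rarely useful.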