Compare commits

..

55 Commits

Author SHA1 Message Date
5c3b5a20de Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-11-14 13:32:04 -08:00
f5a83667e8 Update Grafana from v7.3.1 to v7.3.2
* https://github.com/grafana/grafana/releases/tag/v7.3.2
2020-11-14 13:30:30 -08:00
a911367c2e Update nginx-ingress from v0.41.0 to v0.41.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2
2020-11-14 13:27:06 -08:00
f884de847e Discard Prometheus etcd gRPC failure alert
* Kubernetes watch expiry is not a gRPC code we care about
* Background: This rule is typically removed, but was added back in
2020-11-14 13:17:56 -08:00
1b3a0f6ebc Add experimental Fedora CoreOS arm64 support on AWS
* Add experimental `arch` variable to Fedora CoreOS AWS,
accepting amd64 (default) or arm64 to support native
arm64/aarch64 clusters or mixed/hybrid clusters with
a worker pool of arm64 workers
* Add `daemonset_tolerations` variable to cluster module
(experimental)
* Add `node_taints` variable to workers module
* Requires flannel CNI and experimental Poseidon-built
arm64 Fedora CoreOS AMIs (published to us-east-1, us-east-2,
and us-west-1)

WARN:

* Our AMIs are experimental, may be removed at any time, and
will be removed when Fedora CoreOS publishes official arm64
AMIs. Do NOT use in production

Related:

* https://github.com/poseidon/typhoon/pull/682
2020-11-14 13:09:24 -08:00
1113a22f61 Update Kubernetes from v1.19.3 to v1.19.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194
2020-11-11 22:56:27 -08:00
152c7d86bd Change bootstrap.service container from rkt to docker
* Use docker to run `bootstrap.service` container
* Background https://github.com/poseidon/typhoon/pull/855
2020-11-11 22:26:05 -08:00
79deb8a967 Update Cilium from v1.9.0-rc3 to v1.9.0
* https://github.com/cilium/cilium/releases/tag/v1.9.0
2020-11-10 23:42:41 -08:00
f412f0d9f2 Update Calico from v3.16.4 to v3.16.5
* https://github.com/projectcalico/calico/releases/tag/v3.16.5
2020-11-10 22:58:19 -08:00
eca6c4a1a1 Fix broken flatcar linux documentation links (#870)
* Fix old documentation links
2020-11-10 18:30:30 -08:00
133d325013 Update nginx-ingress from v0.40.2 to v0.41.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.0
2020-11-08 14:34:52 -08:00
4b05c0180e Update Grafana from v7.3.0 to v7.3.1
* https://github.com/grafana/grafana/releases/tag/v7.3.1
2020-11-08 14:13:39 -08:00
f49ab3a6ee Update Prometheus from v2.22.0 to v2.22.1
* https://github.com/prometheus/prometheus/releases/tag/v2.22.1
2020-11-08 14:12:24 -08:00
0eef16b274 Improve and tidy Fedora CoreOS etcd-member.service
* Allow a snippet with a systemd dropin to set an alternate
image via `ETCD_IMAGE`, for consistency across Fedora CoreOS
and Flatcar Linux
* Drop comments about integrating system containers with
systemd-notify
2020-11-08 11:49:56 -08:00
ad1f59ce91 Change Flatcar etcd-member.service container from rkt to docker
* Use docker to run the `etcd-member.service` container
* Use env-file `/etc/etcd/etcd.env` like podman on FCOS
* Background: https://github.com/poseidon/typhoon/pull/855
2020-11-03 16:42:18 -08:00
82e5ac3e7c Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/poseidon/terraform-render-bootstrap/pull/224
2020-11-03 10:29:07 -08:00
a8f7880511 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:50:18 -07:00
cda5b93b09 Update kube-state-metrics from v2.0.0-alpha.1 to v2.0.0-alpha.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2
2020-10-28 18:49:40 -07:00
3e9f5f34de Update Grafana from v7.2.2 to v7.3.0
* https://github.com/grafana/grafana/releases/tag/v7.3.0
2020-10-28 17:46:26 -07:00
893d139590 Update Calico from v3.16.3 to v3.16.4
* https://github.com/projectcalico/calico/releases/tag/v3.16.4
2020-10-26 00:50:40 -07:00
fc62e51b2a Update Grafana from v7.2.1 to v7.2.2
* https://github.com/grafana/grafana/releases/tag/v7.2.2
2020-10-22 00:14:04 -07:00
e5ba3329eb Remove bare-metal CoreOS Container Linux profiles
* Remove Matchbox profiles for CoreOS Container Linux
* Simplify the remaining Flatcat Linux profiles
2020-10-21 00:25:10 -07:00
7c3f3ab6d0 Rename container-linux modules to flatcar-linux
* CoreOS Container Linux was deprecated in v1.18.3
* Continue transitioning docs and modules from supporting
both CoreOS and Flatcar "variants" of Container Linux to
now supporting Flatcar Linux and equivalents

Action Required: Update the Flatcar Linux modules `source`
to replace `s/container-linux/flatcar-linux`. See docs for
examples
2020-10-20 22:47:19 -07:00
a99a990d49 Remove unused Kubelet tls mounts
* Kubelet trusts only the cluster CA certificate (and
certificates in the Kubelet debian base image), there
is no longer a need to mount the host's trusted certs
* Similar change on Flatcar Linux in
https://github.com/poseidon/typhoon/pull/855

Rel: https://github.com/poseidon/typhoon/pull/810
2020-10-18 23:48:21 -07:00
df17253e72 Fix delete node permission on Fedora CoreOS node shutdown
* On cloud platforms, `delete-node.service` tries to delete the
local node (not always possible depending on preemption time)
* Since v1.18.3, kubelet TLS bootstrap generates a kubeconfig
in `/var/lib/kubelet` which should be used with kubectl in
the delete-node oneshot
2020-10-18 23:38:11 -07:00
eda78db08e Change Flatcar kubelet.service container from rkt to docker
* Use docker to run the `kubelet.service` container
* Update Kubelet mounts to match Fedora CoreOS
* Remove unused `/etc/ssl/certs` mount (see
https://github.com/poseidon/typhoon/pull/810)
* Remove unused `/usr/share/ca-certificates` mount
* Remove `/etc/resolv.conf` mount, Docker default is ok
* Change `delete-node.service` to use docker instead of rkt
and inline ExecStart, as was done on Fedora CoreOS
* Fix permission denied on shutdown `delete-node`, caused
by the kubeconfig mount changing with the introduction of
node TLS bootstrap

Background

* podmand, rkt, and runc daemonless container process runners
provide advantages over the docker daemon for system containers.
Docker requires workarounds for use in systemd units where the
ExecStart must tail logs so systemd can monitor the daemonized
container. https://github.com/moby/moby/issues/6791
* Why switch then? On Flatcar Linux, podman isn't shipped. rkt
works, but isn't developing while container standards continue
to move forward. Typhoon has used runc for the Kubelet runner
before in Fedora Atomic, but its more low-level. So we're left
with Docker, which is less than ideal, but shipped in Flatcar
* Flatcar Linux appears to be shifting system components to
use docker, which does provide some limited guards against
breakages (e.g. Flatcar cannot enable docker live restore)
2020-10-18 23:24:45 -07:00
afac46e39a Remove asset_dir variable and optional asset writes
* Originally, poseidon/terraform-render-bootstrap generated
TLS certificates, manifests, and cluster "assets" written
to local disk (`asset_dir`) during terraform apply cluster
bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Add Terraform output `assets_dir` map
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. For those
who wish to write those assets to local files, this is possible
doing so explicitly.

```
resource local_file "assets" {
  for_each = module.yavin.assets_dist
  filename = "some-assets/${each.key}"
  content = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 15:00:15 -07:00
b1e680ac0c Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-17 13:56:24 -07:00
9fbfbdb854 Update Prometheus from v2.21.0 to v2.22.0
* https://github.com/prometheus/prometheus/releases/tag/v2.22.0
2020-10-17 12:38:25 -07:00
511f5272f4 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://github.com/poseidon/terraform-render-bootstrap/pull/212
2020-10-15 20:08:51 -07:00
46ca5e8813 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:47:49 -07:00
394e496cc7 Update Grafana from v7.2.0 to v7.2.1
* https://github.com/grafana/grafana/releases/tag/v7.2.1
2020-10-11 13:21:25 -07:00
a38ec1a856 Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-11 13:06:53 -07:00
7881f4bd86 Update kube-state-metrics from v1.9.7 to v2.0.0-alpha.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1
2020-10-11 12:35:43 -07:00
d5b5b7cb02 Update nginx-ingress from v0.40.0 to v0.40.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2
2020-10-06 23:52:15 -07:00
759a48be7c Update mkdocs-material from v5.5.12 to v6.0.1
* Update OS kernel, systemd, and docker verisons
2020-10-02 01:18:38 -07:00
b39a1d70da Update nginx-ingress from v0.35.0 to v0.40.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.0
2020-10-02 01:00:35 -07:00
901f7939b2 Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:24:26 -07:00
d65085ce14 Update Grafana from v7.1.5 to v7.2.0
* https://github.com/grafana/grafana/releases/tag/v7.2.0
2020-09-24 20:58:32 -07:00
343db5b578 Remove references to CoreOS Container Linux
* CoreOS Container Linux was deprecated in v1.18.3 (May 2020)
in favor of Fedora CoreOS and Flatcar Linux. CoreOS Container
Linux references were kept to give folks more time to migrate,
but AMIs have now been deleted. Time is up.

Rel: https://coreos.com/os/eol/
2020-09-24 20:51:02 -07:00
444363be2d Update Kubernetes from v1.19.1 to v1.19.2
* Update flannel from v0.12.0 to v0.13.0-rc2
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
2020-09-16 20:05:54 -07:00
bc7ad25c60 Update Grafana dashboard for Kubelet v1.19
* Fix Kubelet pod and container count metrics dashboard
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/499
2020-09-15 23:21:56 -07:00
e838d4dc3d Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules/alerts and Grafana dashboards
2020-09-13 15:03:27 -07:00
979c092ef6 Reduce apiserver metrics cardinality of non-core APIs
* Reduce `apiserver_request_duration_seconds_count` cardinality
by dropping series for non-core Kubernetes APIs. This is done
to match `apiserver_request_duration_seconds_count` relabeling
* These two relabels must be performed the same way to avoid
affecting new SLO calculations (upcoming)
* See https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/498

Related: https://github.com/poseidon/typhoon/pull/596
2020-09-13 14:47:49 -07:00
db8e94bb4b Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-09-12 19:41:15 -07:00
eb093af9ed Drop Kubelet labelmap relabel for node_name
* Originally, Kubelet and CAdvisor metrics used a labelmap
relabel to add Kubernetes SD node labels onto timeseries
* With https://github.com/poseidon/typhoon/pull/596 that
relabel was dropped since node labels aren't usually that
valuable. `__meta_kubernetes_node_name` was retained but
the field name is empty
* Favor just using Prometheus server-side `instance` in
queries that require some node identifier for aggregation
or debugging

Fix https://github.com/poseidon/typhoon/issues/823
2020-09-12 19:40:00 -07:00
36096f844d Promote Cilium from experimental to GA
* Cilium was added as an experimental CNI provider in June
* Since then, I've been choosing it for an increasing number
of clusters and scenarios.
2020-09-12 19:24:55 -07:00
d236628e53 Update Prometheus from v2.20.0 to v2.21.0
* https://github.com/prometheus/prometheus/releases/tag/v2.21.0
2020-09-12 19:20:54 -07:00
577b927a2b Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* No notable changes in the config spec, just house keeping
* Require any snippets customization to update to v1.1.0. Version
skew between the main config and snippets will show an err message
* https://github.com/coreos/fcct/blob/master/docs/configuration-v1_1.md
2020-09-10 23:38:40 -07:00
000c11edf6 Update IngressClass resources to networking.k8s.io/v1
* Kubernetes v1.19 graduated Ingress and IngressClass from
networking.k8s.io/v1beta1 to networking.k8s.io/v1
2020-09-10 23:25:53 -07:00
29b16c3fc0 Change seccomp annotations to seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support for
seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the profile,
you'd see `seccomp=unconfined`

Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215
2020-09-10 01:15:07 -07:00
0c7a879bc4 Update Kubernetes from v1.19.0 to v1.19.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191
2020-09-09 20:52:29 -07:00
1e654c9e4e Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
* Update mkdocs-material from v5.5.11 to v5.5.12
2020-09-07 21:18:47 -07:00
28ee693e6b Update Cilium from v1.8.2 to v1.8.3
* https://github.com/cilium/cilium/releases/tag/v1.8.3
2020-09-07 21:10:27 -07:00
8c7d95aefd Update mkdocs-material from v5.5.9 to v5.5.11 2020-08-29 13:52:16 -07:00
174 changed files with 4359 additions and 2438 deletions

View File

@ -4,6 +4,99 @@ Notable changes between versions.
## Latest ## Latest
## v1.19.4
* Kubernetes [v1.19.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194)
* Update Cilium from v1.8.4 to [v1.9.0](https://github.com/cilium/cilium/releases/tag/v1.9.0)
* Update Calico from v3.16.3 to [v3.16.5](https://github.com/projectcalico/calico/releases/tag/v3.16.5)
* Remove `asset_dir` variable (defaulted off in [v1.17.0](https://github.com/poseidon/typhoon/pull/595), deprecated in [v1.18.0](https://github.com/poseidon/typhoon/pull/678))
### Fedora CoreOS
* Improve `etcd-member.service` systemd unit ([#868](https://github.com/poseidon/typhoon/pull/868))
* Allow a snippet with a systemd dropin to set an alternate image (e.g. mirror)
* Fix local node delete oneshot on node shutdown ([#856](https://github.com/poseidon/typhoon/pull/855))
#### AWS
* Add experimental Fedora CoreOS arm64 support ([docs](https://typhoon.psdn.io/advanced/arm64/), [#875](https://github.com/poseidon/typhoon/pull/875))
* Allow arm64 full-cluster or mixed/hybrid cluster with worker pools
* Add `arch` variable to cluster module
* Add `daemonset_tolerations` variable to cluster module
* Add `node_taints` variable to workers module
* Requires flannel CNI provider and use of experimental AMI (see docs)
### Flatcar Linux
* Rename `container-linux` modules to `flatcar-linux` ([#858](https://github.com/poseidon/typhoon/issues/858)) (**action required**)
* Change on-host system containers from rkt to docker
* Change `etcd-member.service` container runnner from rkt to docker ([#867](https://github.com/poseidon/typhoon/pull/867))
* Change `kubelet.service` container runner from rkt-fly to docker ([#855](https://github.com/poseidon/typhoon/pull/855))
* Change `bootstrap.service` container runner from rkt to docker ([#873](https://github.com/poseidon/typhoon/pull/873))
* Change `delete-node.service` to use docker and an inline ExecStart ([#855](https://github.com/poseidon/typhoon/pull/855))
* Fix local node delete oneshot on node shutdown ([#855](https://github.com/poseidon/typhoon/pull/855))
* Remove CoreOS Container Linux Matchbox profiles ([#859](https://github.com/poseidon/typhoon/pull/858))
### Addons
* Update nginx-ingress from v0.40.2 to [v0.41.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2)
* Update Prometheus from v2.22.0 to [v2.22.1](https://github.com/prometheus/prometheus/releases/tag/v2.22.1)
* Update kube-state-metrics from v2.0.0-alpha.1 to [v2.0.0-alpha.2](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2)
* Update Grafana from v7.2.1 to [v7.3.2](https://github.com/grafana/grafana/releases/tag/v7.3.2)
## v1.19.3
* Kubernetes [v1.19.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193)
* Update Cilium from v1.8.3 to [v1.8.4](https://github.com/cilium/cilium/releases/tag/v1.8.4)
* Update Calico from v1.15.3 to [v1.16.3](https://github.com/projectcalico/calico/releases/tag/v3.16.3) ([#851](https://github.com/poseidon/typhoon/pull/851))
* Update flannel from v0.13.0-rc2 to v0.13.0 ([#219](https://github.com/poseidon/terraform-render-bootstrap/pull/219))
### Flatcar Linux
* Remove references to CoreOS Container Linux ([#839](https://github.com/poseidon/typhoon/pull/839))
* Fix error querying for coreos AMI on AWS ([#838](https://github.com/poseidon/typhoon/issues/838))
### Addons
* Update nginx-ingress from v0.35.0 to [v0.40.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2)
* Update Grafana from v7.1.5 to [v7.2.1](https://github.com/grafana/grafana/releases/tag/v7.2.1)
* Update Prometheus from v2.21.0 to [v2.22.0](https://github.com/prometheus/prometheus/releases/tag/v2.22.0)
* Update kube-state-metrics from v1.9.7 to [v2.0.0-alpha.1](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1)
## v1.19.2
* Kubernetes [v1.19.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1192)
* Update flannel from v0.12.0 to v0.13.0-rc2 ([#216](https://github.com/poseidon/terraform-render-bootstrap/pull/216))
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
### Addons
* Refresh Prometheus rules/alerts and Grafana dashboards ([#831](https://github.com/poseidon/typhoon/pull/831))
* Reduce apiserver metrics cardinality for non-core APIs ([#830](https://github.com/poseidon/typhoon/pull/830))
## v1.19.1
* Kubernetes [v1.19.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191)
* Change control plane seccomp annotations to GA `seccompProfile` ([#822](https://github.com/poseidon/typhoon/pull/822))
* Update Cilium from v1.8.2 to [v1.8.3](https://github.com/cilium/cilium/releases/tag/v1.8.3)
* Promote Cilium from experimental to general availability ([#827](https://github.com/poseidon/typhoon/pull/827))
* Update Calico from v1.15.2 to [v1.15.3](https://github.com/projectcalico/calico/releases/tag/v3.15.3)
### Fedora CoreOS
* Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customizations to update to v1.1.0
### Addons
* Update IngressClass resources to `networking.k8s.io/v1` ([#824](https://github.com/poseidon/typhoon/pull/824))
* Update Prometheus from v2.20.0 to [v2.21.0](https://github.com/prometheus/prometheus/releases/tag/v2.21.0)
* Remove Kubernetes node name labelmap `relabel_config` from etcd, Kubelet, and CAdvisor scrape config ([#828](https://github.com/poseidon/typhoon/pull/828))
## v1.19.0
* Kubernetes [v1.19.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1190) * Kubernetes [v1.19.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1190)
* Update etcd from v3.4.10 to [v3.4.12](https://github.com/etcd-io/etcd/releases/tag/v3.4.12) * Update etcd from v3.4.10 to [v3.4.12](https://github.com/etcd-io/etcd/releases/tag/v3.4.12)
* Update Calico from v3.15.1 to [v3.15.2](https://docs.projectcalico.org/v3.15/release-notes/) * Update Calico from v3.15.1 to [v3.15.2](https://docs.projectcalico.org/v3.15/release-notes/)

View File

@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/flatcar-linux/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, CSI, or other [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, CSI, or other [addons](https://typhoon.psdn.io/addons/overview/)
## Modules ## Modules
@ -35,11 +35,11 @@ Typhoon is available for [Flatcar Linux](https://www.flatcar-linux.org/releases/
| Platform | Operating System | Terraform Module | Status | | Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------| |---------------|------------------|------------------|--------|
| AWS | Flatcar Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable | | AWS | Flatcar Linux | [aws/flatcar-linux/kubernetes](aws/flatcar-linux/kubernetes) | stable |
| Azure | Flatcar Linux | [azure/container-linux/kubernetes](azure/container-linux/kubernetes) | alpha | | Azure | Flatcar Linux | [azure/flatcar-linux/kubernetes](azure/flatcar-linux/kubernetes) | alpha |
| Bare-Metal | Flatcar Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable | | Bare-Metal | Flatcar Linux | [bare-metal/flatcar-linux/kubernetes](bare-metal/flatcar-linux/kubernetes) | stable |
| DigitalOcean | Flatcar Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta | | DigitalOcean | Flatcar Linux | [digital-ocean/flatcar-linux/kubernetes](digital-ocean/flatcar-linux/kubernetes) | beta |
| Google Cloud | Flatcar Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | beta | | Google Cloud | Flatcar Linux | [google-cloud/flatcar-linux/kubernetes](google-cloud/flatcar-linux/kubernetes) | beta |
## Documentation ## Documentation
@ -54,7 +54,7 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
```tf ```tf
module "yavin" { module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.19.0" source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.19.4"
# Google Cloud # Google Cloud
cluster_name = "yavin" cluster_name = "yavin"
@ -93,9 +93,9 @@ In 4-8 minutes (varies by platform), the cluster will be ready. This Google Clou
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config $ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes $ kubectl get nodes
NAME ROLES STATUS AGE VERSION NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.19.0 yavin-controller-0.c.example-com.internal <none> Ready 6m v1.19.4
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.19.0 yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.19.4
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.19.0 yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.19.4
``` ```
List the pods. List the pods.

View File

@ -49,6 +49,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -140,6 +141,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -231,6 +233,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -335,6 +338,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -440,6 +444,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -544,6 +549,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -649,6 +655,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -767,6 +774,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -858,6 +866,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },

View File

@ -11,7 +11,6 @@ data:
"editable": true, "editable": true,
"gnetId": null, "gnetId": null,
"hideControls": false, "hideControls": false,
"id": 6,
"links": [ "links": [
], ],
@ -343,7 +342,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes{job=\"$cluster\"}", "expr": "etcd_mvcc_db_total_size_in_bytes{job=\"$cluster\"}",
"hide": false, "hide": false,
"interval": "", "interval": "",
"intervalFactor": 2, "intervalFactor": 2,

View File

@ -172,7 +172,7 @@ data:
"tableColumn": "", "tableColumn": "",
"targets": [ "targets": [
{ {
"expr": "sum(kubelet_running_pod_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})", "expr": "sum(kubelet_running_pods{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -256,7 +256,7 @@ data:
"tableColumn": "", "tableColumn": "",
"targets": [ "targets": [
{ {
"expr": "sum(kubelet_running_container_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})", "expr": "sum(kubelet_running_containers{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -565,6 +565,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -656,6 +657,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -760,6 +762,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -864,6 +867,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -962,6 +966,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1075,6 +1080,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1168,6 +1174,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1274,6 +1281,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1378,6 +1386,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1469,6 +1478,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1574,6 +1584,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1665,6 +1676,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1769,6 +1781,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1873,6 +1886,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1998,6 +2012,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2021,7 +2036,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}} {{verb}} {{url}}", "legendFormat": "{{instance}} {{verb}} {{url}}",
@ -2102,6 +2117,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2193,6 +2209,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2284,6 +2301,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2470,7 +2488,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Kubelet", "title": "Kubernetes / Kubelet",
"uid": "3138fa155d5915769fbded898ac09fd9", "uid": "3138fa155d5915769fbded898ac09fd9",
"version": 0 "version": 0
@ -2607,6 +2625,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2698,6 +2717,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2802,6 +2822,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2893,6 +2914,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2997,6 +3019,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3109,6 +3132,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3132,7 +3156,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-proxy\",instance=~\"$instance\",verb=\"POST\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-proxy\",instance=~\"$instance\",verb=\"POST\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -3213,6 +3237,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -3236,7 +3261,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -3317,6 +3342,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3408,6 +3434,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3499,6 +3526,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3659,7 +3687,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Proxy", "title": "Kubernetes / Proxy",
"uid": "632e265de029684c40b21cb76bca4f94", "uid": "632e265de029684c40b21cb76bca4f94",
"version": 0 "version": 0

View File

@ -31,6 +31,7 @@ data:
"fill": 1, "fill": 1,
"format": "percentunit", "format": "percentunit",
"id": 1, "id": 1,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -686,6 +687,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A", "pattern": "Value #A",
@ -704,6 +706,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to workloads", "linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B", "pattern": "Value #B",
@ -722,6 +725,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -740,6 +744,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -758,6 +763,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -776,6 +782,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -794,6 +801,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -812,6 +820,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace", "pattern": "namespace",
@ -839,7 +848,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)", "expr": "sum(kube_pod_owner{cluster=\"$cluster\"}) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -848,7 +857,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)", "expr": "count(avg(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1105,6 +1114,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A", "pattern": "Value #A",
@ -1123,6 +1133,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to workloads", "linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B", "pattern": "Value #B",
@ -1141,6 +1152,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -1159,6 +1171,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -1177,6 +1190,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -1195,6 +1209,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -1213,6 +1228,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -1231,6 +1247,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace", "pattern": "namespace",
@ -1258,7 +1275,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)", "expr": "sum(kube_pod_owner{cluster=\"$cluster\"}) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1267,7 +1284,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)", "expr": "count(avg(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1384,6 +1401,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 11, "id": 11,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1426,6 +1444,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -1444,6 +1463,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -1462,6 +1482,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -1480,6 +1501,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -1498,6 +1520,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -1516,6 +1539,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -1534,6 +1558,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace", "pattern": "namespace",
@ -2472,33 +2497,6 @@ data:
"regex": "", "regex": "",
"type": "datasource" "type": "datasource"
}, },
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{ {
"allValue": null, "allValue": null,
"current": { "current": {
@ -2557,7 +2555,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Cluster", "title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8", "uid": "efa86fd1d0c121a26444b636a3f509a8",
"version": 0 "version": 0
@ -2789,7 +2787,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"})", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"})",
"format": "time_series", "format": "time_series",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2873,7 +2871,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"})", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"})",
"format": "time_series", "format": "time_series",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3115,6 +3113,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3133,6 +3132,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3151,6 +3151,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3169,6 +3170,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3187,6 +3189,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -3205,6 +3208,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -3387,7 +3391,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3515,6 +3519,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3533,6 +3538,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3551,6 +3557,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3569,6 +3576,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3587,6 +3595,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -3605,6 +3614,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -3623,6 +3633,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -3641,6 +3652,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #H", "pattern": "Value #H",
@ -3659,6 +3671,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -3686,7 +3699,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3704,7 +3717,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3722,7 +3735,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3821,6 +3834,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 9, "id": 9,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -3863,6 +3877,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3881,6 +3896,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3899,6 +3915,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3917,6 +3934,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3935,6 +3953,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -3953,6 +3972,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -3971,6 +3991,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -4798,7 +4819,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Pods)", "title": "Kubernetes / Compute Resources / Namespace (Pods)",
"uid": "85a562078cdf77779eaa1add43ccec1e", "uid": "85a562078cdf77779eaa1add43ccec1e",
"version": 0 "version": 0
@ -4861,7 +4882,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -4973,6 +4994,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4991,6 +5013,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -5009,6 +5032,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -5027,6 +5051,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -5045,6 +5070,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -5063,6 +5089,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "pod", "pattern": "pod",
@ -5090,7 +5117,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5099,7 +5126,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5108,7 +5135,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5117,7 +5144,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5126,7 +5153,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5226,7 +5253,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\", container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\", container!=\"\"}) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -5338,6 +5365,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -5356,6 +5384,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -5374,6 +5403,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -5392,6 +5422,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -5410,6 +5441,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -5428,6 +5460,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -5446,6 +5479,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -5464,6 +5498,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #H", "pattern": "Value #H",
@ -5482,6 +5517,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "pod", "pattern": "pod",
@ -5509,7 +5545,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5518,7 +5554,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5527,7 +5563,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5536,7 +5572,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5545,7 +5581,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5554,7 +5590,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_rss{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_rss{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5563,7 +5599,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_cache{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_cache{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5572,7 +5608,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_swap{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_swap{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5691,7 +5727,7 @@ data:
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": null, "label": null,
"multi": false, "multi": true,
"name": "node", "name": "node",
"options": [ "options": [
@ -5739,7 +5775,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Node (Pods)", "title": "Kubernetes / Compute Resources / Node (Pods)",
"uid": "200ac8fdbfbb74b39aff88118e4d1c2c", "uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
"version": 0 "version": 0

View File

@ -189,7 +189,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}[5m])) by (container) /sum(increase(container_cpu_cfs_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}[5m])) by (container)", "expr": "sum(increase(container_cpu_cfs_throttled_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", cluster=\"$cluster\"}[5m])) by (container) /sum(increase(container_cpu_cfs_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", cluster=\"$cluster\"}[5m])) by (container)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{container}}", "legendFormat": "{{container}}",
@ -203,7 +203,7 @@ data:
"fill": true, "fill": true,
"line": true, "line": true,
"op": "gt", "op": "gt",
"value": 1, "value": 0.80000000000000004,
"yaxis": "left" "yaxis": "left"
} }
], ],
@ -308,6 +308,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -326,6 +327,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -344,6 +346,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -362,6 +365,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -380,6 +384,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -398,6 +403,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "container", "pattern": "container",
@ -580,7 +586,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", image!=\"\"}) by (container)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{container}}", "legendFormat": "{{container}}",
@ -708,6 +714,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -726,6 +733,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -744,6 +752,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -762,6 +771,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -780,6 +790,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -798,6 +809,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -816,6 +828,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -834,6 +847,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #H", "pattern": "Value #H",
@ -852,6 +866,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "container", "pattern": "container",
@ -879,7 +894,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", image!=\"\"}) by (container)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -897,7 +912,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", image!=\"\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -915,7 +930,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1014,6 +1029,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 6, "id": 6,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1112,6 +1128,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 7, "id": 7,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1210,6 +1227,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 8, "id": 8,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1308,6 +1326,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 9, "id": 9,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1406,6 +1425,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 10, "id": 10,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1504,6 +1524,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 11, "id": 11,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1724,7 +1745,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Pod", "title": "Kubernetes / Compute Resources / Pod",
"uid": "6581e46e4e5c7ba40a07646395ef7b23", "uid": "6581e46e4e5c7ba40a07646395ef7b23",
"version": 0 "version": 0
@ -1787,7 +1808,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1899,6 +1920,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -1917,6 +1939,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -1935,6 +1958,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -1953,6 +1977,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -1971,6 +1996,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -1989,6 +2015,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -2016,7 +2043,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2025,7 +2052,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2034,7 +2061,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2043,7 +2070,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2052,7 +2079,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2152,7 +2179,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -2264,6 +2291,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -2282,6 +2310,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -2300,6 +2329,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -2318,6 +2348,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -2336,6 +2367,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -2354,6 +2386,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -2381,7 +2414,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2390,7 +2423,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2399,7 +2432,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2408,7 +2441,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2417,7 +2450,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2489,6 +2522,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 5, "id": 5,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -2531,6 +2565,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -2549,6 +2584,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -2567,6 +2603,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -2585,6 +2622,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -2603,6 +2641,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -2621,6 +2660,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -2639,6 +2679,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -2666,7 +2707,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2675,7 +2716,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2684,7 +2725,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2693,7 +2734,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2702,7 +2743,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2711,7 +2752,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2811,7 +2852,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -2909,7 +2950,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3007,7 +3048,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3105,7 +3146,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3203,7 +3244,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3301,7 +3342,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3399,7 +3440,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3497,7 +3538,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3646,7 +3687,7 @@ data:
"options": [ "options": [
], ],
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}, workload)", "query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\"}, workload)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 1, "sort": 1,
@ -3673,7 +3714,7 @@ data:
"options": [ "options": [
], ],
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\"}, workload_type)", "query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\"}, workload_type)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 1, "sort": 1,
@ -3716,7 +3757,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Workload", "title": "Kubernetes / Compute Resources / Workload",
"uid": "a164a7f0339f99e89cea5cb47e9be617", "uid": "a164a7f0339f99e89cea5cb47e9be617",
"version": 0 "version": 0
@ -3798,7 +3839,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}", "legendFormat": "{{workload}} - {{workload_type}}",
@ -3926,6 +3967,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3944,6 +3986,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3962,6 +4005,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3980,6 +4024,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3998,6 +4043,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -4016,6 +4062,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -4034,6 +4081,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2", "linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload", "pattern": "workload",
@ -4052,6 +4100,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "workload_type", "pattern": "workload_type",
@ -4079,7 +4128,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)", "expr": "count(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4088,7 +4137,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4097,7 +4146,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4106,7 +4155,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4115,7 +4164,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4124,7 +4173,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4243,7 +4292,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}", "legendFormat": "{{workload}} - {{workload_type}}",
@ -4371,6 +4420,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4389,6 +4439,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -4407,6 +4458,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -4425,6 +4477,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -4443,6 +4496,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -4461,6 +4515,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -4479,6 +4534,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2", "linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload", "pattern": "workload",
@ -4497,6 +4553,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "workload_type", "pattern": "workload_type",
@ -4524,7 +4581,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)", "expr": "count(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4533,7 +4590,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4542,7 +4599,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4551,7 +4608,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4560,7 +4617,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4569,7 +4626,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4641,6 +4698,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 5, "id": 5,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -4683,6 +4741,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4701,6 +4760,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -4719,6 +4779,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -4737,6 +4798,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -4755,6 +4817,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -4773,6 +4836,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -4791,6 +4855,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$type", "linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$type",
"pattern": "workload", "pattern": "workload",
@ -4809,6 +4874,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "workload_type", "pattern": "workload_type",
@ -4836,7 +4902,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4845,7 +4911,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4854,7 +4920,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4863,7 +4929,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4872,7 +4938,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4881,7 +4947,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4981,7 +5047,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5079,7 +5145,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5177,7 +5243,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5275,7 +5341,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5373,7 +5439,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5471,7 +5537,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5569,7 +5635,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5667,7 +5733,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5757,7 +5823,7 @@ data:
"value": "deployment" "value": "deployment"
}, },
"datasource": "$datasource", "datasource": "$datasource",
"definition": "label_values(mixin_pod_workload{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)", "definition": "label_values(namespace_workload_pod:kube_pod_owner:relabel{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)",
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": null, "label": null,
@ -5766,7 +5832,7 @@ data:
"options": [ "options": [
], ],
"query": "label_values(mixin_pod_workload{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)", "query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"skipUrlSync": false, "skipUrlSync": false,
@ -5864,7 +5930,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Workloads)", "title": "Kubernetes / Compute Resources / Namespace (Workloads)",
"uid": "a87fb0d919ec0ea5f6543124e16c42a5", "uid": "a87fb0d919ec0ea5f6543124e16c42a5",
"version": 0 "version": 0

View File

@ -20,6 +20,24 @@ data:
"id": null, "id": null,
"links": [ "links": [
],
"panels": [
{
"content": "The SLO (service level objective) and other metrics displayed on this dashboard are for informational purposes only.",
"datasource": null,
"description": "The SLO (service level objective) and other metrics displayed on this dashboard are for informational purposes only.",
"gridPos": {
"h": 2,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"mode": "markdown",
"span": 12,
"title": "Notice",
"type": "text"
}
], ],
"refresh": "10s", "refresh": "10s",
"rows": [ "rows": [
@ -37,7 +55,9 @@ data:
"#d44a3a" "#d44a3a"
], ],
"datasource": "$datasource", "datasource": "$datasource",
"format": "none", "decimals": 3,
"description": "How many percent of requests (both read and write) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": { "gauge": {
"maxValue": 100, "maxValue": 100,
"minValue": 0, "minValue": 0,
@ -48,7 +68,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 2, "id": 3,
"interval": null, "interval": null,
"links": [ "links": [
@ -78,7 +98,7 @@ data:
"to": "null" "to": "null"
} }
], ],
"span": 2, "span": 4,
"sparkline": { "sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)", "fillColor": "rgba(31, 118, 189, 0.18)",
"full": false, "full": false,
@ -88,7 +108,7 @@ data:
"tableColumn": "", "tableColumn": "",
"targets": [ "targets": [
{ {
"expr": "sum(up{job=\"apiserver\", cluster=\"$cluster\"})", "expr": "apiserver_request:availability30d{verb=\"all\", cluster=\"$cluster\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "", "legendFormat": "",
@ -96,7 +116,7 @@ data:
} }
], ],
"thresholds": "", "thresholds": "",
"title": "Up", "title": "Availability (30d) > 99.000%",
"tooltip": { "tooltip": {
"shared": false "shared": false
}, },
@ -109,7 +129,7 @@ data:
"value": "null" "value": "null"
} }
], ],
"valueName": "min" "valueName": "avg"
}, },
{ {
"aliasColors": { "aliasColors": {
@ -119,11 +139,13 @@ data:
"dashLength": 10, "dashLength": 10,
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "decimals": 3,
"description": "How much error budget is left looking at our 0.990% availability gurantees?",
"fill": 10,
"gridPos": { "gridPos": {
}, },
"id": 3, "id": 4,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -132,6 +154,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -150,37 +173,16 @@ data:
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 5, "span": 8,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"2..\", cluster=\"$cluster\"}[5m]))", "expr": "100 * (apiserver_request:availability30d{verb=\"all\", cluster=\"$cluster\"} - 0.990000)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "2xx", "legendFormat": "errorbudget",
"refId": "A" "refId": "A"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"3..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"4..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"5..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
"refId": "D"
} }
], ],
"thresholds": [ "thresholds": [
@ -188,7 +190,7 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "RPC Rate", "title": "ErrorBudget (30d) > 99.000%",
"tooltip": { "tooltip": {
"shared": false, "shared": false,
"sort": 0, "sort": 0,
@ -206,7 +208,8 @@ data:
}, },
"yaxes": [ "yaxes": [
{ {
"format": "ops", "decimals": 3,
"format": "percentunit",
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
@ -214,7 +217,215 @@ data:
"show": true "show": true
}, },
{ {
"format": "ops", "decimals": 3,
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 3,
"description": "How many percent of read requests (LIST,GET) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "apiserver_request:availability30d{verb=\"read\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Read Availability (30d)",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many read requests (LIST,GET) per second do the apiservers get by code?",
"fill": 10,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/2../i",
"color": "#56A64B"
},
{
"alias": "/3../i",
"color": "#F2CC0C"
},
{
"alias": "/4../i",
"color": "#3274D9"
},
{
"alias": "/5../i",
"color": "#E02F44"
}
],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (code) (code_resource:apiserver_request_total:rate5m{verb=\"read\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ code }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Read SLI - Requests",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "reqps",
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
@ -231,21 +442,23 @@ data:
"dashLength": 10, "dashLength": 10,
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"description": "How many percent of read requests (LIST,GET) per second are returned with errors (5xx)?",
"fill": 1, "fill": 1,
"gridPos": { "gridPos": {
}, },
"id": 4, "id": 7,
"legend": { "legend": {
"alignAsTable": true, "alignAsTable": false,
"avg": false, "avg": false,
"current": true, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": true, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": false
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -262,15 +475,15 @@ data:
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 5, "span": 3,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\", verb!=\"WATCH\", cluster=\"$cluster\"}[5m])) by (verb, le))", "expr": "sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"read\",code=~\"5..\", cluster=\"$cluster\"}) / sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"read\", cluster=\"$cluster\"})",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}}", "legendFormat": "{{ resource }}",
"refId": "A" "refId": "A"
} }
], ],
@ -279,7 +492,493 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "Request duration 99th quantile", "title": "Read SLI - Errors",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many seconds is the 99th percentile for reading (LIST|GET) a given resource?",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{verb=\"read\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Read SLI - Duration",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 3,
"description": "How many percent of write requests (POST|PUT|PATCH|DELETE) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 9,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "apiserver_request:availability30d{verb=\"write\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Write Availability (30d)",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many write requests (POST|PUT|PATCH|DELETE) per second do the apiservers get by code?",
"fill": 10,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/2../i",
"color": "#56A64B"
},
{
"alias": "/3../i",
"color": "#F2CC0C"
},
{
"alias": "/4../i",
"color": "#3274D9"
},
{
"alias": "/5../i",
"color": "#E02F44"
}
],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (code) (code_resource:apiserver_request_total:rate5m{verb=\"write\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ code }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Requests",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many percent of write requests (POST|PUT|PATCH|DELETE) per second are returned with errors (5xx)?",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"write\",code=~\"5..\", cluster=\"$cluster\"}) / sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"write\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Errors",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many seconds is the 99th percentile for writing (POST|PUT|PATCH|DELETE) a given resource?",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{verb=\"write\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Duration",
"tooltip": { "tooltip": {
"shared": false, "shared": false,
"sort": 0, "sort": 0,
@ -339,7 +1038,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 5, "id": 13,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -348,6 +1047,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": false, "show": false,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -430,7 +1130,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 6, "id": 14,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -439,6 +1139,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": false, "show": false,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -521,7 +1222,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 7, "id": 15,
"legend": { "legend": {
"alignAsTable": true, "alignAsTable": true,
"avg": false, "avg": false,
@ -530,6 +1231,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -625,307 +1327,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 8, "id": 16,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "etcd_helper_cache_entry_total{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Entry Total",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_helper_cache_hit_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} hit",
"refId": "A"
},
{
"expr": "sum(rate(etcd_helper_cache_miss_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Hit/Miss Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_get_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} get",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_add_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Duration 99th Quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -934,6 +1336,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1016,7 +1419,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 12, "id": 17,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -1025,6 +1428,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1107,7 +1511,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 13, "id": 18,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -1116,6 +1520,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1222,20 +1627,19 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod",
"value": "prod"
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 2, "hide": 2,
"includeAll": false, "includeAll": false,
"label": null, "label": "cluster",
"multi": false, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
], ],
"query": "label_values(apiserver_request_total, cluster)", "query": "label_values(apiserver_request_total, cluster)",
"refresh": 1, "refresh": 2,
"regex": "", "regex": "",
"sort": 1, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
@ -1303,7 +1707,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / API server", "title": "Kubernetes / API server",
"uid": "09ec8aa1e996d6ffcd6817bbaff4db1b", "uid": "09ec8aa1e996d6ffcd6817bbaff4db1b",
"version": 0 "version": 0
@ -1440,6 +1844,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1544,6 +1949,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1648,6 +2054,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1752,6 +2159,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1864,6 +2272,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1887,7 +2296,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -1968,6 +2377,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -1991,7 +2401,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -2072,6 +2482,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2163,6 +2574,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2254,6 +2666,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2414,7 +2827,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Controller Manager", "title": "Kubernetes / Controller Manager",
"uid": "72e0e05bef5099e5f049b05fdc429ed4", "uid": "72e0e05bef5099e5f049b05fdc429ed4",
"version": 0 "version": 0
@ -2467,6 +2880,7 @@ data:
"min": true, "min": true,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2662,6 +3076,7 @@ data:
"min": true, "min": true,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2965,7 +3380,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Persistent Volumes", "title": "Kubernetes / Persistent Volumes",
"uid": "919b92a8e8041bd567af9edab12c840c", "uid": "919b92a8e8041bd567af9edab12c840c",
"version": 0 "version": 0
@ -3102,6 +3517,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -3214,6 +3630,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -3339,6 +3756,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3451,6 +3869,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3474,7 +3893,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -3555,6 +3974,7 @@ data:
"min": false, "min": false,
"rightSide": true, "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -3578,7 +3998,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -3659,6 +4079,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3750,6 +4171,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3841,6 +4263,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -4001,11 +4424,916 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Scheduler", "title": "Kubernetes / Scheduler",
"uid": "2e6b6a3b4bddf1427b3a55aa1311c656", "uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
"version": 0 "version": 0
} }
statefulset.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 2,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "cores",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "CPU",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 3,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "GB",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}) / 1024^3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Memory",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "Bps",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\",pod=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Network",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"height": "100px",
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Desired Replicas",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 6,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Replicas of current version",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_status_observed_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Observed Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 8,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Metadata Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas specified",
"refId": "A"
},
{
"expr": "max(kube_statefulset_status_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas created",
"refId": "B"
},
{
"expr": "min(kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "ready",
"refId": "C"
},
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas of current version",
"refId": "D"
},
{
"expr": "min(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "updated",
"refId": "E"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Replicas",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Name",
"multi": false,
"name": "statefulset",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\"}, statefulset)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / StatefulSets",
"uid": "a31c1f46e6f727cb37c0d731a7245005",
"version": 0
}
kind: ConfigMap kind: ConfigMap
metadata: metadata:
name: grafana-dashboards-k8s name: grafana-dashboards-k8s

View File

@ -308,6 +308,7 @@ data:
"min": false, "min": false,
"rightSide": "true", "rightSide": "true",
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -399,6 +400,7 @@ data:
"min": false, "min": false,
"rightSide": "true", "rightSide": "true",
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -503,6 +505,7 @@ data:
"min": false, "min": false,
"rightSide": "true", "rightSide": "true",
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -621,6 +624,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -719,6 +723,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -810,6 +815,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },

View File

@ -48,6 +48,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -140,6 +141,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -265,6 +267,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -471,6 +474,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -586,6 +590,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -704,6 +709,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -796,6 +802,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },

View File

@ -48,6 +48,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -71,10 +72,10 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(\n prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} \n- \n ignoring(queue) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}\n)\n", "expr": "(\n prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} \n- \n ignoring(remote_name, url) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}\n)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -139,6 +140,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -162,10 +164,10 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(\n rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n ignoring (queue) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n)\n", "expr": "(\n rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n ignoring (remote_name, url) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -243,6 +245,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -266,10 +269,10 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "rate(\n prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n ignoring(queue) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n", "expr": "rate(\n prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n ignoring(remote_name, url) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -347,6 +350,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -374,7 +378,7 @@ data:
"expr": "prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -439,6 +443,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -465,7 +470,7 @@ data:
"expr": "prometheus_remote_storage_shards_max{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shards_max{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -530,6 +535,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -556,7 +562,7 @@ data:
"expr": "prometheus_remote_storage_shards_min{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shards_min{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -621,6 +627,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -647,7 +654,7 @@ data:
"expr": "prometheus_remote_storage_shards_desired{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shards_desired{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -725,6 +732,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -751,7 +759,7 @@ data:
"expr": "prometheus_remote_storage_shard_capacity{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shard_capacity{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -816,6 +824,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -842,7 +851,7 @@ data:
"expr": "prometheus_remote_storage_pending_samples{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_pending_samples{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -920,6 +929,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1011,6 +1021,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1037,7 +1048,7 @@ data:
"expr": "prometheus_wal_watcher_current_segment{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_wal_watcher_current_segment{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{consumer}}",
"refId": "A" "refId": "A"
} }
], ],
@ -1115,6 +1126,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1141,7 +1153,7 @@ data:
"expr": "rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -1206,6 +1218,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1232,7 +1245,7 @@ data:
"expr": "rate(prometheus_remote_storage_failed_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_failed_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -1297,6 +1310,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1323,7 +1337,7 @@ data:
"expr": "rate(prometheus_remote_storage_retried_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_retried_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -1388,6 +1402,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1414,7 +1429,7 @@ data:
"expr": "rate(prometheus_remote_storage_enqueue_retries_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_enqueue_retries_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A" "refId": "A"
} }
], ],
@ -1567,11 +1582,11 @@ data:
"includeAll": true, "includeAll": true,
"label": null, "label": null,
"multi": false, "multi": false,
"name": "queue", "name": "url",
"options": [ "options": [
], ],
"query": "label_values(prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}, queue)", "query": "label_values(prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}, url)",
"refresh": 2, "refresh": 2,
"regex": "", "regex": "",
"sort": 0, "sort": 0,
@ -1690,6 +1705,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -1708,6 +1724,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -1726,6 +1743,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "instance", "pattern": "instance",
@ -1744,6 +1762,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "job", "pattern": "job",
@ -1762,6 +1781,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "version", "pattern": "version",
@ -2814,7 +2834,7 @@ data:
] ]
}, },
"timezone": "utc", "timezone": "utc",
"title": "Prometheus", "title": "Prometheus Overview",
"uid": "", "uid": "",
"version": 0 "version": 0
} }

View File

@ -18,12 +18,13 @@ spec:
labels: labels:
name: grafana name: grafana
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: grafana - name: grafana
image: docker.io/grafana/grafana:7.1.5 image: docker.io/grafana/grafana:7.3.2
env: env:
- name: GF_PATHS_CONFIG - name: GF_PATHS_CONFIG
value: "/etc/grafana/custom.ini" value: "/etc/grafana/custom.ini"

View File

@ -1,4 +1,4 @@
apiVersion: networking.k8s.io/v1beta1 apiVersion: networking.k8s.io/v1
kind: IngressClass kind: IngressClass
metadata: metadata:
name: public name: public

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: k8s.gcr.io/ingress-nginx/controller:v0.35.0 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public

View File

@ -1,4 +1,4 @@
apiVersion: networking.k8s.io/v1beta1 apiVersion: networking.k8s.io/v1
kind: IngressClass kind: IngressClass
metadata: metadata:
name: public name: public

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: k8s.gcr.io/ingress-nginx/controller:v0.35.0 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public

View File

@ -1,4 +1,4 @@
apiVersion: networking.k8s.io/v1beta1 apiVersion: networking.k8s.io/v1
kind: IngressClass kind: IngressClass
metadata: metadata:
name: public name: public

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: k8s.gcr.io/ingress-nginx/controller:v0.35.0 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public

View File

@ -1,4 +1,4 @@
apiVersion: networking.k8s.io/v1beta1 apiVersion: networking.k8s.io/v1
kind: IngressClass kind: IngressClass
metadata: metadata:
name: public name: public

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: k8s.gcr.io/ingress-nginx/controller:v0.35.0 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public

View File

@ -1,4 +1,4 @@
apiVersion: networking.k8s.io/v1beta1 apiVersion: networking.k8s.io/v1
kind: IngressClass kind: IngressClass
metadata: metadata:
name: public name: public

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: k8s.gcr.io/ingress-nginx/controller:v0.35.0 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public

View File

@ -34,7 +34,7 @@ data:
- job_name: 'kubernetes-apiservers' - job_name: 'kubernetes-apiservers'
kubernetes_sd_configs: kubernetes_sd_configs:
- role: endpoints - role: endpoints
scheme: https scheme: https
tls_config: tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@ -68,13 +68,16 @@ data:
- source_labels: [__name__, group] - source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_bucket;.+ regex: apiserver_request_duration_seconds_bucket;.+
action: drop action: drop
- source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_count;.+
action: drop
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10250/metrics). # metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
- job_name: 'kubelet' - job_name: 'kubelet'
kubernetes_sd_configs: kubernetes_sd_configs:
- role: node - role: node
scheme: https scheme: https
tls_config: tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@ -82,10 +85,6 @@ data:
insecure_skip_verify: true insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_name
# Scrape config for Kubelet cAdvisor. Explore metrics from a node by # Scrape config for Kubelet cAdvisor. Explore metrics from a node by
# scraping kubelet (127.0.0.1:10250/metrics/cadvisor). # scraping kubelet (127.0.0.1:10250/metrics/cadvisor).
- job_name: 'kubernetes-cadvisor' - job_name: 'kubernetes-cadvisor'
@ -100,9 +99,6 @@ data:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_name
metric_relabel_configs: metric_relabel_configs:
- source_labels: [__name__, image] - source_labels: [__name__, image]
action: drop action: drop
@ -121,13 +117,11 @@ data:
- source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_controller] - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_controller]
action: keep action: keep
regex: 'true' regex: 'true'
- action: labelmap
regex: __meta_kubernetes_node_name
- source_labels: [__meta_kubernetes_node_address_InternalIP] - source_labels: [__meta_kubernetes_node_address_InternalIP]
action: replace action: replace
target_label: __address__ target_label: __address__
replacement: '${1}:2381' replacement: '${1}:2381'
# Scrape config for service endpoints. # Scrape config for service endpoints.
# #
# The relabeling allows the actual service scrape endpoint to be configured # The relabeling allows the actual service scrape endpoint to be configured
@ -172,7 +166,7 @@ data:
- source_labels: [__meta_kubernetes_service_name] - source_labels: [__meta_kubernetes_service_name]
action: replace action: replace
target_label: job target_label: job
metric_relabel_configs: metric_relabel_configs:
- source_labels: [__name__] - source_labels: [__name__]
action: drop action: drop

View File

@ -14,13 +14,14 @@ spec:
labels: labels:
name: prometheus name: prometheus
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccountName: prometheus serviceAccountName: prometheus
containers: containers:
- name: prometheus - name: prometheus
image: quay.io/prometheus/prometheus:v2.20.0 image: quay.io/prometheus/prometheus:v2.22.1
args: args:
- --web.listen-address=0.0.0.0:9090 - --web.listen-address=0.0.0.0:9090
- --config.file=/etc/prometheus/prometheus.yaml - --config.file=/etc/prometheus/prometheus.yaml

View File

@ -78,13 +78,6 @@ rules:
verbs: verbs:
- list - list
- watch - watch
- apiGroups:
- autoscaling.k8s.io
resources:
- verticalpodautoscalers
verbs:
- list
- watch
- apiGroups: - apiGroups:
- admissionregistration.k8s.io - admissionregistration.k8s.io
resources: resources:
@ -97,6 +90,14 @@ rules:
- networking.k8s.io - networking.k8s.io
resources: resources:
- networkpolicies - networkpolicies
- ingresses
verbs:
- list
- watch
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs: verbs:
- list - list
- watch - watch

View File

@ -18,16 +18,19 @@ spec:
labels: labels:
name: kube-state-metrics name: kube-state-metrics
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccountName: kube-state-metrics serviceAccountName: kube-state-metrics
containers: containers:
- name: kube-state-metrics - name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.9.7 image: quay.io/coreos/kube-state-metrics:v2.0.0-alpha.2
ports: ports:
- name: metrics - name: metrics
containerPort: 8080 containerPort: 8080
- name: telemetry
containerPort: 8081
livenessProbe: livenessProbe:
httpGet: httpGet:
path: /healthz path: /healthz
@ -40,3 +43,5 @@ spec:
port: 8081 port: 8081
initialDelaySeconds: 5 initialDelaySeconds: 5
timeoutSeconds: 5 timeoutSeconds: 5
securityContext:
runAsUser: 65534

View File

@ -17,13 +17,13 @@ spec:
labels: labels:
name: node-exporter name: node-exporter
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
serviceAccountName: node-exporter serviceAccountName: node-exporter
securityContext: securityContext:
runAsNonRoot: true runAsNonRoot: true
runAsUser: 65534 runAsUser: 65534
seccompProfile:
type: RuntimeDefault
hostNetwork: true hostNetwork: true
hostPID: true hostPID: true
containers: containers:

View File

@ -11,8 +11,8 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": members are down ({{ $value }})." "message": "etcd cluster \"{{ $labels.job }}\": members are down ({{ $value }})."
}, },
"expr": "max by (job) (\n sum by (job) (up{job=~\".*etcd.*\"} == bool 0)\nor\n count by (job,endpoint) (\n sum by (job,endpoint,To) (rate(etcd_network_peer_sent_failures_total{job=~\".*etcd.*\"}[3m])) > 0.01\n )\n)\n> 0\n", "expr": "max without (endpoint) (\n sum without (instance) (up{job=~\".*etcd.*\"} == bool 0)\nor\n count without (To) (\n sum without (instance) (rate(etcd_network_peer_sent_failures_total{job=~\".*etcd.*\"}[120s])) > 0.01\n )\n)\n> 0\n",
"for": "3m", "for": "10m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
} }
@ -22,7 +22,7 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }})." "message": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }})."
}, },
"expr": "sum(up{job=~\".*etcd.*\"} == bool 1) by (job) < ((count(up{job=~\".*etcd.*\"}) by (job) + 1) / 2)\n", "expr": "sum(up{job=~\".*etcd.*\"} == bool 1) without (instance) < ((count(up{job=~\".*etcd.*\"}) without (instance) + 1) / 2)\n",
"for": "3m", "for": "3m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
@ -44,7 +44,7 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": {{ $value }} leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated." "message": "etcd cluster \"{{ $labels.job }}\": {{ $value }} leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated."
}, },
"expr": "increase((max by (job) (etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}) or 0*absent(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}))[15m:1m]) >= 3\n", "expr": "increase((max without (instance) (etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}) or 0*absent(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}))[15m:1m]) >= 4\n",
"for": "5m", "for": "5m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -55,7 +55,7 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}." "message": "etcd cluster \"{{ $labels.job }}\": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}."
}, },
"expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_type=\"unary\"}[5m])) by (job, instance, grpc_service, grpc_method, le))\n> 0.15\n", "expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_type=\"unary\"}[5m])) without(grpc_type))\n> 0.15\n",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
@ -110,7 +110,7 @@ data:
"annotations": { "annotations": {
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}" "message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}"
}, },
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.01\n", "expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) without (code) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nwithout (code) > 0.01\n",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -121,7 +121,7 @@ data:
"annotations": { "annotations": {
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}." "message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}."
}, },
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.05\n", "expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) without (code) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nwithout (code) > 0.05\n",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
@ -145,112 +145,137 @@ data:
kube.yaml: |- kube.yaml: |-
{ {
"groups": [ "groups": [
{
"name": "kube-apiserver-error",
"rules": [
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[5m]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate5m"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[30m]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate30m"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[1h]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate1h"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[2h]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate2h"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[6h]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate6h"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[1d]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate1d"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[3d]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate3d"
},
{
"expr": "sum(status_class:apiserver_request_total:rate5m{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate5m{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate5m"
},
{
"expr": "sum(status_class:apiserver_request_total:rate30m{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate30m{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate30m"
},
{
"expr": "sum(status_class:apiserver_request_total:rate1h{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate1h{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate1h"
},
{
"expr": "sum(status_class:apiserver_request_total:rate2h{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate2h{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate2h"
},
{
"expr": "sum(status_class:apiserver_request_total:rate6h{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate6h{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate6h"
},
{
"expr": "sum(status_class:apiserver_request_total:rate1d{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate1d{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate1d"
},
{
"expr": "sum(status_class:apiserver_request_total:rate3d{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate3d{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate3d"
}
]
},
{ {
"name": "kube-apiserver.rules", "name": "kube-apiserver.rules",
"rules": [ "rules": [
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[1d]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[1d]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[1d]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[1d]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[1d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[1d]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate1d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[1h]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[1h]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[1h]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[1h]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[1h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[1h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate1h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[2h]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[2h]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[2h]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[2h]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[2h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[2h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate2h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[30m]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[30m]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[30m]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[30m]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[30m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[30m]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate30m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[3d]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[3d]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[3d]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[3d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate3d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[5m]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[5m]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[5m]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[5m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate5m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[6h]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[6h]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[6h]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[6h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate6h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[1d]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate1d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1h]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[1h]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate1h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[2h]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[2h]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[2h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[2h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate2h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[30m]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[30m]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[30m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[30m]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate30m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[3d]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[3d]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[3d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[3d]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate3d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[5m]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[5m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate5m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[6h]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[6h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate6h"
},
{
"expr": "sum by (code,resource) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n",
"labels": {
"verb": "read"
},
"record": "code_resource:apiserver_request_total:rate5m"
},
{
"expr": "sum by (code,resource) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n",
"labels": {
"verb": "write"
},
"record": "code_resource:apiserver_request_total:rate5m"
},
{
"expr": "histogram_quantile(0.99, sum by (le, resource) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))) > 0\n",
"labels": {
"quantile": "0.99",
"verb": "read"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.99, sum by (le, resource) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))) > 0\n",
"labels": {
"quantile": "0.99",
"verb": "write"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{ {
"expr": "sum(rate(apiserver_request_duration_seconds_sum{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n/\nsum(rate(apiserver_request_duration_seconds_count{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n", "expr": "sum(rate(apiserver_request_duration_seconds_sum{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n/\nsum(rate(apiserver_request_duration_seconds_count{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n",
"record": "cluster:apiserver_request_duration_seconds:mean5m" "record": "cluster:apiserver_request_duration_seconds:mean5m"
@ -278,6 +303,143 @@ data:
} }
] ]
}, },
{
"interval": "3m",
"name": "kube-apiserver-availability.rules",
"rules": [
{
"expr": "1 - (\n (\n # write too slow\n sum(increase(apiserver_request_duration_seconds_count{verb=~\"POST|PUT|PATCH|DELETE\"}[30d]))\n -\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[30d]))\n ) +\n (\n # read too slow\n sum(increase(apiserver_request_duration_seconds_count{verb=~\"LIST|GET\"}[30d]))\n -\n (\n (\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[30d]))\n or\n vector(0)\n )\n +\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[30d]))\n +\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[30d]))\n )\n ) +\n # errors\n sum(code:apiserver_request_total:increase30d{code=~\"5..\"} or vector(0))\n)\n/\nsum(code:apiserver_request_total:increase30d)\n",
"labels": {
"verb": "all"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "1 - (\n sum(increase(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[30d]))\n -\n (\n # too slow\n (\n sum(increase(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[30d]))\n or\n vector(0)\n )\n +\n sum(increase(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[30d]))\n +\n sum(increase(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[30d]))\n )\n +\n # errors\n sum(code:apiserver_request_total:increase30d{verb=\"read\",code=~\"5..\"} or vector(0))\n)\n/\nsum(code:apiserver_request_total:increase30d{verb=\"read\"})\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "1 - (\n (\n # too slow\n sum(increase(apiserver_request_duration_seconds_count{verb=~\"POST|PUT|PATCH|DELETE\"}[30d]))\n -\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[30d]))\n )\n +\n # errors\n sum(code:apiserver_request_total:increase30d{verb=\"write\",code=~\"5..\"} or vector(0))\n)\n/\nsum(code:apiserver_request_total:increase30d{verb=\"write\"})\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code) (code_verb:apiserver_request_total:increase30d{verb=~\"LIST|GET\"})\n",
"labels": {
"verb": "read"
},
"record": "code:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code) (code_verb:apiserver_request_total:increase30d{verb=~\"POST|PUT|PATCH|DELETE\"})\n",
"labels": {
"verb": "write"
},
"record": "code:apiserver_request_total:increase30d"
}
]
},
{ {
"name": "k8s.rules", "name": "k8s.rules",
"rules": [ "rules": [
@ -286,23 +448,23 @@ data:
"record": "namespace:container_cpu_usage_seconds_total:sum_rate" "record": "namespace:container_cpu_usage_seconds_total:sum_rate"
}, },
{ {
"expr": "sum by (cluster, namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (\n 1, max by(cluster, namespace, pod, node) (kube_pod_info)\n)\n", "expr": "sum by (cluster, namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (\n 1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate" "record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
}, },
{ {
"expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n", "expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_working_set_bytes" "record": "node_namespace_pod_container:container_memory_working_set_bytes"
}, },
{ {
"expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n", "expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_rss" "record": "node_namespace_pod_container:container_memory_rss"
}, },
{ {
"expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n", "expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_cache" "record": "node_namespace_pod_container:container_memory_cache"
}, },
{ {
"expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n", "expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_swap" "record": "node_namespace_pod_container:container_memory_swap"
}, },
{ {
@ -322,21 +484,21 @@ data:
"labels": { "labels": {
"workload_type": "deployment" "workload_type": "deployment"
}, },
"record": "mixin_pod_workload" "record": "namespace_workload_pod:kube_pod_owner:relabel"
}, },
{ {
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n", "expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": { "labels": {
"workload_type": "daemonset" "workload_type": "daemonset"
}, },
"record": "mixin_pod_workload" "record": "namespace_workload_pod:kube_pod_owner:relabel"
}, },
{ {
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n", "expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": { "labels": {
"workload_type": "statefulset" "workload_type": "statefulset"
}, },
"record": "mixin_pod_workload" "record": "namespace_workload_pod:kube_pod_owner:relabel"
} }
] ]
}, },
@ -412,11 +574,11 @@ data:
"name": "node.rules", "name": "node.rules",
"rules": [ "rules": [
{ {
"expr": "sum(min(kube_pod_info) by (cluster, node))\n", "expr": "sum(min(kube_pod_info{node!=\"\"}) by (cluster, node))\n",
"record": ":kube_pod_info_node_count:" "record": ":kube_pod_info_node_count:"
}, },
{ {
"expr": "topk by(namespace, pod) (1,\n max by (node, namespace, pod) (\n label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")\n))\n", "expr": "topk by(namespace, pod) (1,\n max by (node, namespace, pod) (\n label_replace(kube_pod_info{job=\"kube-state-metrics\",node!=\"\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")\n))\n",
"record": "node_namespace_pod:kube_pod_info:" "record": "node_namespace_pod:kube_pod_info:"
}, },
{ {
@ -461,104 +623,113 @@ data:
{ {
"alert": "KubePodCrashLooping", "alert": "KubePodCrashLooping",
"annotations": { "annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.", "description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping",
"summary": "Pod is crash looping."
}, },
"expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[15m]) * 60 * 5 > 0\n", "expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[5m]) * 60 * 5 > 0\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubePodNotReady", "alert": "KubePodNotReady",
"annotations": { "annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.", "description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready",
"summary": "Pod has been in a non-ready state for more than 15 minutes."
}, },
"expr": "sum by (namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}) * on(namespace, pod) group_left(owner_kind) max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!=\"Job\"})) > 0\n", "expr": "sum by (namespace, pod) (\n max by(namespace, pod) (\n kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}\n ) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (\n 1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!=\"Job\"})\n )\n) > 0\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeDeploymentGenerationMismatch", "alert": "KubeDeploymentGenerationMismatch",
"annotations": { "annotations": {
"message": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.", "description": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch",
"summary": "Deployment generation mismatch due to possible roll-back"
}, },
"expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n", "expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeDeploymentReplicasMismatch", "alert": "KubeDeploymentReplicasMismatch",
"annotations": { "annotations": {
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.", "description": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch",
"summary": "Deployment has not matched the expected number of replicas."
}, },
"expr": "(\n kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\n kube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n) and (\n changes(kube_deployment_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n", "expr": "(\n kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\n kube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n) and (\n changes(kube_deployment_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeStatefulSetReplicasMismatch", "alert": "KubeStatefulSetReplicasMismatch",
"annotations": { "annotations": {
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.", "description": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch",
"summary": "Deployment has not matched the expected number of replicas."
}, },
"expr": "(\n kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n", "expr": "(\n kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeStatefulSetGenerationMismatch", "alert": "KubeStatefulSetGenerationMismatch",
"annotations": { "annotations": {
"message": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.", "description": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch",
"summary": "StatefulSet generation mismatch due to possible roll-back"
}, },
"expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n", "expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeStatefulSetUpdateNotRolledOut", "alert": "KubeStatefulSetUpdateNotRolledOut",
"annotations": { "annotations": {
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.", "description": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout",
"summary": "StatefulSet update has not been rolled out."
}, },
"expr": "max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n)\n *\n(\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n)\n", "expr": "(\n max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n )\n *\n (\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n )\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeDaemonSetRolloutStuck", "alert": "KubeDaemonSetRolloutStuck",
"annotations": { "annotations": {
"message": "Only {{ $value | humanizePercentage }} of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.", "description": "DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck",
"summary": "DaemonSet rollout is stuck."
}, },
"expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} < 1.00\n", "expr": "(\n (\n kube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n ) or (\n kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"}\n !=\n 0\n ) or (\n kube_daemonset_updated_number_scheduled{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n ) or (\n kube_daemonset_status_number_available{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n )\n) and (\n changes(kube_daemonset_updated_number_scheduled{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeContainerWaiting", "alert": "KubeContainerWaiting",
"annotations": { "annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container}} has been in waiting state for longer than 1 hour.", "description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container}} has been in waiting state for longer than 1 hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontainerwaiting" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontainerwaiting",
"summary": "Pod container waiting longer than 1 hour"
}, },
"expr": "sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\"}) > 0\n", "expr": "sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\"}) > 0\n",
"for": "1h", "for": "1h",
@ -569,8 +740,9 @@ data:
{ {
"alert": "KubeDaemonSetNotScheduled", "alert": "KubeDaemonSetNotScheduled",
"annotations": { "annotations": {
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.", "description": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled",
"summary": "DaemonSet pods are not scheduled."
}, },
"expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m", "for": "10m",
@ -581,23 +753,12 @@ data:
{ {
"alert": "KubeDaemonSetMisScheduled", "alert": "KubeDaemonSetMisScheduled",
"annotations": { "annotations": {
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.", "description": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled",
"summary": "DaemonSet pods are misscheduled."
}, },
"expr": "kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m", "for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCronJobRunning",
"annotations": {
"message": "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecronjobrunning"
},
"expr": "time() - kube_cronjob_next_schedule_time{job=\"kube-state-metrics\"} > 3600\n",
"for": "1h",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -605,11 +766,12 @@ data:
{ {
"alert": "KubeJobCompletion", "alert": "KubeJobCompletion",
"annotations": { "annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than one hour to complete.", "description": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than 12 hours to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion",
"summary": "Job did not complete in time"
}, },
"expr": "kube_job_spec_completions{job=\"kube-state-metrics\"} - kube_job_status_succeeded{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_job_spec_completions{job=\"kube-state-metrics\"} - kube_job_status_succeeded{job=\"kube-state-metrics\"} > 0\n",
"for": "1h", "for": "12h",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -617,8 +779,9 @@ data:
{ {
"alert": "KubeJobFailed", "alert": "KubeJobFailed",
"annotations": { "annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.", "description": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed",
"summary": "Job failed to complete."
}, },
"expr": "kube_job_failed{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_job_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "15m", "for": "15m",
@ -629,8 +792,9 @@ data:
{ {
"alert": "KubeHpaReplicasMismatch", "alert": "KubeHpaReplicasMismatch",
"annotations": { "annotations": {
"message": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has not matched the desired number of replicas for longer than 15 minutes.", "description": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has not matched the desired number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpareplicasmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpareplicasmismatch",
"summary": "HPA has not matched descired number of replicas."
}, },
"expr": "(kube_hpa_status_desired_replicas{job=\"kube-state-metrics\"}\n !=\nkube_hpa_status_current_replicas{job=\"kube-state-metrics\"})\n and\nchanges(kube_hpa_status_current_replicas[15m]) == 0\n", "expr": "(kube_hpa_status_desired_replicas{job=\"kube-state-metrics\"}\n !=\nkube_hpa_status_current_replicas{job=\"kube-state-metrics\"})\n and\nchanges(kube_hpa_status_current_replicas[15m]) == 0\n",
"for": "15m", "for": "15m",
@ -641,8 +805,9 @@ data:
{ {
"alert": "KubeHpaMaxedOut", "alert": "KubeHpaMaxedOut",
"annotations": { "annotations": {
"message": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has been running at max replicas for longer than 15 minutes.", "description": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has been running at max replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpamaxedout" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpamaxedout",
"summary": "HPA is running at max replicas"
}, },
"expr": "kube_hpa_status_current_replicas{job=\"kube-state-metrics\"}\n ==\nkube_hpa_spec_max_replicas{job=\"kube-state-metrics\"}\n", "expr": "kube_hpa_status_current_replicas{job=\"kube-state-metrics\"}\n ==\nkube_hpa_spec_max_replicas{job=\"kube-state-metrics\"}\n",
"for": "15m", "for": "15m",
@ -658,8 +823,9 @@ data:
{ {
"alert": "KubeCPUOvercommit", "alert": "KubeCPUOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.", "description": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit",
"summary": "Cluster has overcommitted CPU resource requests."
}, },
"expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum{})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n", "expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum{})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n",
"for": "5m", "for": "5m",
@ -668,10 +834,11 @@ data:
} }
}, },
{ {
"alert": "KubeMemOvercommit", "alert": "KubeMemoryOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.", "description": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryovercommit",
"summary": "Cluster has overcommitted memory resource requests."
}, },
"expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum{})\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n", "expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum{})\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n",
"for": "5m", "for": "5m",
@ -680,10 +847,11 @@ data:
} }
}, },
{ {
"alert": "KubeCPUOvercommit", "alert": "KubeCPUQuotaOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted CPU resource requests for Namespaces.", "description": "Cluster has overcommitted CPU resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuquotaovercommit",
"summary": "Cluster has overcommitted CPU resource requests."
}, },
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"cpu\"})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n > 1.5\n", "expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"cpu\"})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n > 1.5\n",
"for": "5m", "for": "5m",
@ -692,10 +860,11 @@ data:
} }
}, },
{ {
"alert": "KubeMemOvercommit", "alert": "KubeMemoryQuotaOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted memory resource requests for Namespaces.", "description": "Cluster has overcommitted memory resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryquotaovercommit",
"summary": "Cluster has overcommitted memory resource requests."
}, },
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"memory\"})\n /\nsum(kube_node_status_allocatable_memory_bytes{job=\"node-exporter\"})\n > 1.5\n", "expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"memory\"})\n /\nsum(kube_node_status_allocatable_memory_bytes{job=\"node-exporter\"})\n > 1.5\n",
"for": "5m", "for": "5m",
@ -703,13 +872,40 @@ data:
"severity": "warning" "severity": "warning"
} }
}, },
{
"alert": "KubeQuotaAlmostFull",
"annotations": {
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaalmostfull",
"summary": "Namespace quota is going to be full."
},
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 0.9 < 1\n",
"for": "15m",
"labels": {
"severity": "info"
}
},
{
"alert": "KubeQuotaFullyUsed",
"annotations": {
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotafullyused",
"summary": "Namespace quota is fully used."
},
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n == 1\n",
"for": "15m",
"labels": {
"severity": "info"
}
},
{ {
"alert": "KubeQuotaExceeded", "alert": "KubeQuotaExceeded",
"annotations": { "annotations": {
"message": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.", "description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded",
"summary": "Namespace quota has exceeded the limits."
}, },
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 0.90\n", "expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 1\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -718,13 +914,14 @@ data:
{ {
"alert": "CPUThrottlingHigh", "alert": "CPUThrottlingHigh",
"annotations": { "annotations": {
"message": "{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.", "description": "{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh",
"summary": "Processes experience elevated CPU throttling."
}, },
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > ( 100 / 100 )\n", "expr": "sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > ( 80 / 100 )\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "info"
} }
} }
] ]
@ -733,10 +930,11 @@ data:
"name": "kubernetes-storage", "name": "kubernetes-storage",
"rules": [ "rules": [
{ {
"alert": "KubePersistentVolumeUsageCritical", "alert": "KubePersistentVolumeFillingUp",
"annotations": { "annotations": {
"message": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.", "description": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefillingup",
"summary": "PersistentVolume is filling up."
}, },
"expr": "kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 0.03\n", "expr": "kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 0.03\n",
"for": "1m", "for": "1m",
@ -745,22 +943,24 @@ data:
} }
}, },
{ {
"alert": "KubePersistentVolumeFullInFourDays", "alert": "KubePersistentVolumeFillingUp",
"annotations": { "annotations": {
"message": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value | humanizePercentage }} is available.", "description": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value | humanizePercentage }} is available.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefillingup",
"summary": "PersistentVolume is filling up."
}, },
"expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n", "expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"for": "1h", "for": "1h",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubePersistentVolumeErrors", "alert": "KubePersistentVolumeErrors",
"annotations": { "annotations": {
"message": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.", "description": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors",
"summary": "PersistentVolume is having issues with provisioning."
}, },
"expr": "kube_persistentvolume_status_phase{phase=~\"Failed|Pending\",job=\"kube-state-metrics\"} > 0\n", "expr": "kube_persistentvolume_status_phase{phase=~\"Failed|Pending\",job=\"kube-state-metrics\"} > 0\n",
"for": "5m", "for": "5m",
@ -776,10 +976,11 @@ data:
{ {
"alert": "KubeVersionMismatch", "alert": "KubeVersionMismatch",
"annotations": { "annotations": {
"message": "There are {{ $value }} different semantic versions of Kubernetes components running.", "description": "There are {{ $value }} different semantic versions of Kubernetes components running.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch",
"summary": "Different semantic versions of Kubernetes components running."
}, },
"expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!~\"kube-dns|coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*.[0-9]*).*\"))) > 1\n", "expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!~\"kube-dns|coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -788,8 +989,9 @@ data:
{ {
"alert": "KubeClientErrors", "alert": "KubeClientErrors",
"annotations": { "annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ $value | humanizePercentage }} errors.'", "description": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ $value | humanizePercentage }} errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors",
"summary": "Kubernetes API server client is experiencing errors."
}, },
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n> 0.01\n", "expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n> 0.01\n",
"for": "15m", "for": "15m",
@ -800,30 +1002,66 @@ data:
] ]
}, },
{ {
"name": "kube-apiserver-error-alerts", "name": "kube-apiserver-slos",
"rules": [ "rules": [
{ {
"alert": "ErrorBudgetBurn", "alert": "KubeAPIErrorBudgetBurn",
"annotations": { "annotations": {
"message": "High requests error budget burn for job=apiserver (current value: {{ $value }})", "description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
}, },
"expr": "(\n status_class_5xx:apiserver_request_total:ratio_rate1h{job=\"apiserver\"} > (14.4*0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate5m{job=\"apiserver\"} > (14.4*0.010000)\n)\nor\n(\n status_class_5xx:apiserver_request_total:ratio_rate6h{job=\"apiserver\"} > (6*0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate30m{job=\"apiserver\"} > (6*0.010000)\n)\n", "expr": "sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)\nand\nsum(apiserver_request:burnrate5m) > (14.40 * 0.01000)\n",
"for": "2m",
"labels": { "labels": {
"job": "apiserver", "long": "1h",
"severity": "critical" "severity": "critical",
"short": "5m"
} }
}, },
{ {
"alert": "ErrorBudgetBurn", "alert": "KubeAPIErrorBudgetBurn",
"annotations": { "annotations": {
"message": "High requests error budget burn for job=apiserver (current value: {{ $value }})", "description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
}, },
"expr": "(\n status_class_5xx:apiserver_request_total:ratio_rate1d{job=\"apiserver\"} > (3*0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate2h{job=\"apiserver\"} > (3*0.010000)\n)\nor\n(\n status_class_5xx:apiserver_request_total:ratio_rate3d{job=\"apiserver\"} > (0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate6h{job=\"apiserver\"} > (0.010000)\n)\n", "expr": "sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)\nand\nsum(apiserver_request:burnrate30m) > (6.00 * 0.01000)\n",
"for": "15m",
"labels": { "labels": {
"job": "apiserver", "long": "6h",
"severity": "warning" "severity": "critical",
"short": "30m"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate1d) > (3.00 * 0.01000)\nand\nsum(apiserver_request:burnrate2h) > (3.00 * 0.01000)\n",
"for": "1h",
"labels": {
"long": "1d",
"severity": "warning",
"short": "2h"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate3d) > (1.00 * 0.01000)\nand\nsum(apiserver_request:burnrate6h) > (1.00 * 0.01000)\n",
"for": "3h",
"labels": {
"long": "3d",
"severity": "warning",
"short": "6h"
} }
} }
] ]
@ -831,59 +1069,12 @@ data:
{ {
"name": "kubernetes-system-apiserver", "name": "kubernetes-system-apiserver",
"rules": [ "rules": [
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has an abnormal latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "(\n cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"}\n >\n on (verb) group_left()\n (\n avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"} >= 0)\n +\n 2*stddev by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"} >= 0)\n )\n) > on (verb) group_left()\n1.2 * avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"} >= 0)\nand on (verb,resource)\ncluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\"}\n>\n1\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\"} > 4\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) > 0.10\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) > 0.05\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{ {
"alert": "KubeClientCertificateExpiration", "alert": "KubeClientCertificateExpiration",
"annotations": { "annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 1.0 hours.", "description": "A client certificate used to authenticate to the apiserver is expiring in less than 1.0 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration",
"summary": "Client certificate is about to expire."
}, },
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 3600\n", "expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 3600\n",
"labels": { "labels": {
@ -893,8 +1084,9 @@ data:
{ {
"alert": "KubeClientCertificateExpiration", "alert": "KubeClientCertificateExpiration",
"annotations": { "annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 0.1 hours.", "description": "A client certificate used to authenticate to the apiserver is expiring in less than 0.1 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration",
"summary": "Client certificate is about to expire."
}, },
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 300\n", "expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 300\n",
"labels": { "labels": {
@ -904,8 +1096,9 @@ data:
{ {
"alert": "AggregatedAPIErrors", "alert": "AggregatedAPIErrors",
"annotations": { "annotations": {
"message": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has reported errors. The number of errors have increased for it in the past five minutes. High values indicate that the availability of the service changes too often.", "description": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has reported errors. The number of errors have increased for it in the past five minutes. High values indicate that the availability of the service changes too often.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapierrors" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapierrors",
"summary": "An aggregated API has reported errors."
}, },
"expr": "sum by(name, namespace)(increase(aggregator_unavailable_apiservice_count[5m])) > 2\n", "expr": "sum by(name, namespace)(increase(aggregator_unavailable_apiservice_count[5m])) > 2\n",
"labels": { "labels": {
@ -915,10 +1108,11 @@ data:
{ {
"alert": "AggregatedAPIDown", "alert": "AggregatedAPIDown",
"annotations": { "annotations": {
"message": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} is down. It has not been available at least for the past five minutes.", "description": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has been only {{ $value | humanize }}% available over the last 10m.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapidown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapidown",
"summary": "An aggregated API is down."
}, },
"expr": "sum by(name, namespace)(sum_over_time(aggregator_unavailable_apiservice[5m])) > 0\n", "expr": "(1 - max by(name, namespace)(avg_over_time(aggregator_unavailable_apiservice[10m]))) * 100 < 85\n",
"for": "5m", "for": "5m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -927,8 +1121,9 @@ data:
{ {
"alert": "KubeAPIDown", "alert": "KubeAPIDown",
"annotations": { "annotations": {
"message": "KubeAPI has disappeared from Prometheus target discovery.", "description": "KubeAPI has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"apiserver\"} == 1)\n", "expr": "absent(up{job=\"apiserver\"} == 1)\n",
"for": "15m", "for": "15m",
@ -944,8 +1139,9 @@ data:
{ {
"alert": "KubeNodeNotReady", "alert": "KubeNodeNotReady",
"annotations": { "annotations": {
"message": "{{ $labels.node }} has been unready for more than 15 minutes.", "description": "{{ $labels.node }} has been unready for more than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready",
"summary": "Node is not ready."
}, },
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n", "expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "15m", "for": "15m",
@ -956,11 +1152,12 @@ data:
{ {
"alert": "KubeNodeUnreachable", "alert": "KubeNodeUnreachable",
"annotations": { "annotations": {
"message": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.", "description": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable",
"summary": "Node is unreachable."
}, },
"expr": "kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} == 1\n", "expr": "(kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} unless ignoring(key,value) kube_node_spec_taint{job=\"kube-state-metrics\",key=~\"ToBeDeletedByClusterAutoscaler|cloud.google.com/impending-node-termination|aws-node-termination-handler/spot-itn\"}) == 1\n",
"for": "2m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -968,10 +1165,11 @@ data:
{ {
"alert": "KubeletTooManyPods", "alert": "KubeletTooManyPods",
"annotations": { "annotations": {
"message": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.", "description": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods",
"summary": "Kubelet is running at capacity."
}, },
"expr": "max(max(kubelet_running_pod_count{job=\"kubelet\"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"}) by(node) / max(kube_node_status_capacity_pods{job=\"kube-state-metrics\"} != 1) by(node) > 0.95\n", "expr": "count by(node) (\n (kube_pod_status_phase{job=\"kube-state-metrics\",phase=\"Running\"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job=\"kube-state-metrics\"})\n)\n/\nmax by(node) (\n kube_node_status_capacity_pods{job=\"kube-state-metrics\"} != 1\n) > 0.95\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -980,8 +1178,9 @@ data:
{ {
"alert": "KubeNodeReadinessFlapping", "alert": "KubeNodeReadinessFlapping",
"annotations": { "annotations": {
"message": "The readiness status of node {{ $labels.node }} has changed {{ $value }} times in the last 15 minutes.", "description": "The readiness status of node {{ $labels.node }} has changed {{ $value }} times in the last 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping",
"summary": "Node readiness status is flapping."
}, },
"expr": "sum(changes(kube_node_status_condition{status=\"true\",condition=\"Ready\"}[15m])) by (node) > 2\n", "expr": "sum(changes(kube_node_status_condition{status=\"true\",condition=\"Ready\"}[15m])) by (node) > 2\n",
"for": "15m", "for": "15m",
@ -992,8 +1191,9 @@ data:
{ {
"alert": "KubeletPlegDurationHigh", "alert": "KubeletPlegDurationHigh",
"annotations": { "annotations": {
"message": "The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.", "description": "The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh",
"summary": "Kubelet Pod Lifecycle Event Generator is taking too long to relist."
}, },
"expr": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile=\"0.99\"} >= 10\n", "expr": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile=\"0.99\"} >= 10\n",
"for": "5m", "for": "5m",
@ -1004,10 +1204,85 @@ data:
{ {
"alert": "KubeletPodStartUpLatencyHigh", "alert": "KubeletPodStartUpLatencyHigh",
"annotations": { "annotations": {
"message": "Kubelet Pod startup 99th percentile latency is {{ $value }} seconds on node {{ $labels.node }}.", "description": "Kubelet Pod startup 99th percentile latency is {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh",
"summary": "Kubelet Pod startup latency is too high."
}, },
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\"}[5m])) by (instance, le)) * on(instance) group_left(node) kubelet_node_name > 60\n", "expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\"}[5m])) by (instance, le)) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"} > 60\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletClientCertificateExpiration",
"annotations": {
"description": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificateexpiration",
"summary": "Kubelet client certificate is about to expire."
},
"expr": "kubelet_certificate_manager_client_ttl_seconds < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletClientCertificateExpiration",
"annotations": {
"description": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificateexpiration",
"summary": "Kubelet client certificate is about to expire."
},
"expr": "kubelet_certificate_manager_client_ttl_seconds < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeletServerCertificateExpiration",
"annotations": {
"description": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificateexpiration",
"summary": "Kubelet server certificate is about to expire."
},
"expr": "kubelet_certificate_manager_server_ttl_seconds < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletServerCertificateExpiration",
"annotations": {
"description": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificateexpiration",
"summary": "Kubelet server certificate is about to expire."
},
"expr": "kubelet_certificate_manager_server_ttl_seconds < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeletClientCertificateRenewalErrors",
"annotations": {
"description": "Kubelet on node {{ $labels.node }} has failed to renew its client certificate ({{ $value | humanize }} errors in the last 5 minutes).",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificaterenewalerrors",
"summary": "Kubelet has failed to renew its client certificate."
},
"expr": "increase(kubelet_certificate_manager_client_expiration_renew_errors[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletServerCertificateRenewalErrors",
"annotations": {
"description": "Kubelet on node {{ $labels.node }} has failed to renew its server certificate ({{ $value | humanize }} errors in the last 5 minutes).",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificaterenewalerrors",
"summary": "Kubelet has failed to renew its server certificate."
},
"expr": "increase(kubelet_server_expiration_renew_errors[5m]) > 0\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -1016,8 +1291,9 @@ data:
{ {
"alert": "KubeletDown", "alert": "KubeletDown",
"annotations": { "annotations": {
"message": "Kubelet has disappeared from Prometheus target discovery.", "description": "Kubelet has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"kubelet\"} == 1)\n", "expr": "absent(up{job=\"kubelet\"} == 1)\n",
"for": "15m", "for": "15m",
@ -1033,8 +1309,9 @@ data:
{ {
"alert": "KubeSchedulerDown", "alert": "KubeSchedulerDown",
"annotations": { "annotations": {
"message": "KubeScheduler has disappeared from Prometheus target discovery.", "description": "KubeScheduler has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n", "expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
"for": "15m", "for": "15m",
@ -1050,8 +1327,9 @@ data:
{ {
"alert": "KubeControllerManagerDown", "alert": "KubeControllerManagerDown",
"annotations": { "annotations": {
"message": "KubeControllerManager has disappeared from Prometheus target discovery.", "description": "KubeControllerManager has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n", "expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m", "for": "15m",
@ -1350,14 +1628,25 @@ data:
{ {
"alert": "NodeHighNumberConntrackEntriesUsed", "alert": "NodeHighNumberConntrackEntriesUsed",
"annotations": { "annotations": {
"description": "{{ $value | humanizePercentage }} of conntrack entries are used", "description": "{{ $value | humanizePercentage }} of conntrack entries are used.",
"summary": "Number of conntrack are getting close to the limit" "summary": "Number of conntrack are getting close to the limit."
}, },
"expr": "(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75\n", "expr": "(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75\n",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
}, },
{
"alert": "NodeTextFileCollectorScrapeError",
"annotations": {
"description": "Node Exporter text file collector failed to scrape.",
"summary": "Node Exporter text file collector failed to scrape."
},
"expr": "node_textfile_scrape_error{job=\"node-exporter\"} == 1\n",
"labels": {
"severity": "warning"
}
},
{ {
"alert": "NodeClockSkewDetected", "alert": "NodeClockSkewDetected",
"annotations": { "annotations": {
@ -1381,6 +1670,29 @@ data:
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
},
{
"alert": "NodeRAIDDegraded",
"annotations": {
"description": "RAID array '{{ $labels.device }}' on {{ $labels.instance }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically.",
"summary": "RAID Array is degraded"
},
"expr": "node_md_disks_required - ignoring (state) (node_md_disks{state=\"active\"}) > 0\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeRAIDDiskFailure",
"annotations": {
"description": "At least one device in RAID array on {{ $labels.instance }} failed. Array '{{ $labels.device }}' needs attention and possibly a disk swap.",
"summary": "Failed device in RAID array"
},
"expr": "node_md_disks{state=\"fail\"} > 0\n",
"labels": {
"severity": "warning"
}
} }
] ]
} }
@ -1515,7 +1827,7 @@ data:
{ {
"alert": "PrometheusRemoteStorageFailures", "alert": "PrometheusRemoteStorageFailures",
"annotations": { "annotations": {
"description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to {{ if $labels.queue }}{{ $labels.queue }}{{ else }}{{ $labels.url }}{{ end }}.", "description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to {{ $labels.remote_name}}:{{ $labels.url }}",
"summary": "Prometheus fails to send samples to remote storage." "summary": "Prometheus fails to send samples to remote storage."
}, },
"expr": "(\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n/\n (\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n +\n rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m])\n )\n)\n* 100\n> 1\n", "expr": "(\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n/\n (\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n +\n rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m])\n )\n)\n* 100\n> 1\n",
@ -1527,7 +1839,7 @@ data:
{ {
"alert": "PrometheusRemoteWriteBehind", "alert": "PrometheusRemoteWriteBehind",
"annotations": { "annotations": {
"description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for {{ if $labels.queue }}{{ $labels.queue }}{{ else }}{{ $labels.url }}{{ end }}.", "description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for {{ $labels.remote_name}}:{{ $labels.url }}.",
"summary": "Prometheus remote write is behind." "summary": "Prometheus remote write is behind."
}, },
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- on(job, instance) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n", "expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- on(job, instance) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n",
@ -1539,7 +1851,7 @@ data:
{ {
"alert": "PrometheusRemoteWriteDesiredShards", "alert": "PrometheusRemoteWriteDesiredShards",
"annotations": { "annotations": {
"description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.", "description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ $value }} shards for queue {{ $labels.remote_name}}:{{ $labels.url }}, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.",
"summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards." "summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards."
}, },
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n>\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n", "expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n>\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n",
@ -1571,6 +1883,18 @@ data:
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
},
{
"alert": "PrometheusTargetLimitHit",
"annotations": {
"description": "Prometheus {{$labels.instance}} has dropped {{ printf \"%.0f\" $value }} targets because the number of targets exceeded the configured target_limit.",
"summary": "Prometheus has dropped targets because some scrape configs have exceeded the targets limit."
},
"expr": "increase(prometheus_target_scrape_pool_exceeded_target_limit_total{job=\"prometheus\"}[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
} }
] ]
} }

View File

@ -1,50 +0,0 @@
locals {
# Pick a CoreOS Container Linux derivative
# coreos-stable -> Container Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = local.flavor == "flatcar" ? data.aws_ami.flatcar.image_id : data.aws_ami.coreos.image_id
flavor = split("-", var.os_image)[0]
channel = split("-", var.os_image)[1]
}
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${local.flavor == "coreos" ? local.channel : "stable"}-*"]
}
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.flavor == "flatcar" ? local.channel : "stable"}-*"]
}
}

View File

@ -1,50 +0,0 @@
locals {
# Pick a CoreOS Container Linux derivative
# coreos-stable -> Container Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = local.flavor == "flatcar" ? data.aws_ami.flatcar.image_id : data.aws_ami.coreos.image_id
flavor = split("-", var.os_image)[0]
channel = split("-", var.os_image)[1]
}
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${local.flavor == "coreos" ? local.channel : "stable"}-*"]
}
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.flavor == "flatcar" ? local.channel : "stable"}-*"]
}
}

View File

@ -1,140 +0,0 @@
---
systemd:
units:
- name: docker.service
enabled: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet
Wants=rpc-statd.service
[Service]
Environment=KUBELET_IMAGE=docker://quay.io/poseidon/kubelet:v1.19.0
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/bin/rkt run \
--uuid-file-save=/var/cache/kubelet-pod.uuid \
--stage1-from-dir=stage1-fly.aci \
--hosts-entry host \
--insecure-options=image \
--volume etc-kubernetes,kind=host,source=/etc/kubernetes,readOnly=true \
--mount volume=etc-kubernetes,target=/etc/kubernetes \
--volume etc-machine-id,kind=host,source=/etc/machine-id,readOnly=true \
--mount volume=etc-machine-id,target=/etc/machine-id \
--volume etc-os-release,kind=host,source=/usr/lib/os-release,readOnly=true \
--mount volume=etc-os-release,target=/etc/os-release \
--volume=etc-resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
--mount volume=etc-resolv,target=/etc/resolv.conf \
--volume etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true \
--mount volume=etc-ssl-certs,target=/etc/ssl/certs \
--volume lib-modules,kind=host,source=/lib/modules,readOnly=true \
--mount volume=lib-modules,target=/lib/modules \
--volume run,kind=host,source=/run \
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
--volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,recursive=true \
--mount volume=var-lib-kubelet,target=/var/lib/kubelet \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
$${KUBELET_IMAGE} -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--healthz-port=0 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enable: true
contents: |
[Unit]
Description=Waiting to delete Kubernetes node on shutdown
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/etc/kubernetes/delete-node
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
mode: 0644
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/kubernetes/delete-node
filesystem: root
mode: 0744
contents:
inline: |
#!/bin/bash
set -e
exec /usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://quay.io/poseidon/kubelet:v1.19.0 \
--net=host \
--dns=host \
--exec=/usr/local/bin/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/fedora-coreos/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs

View File

@ -18,3 +18,27 @@ data "aws_ami" "fedora-coreos" {
values = ["Fedora CoreOS ${var.os_stream} *"] values = ["Fedora CoreOS ${var.os_stream} *"]
} }
} }
# Experimental Fedora CoreOS arm64 / aarch64 AMIs from Poseidon
# WARNING: These AMIs will be removed when Fedora CoreOS publishes arm64 AMIs
# and may be removed for any reason before then as well. Do not use.
data "aws_ami" "fedora-coreos-arm" {
most_recent = true
owners = ["099663496933"]
filter {
name = "architecture"
values = ["arm64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["fedora-coreos-*"]
}
}

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=79343f02aea7c69bb03dab2051aa95248c0471d7" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=49216ab82c236520204c4c85c8e52edbd722e1f4"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)] api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = aws_route53_record.etcds.*.fqdn etcd_servers = aws_route53_record.etcds.*.fqdn
asset_dir = var.asset_dir
networking = var.networking networking = var.networking
network_mtu = var.network_mtu network_mtu = var.network_mtu
pod_cidr = var.pod_cidr pod_cidr = var.pod_cidr
@ -13,6 +12,7 @@ module "bootstrap" {
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
enable_reporting = var.enable_reporting enable_reporting = var.enable_reporting
enable_aggregation = var.enable_aggregation enable_aggregation = var.enable_aggregation
daemonset_tolerations = var.daemonset_tolerations
trusted_certs_dir = "/etc/pki/tls/certs" trusted_certs_dir = "/etc/pki/tls/certs"
} }

View File

@ -22,9 +22,8 @@ resource "aws_instance" "controllers" {
} }
instance_type = var.controller_type instance_type = var.controller_type
ami = var.arch == "arm64" ? data.aws_ami.fedora-coreos-arm.image_id : data.aws_ami.fedora-coreos.image_id
ami = data.aws_ami.fedora-coreos.image_id user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
# storage # storage
root_block_device { root_block_device {
@ -63,6 +62,7 @@ data "template_file" "controller-configs" {
vars = { vars = {
# Cannot use cyclic dependencies on controllers or their DNS records # Cannot use cyclic dependencies on controllers or their DNS records
etcd_arch = var.arch == "arm64" ? "-arm64" : ""
etcd_name = "etcd${count.index}" etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}" etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,... # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
@ -8,28 +8,25 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=etcd (System Container) Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd Documentation=https://github.com/etcd-io/etcd
Wants=network-online.target network.target Wants=network-online.target network.target
After=network-online.target After=network-online.target
[Service] [Service]
# https://github.com/opencontainers/runc/pull/1807 Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.12${etcd_arch}
# Type=notify
# NotifyAccess=exec
Type=exec Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \ ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \ --env-file /etc/etcd/etcd.env \
--network host \ --network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \ --volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \ --volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.12 $${ETCD_IMAGE}
ExecStop=/usr/bin/podman stop etcd ExecStop=/usr/bin/podman stop etcd
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
- name: docker.service - name: docker.service
@ -55,7 +52,7 @@ systemd:
Description=Kubelet (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -69,12 +66,10 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
@ -124,7 +119,7 @@ systemd:
--volume /opt/bootstrap/assets:/assets:ro,Z \ --volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \ --volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \ --entrypoint=/apply \
quay.io/poseidon/kubelet:v1.19.0 quay.io/poseidon/kubelet:v1.19.4
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap ExecStartPost=-/usr/bin/podman stop bootstrap
storage: storage:
@ -202,8 +197,6 @@ storage:
mode: 0644 mode: 0644
contents: contents:
inline: | inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name} ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379 ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
@ -221,6 +214,7 @@ storage:
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true ETCD_PEER_CLIENT_CERT_AUTH=true
ETCD_UNSUPPORTED_ARCH=arm64
passwd: passwd:
users: users:
- name: core - name: core

View File

@ -52,3 +52,9 @@ output "worker_target_group_https" {
value = module.workers.target_group_https value = module.workers.target_group_https
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
}

View File

@ -96,12 +96,6 @@ variable "ssh_authorized_key" {
description = "SSH public key for user 'core'" description = "SSH public key for user 'core'"
} }
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" { variable "networking" {
type = string type = string
description = "Choice of networking provider (calico or flannel)" description = "Choice of networking provider (calico or flannel)"
@ -161,3 +155,15 @@ variable "cluster_domain_suffix" {
default = "cluster.local" default = "cluster.local"
} }
variable "arch" {
type = string
description = "Container architecture (amd64 or arm64)"
default = "amd64"
}
variable "daemonset_tolerations" {
type = list(string)
description = "List of additional taint keys kube-system DaemonSets should tolerate (e.g. ['custom-role', 'gpu-role'])"
default = []
}

View File

@ -9,6 +9,7 @@ module "workers" {
worker_count = var.worker_count worker_count = var.worker_count
instance_type = var.worker_type instance_type = var.worker_type
os_stream = var.os_stream os_stream = var.os_stream
arch = var.arch
disk_size = var.disk_size disk_size = var.disk_size
spot_price = var.worker_price spot_price = var.worker_price
target_groups = var.worker_target_groups target_groups = var.worker_target_groups

View File

@ -18,3 +18,27 @@ data "aws_ami" "fedora-coreos" {
values = ["Fedora CoreOS ${var.os_stream} *"] values = ["Fedora CoreOS ${var.os_stream} *"]
} }
} }
# Experimental Fedora CoreOS arm64 / aarch64 AMIs from Poseidon
# WARNING: These AMIs will be removed when Fedora CoreOS publishes arm64 AMIs
# and may be removed for any reason before then as well. Do not use.
data "aws_ami" "fedora-coreos-arm" {
most_recent = true
owners = ["099663496933"]
filter {
name = "architecture"
values = ["arm64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["fedora-coreos-*"]
}
}

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: docker.service - name: docker.service
@ -25,7 +25,7 @@ systemd:
Description=Kubelet (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -39,12 +39,10 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
@ -70,6 +68,9 @@ systemd:
%{~ for label in split(",", node_labels) ~} %{~ for label in split(",", node_labels) ~}
--node-labels=${label} \ --node-labels=${label} \
%{~ endfor ~} %{~ endfor ~}
%{~ for taint in split(",", node_taints) ~}
--register-with-taints=${taint} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \ --pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \ --read-only-port=0 \
--rotate-certificates \ --rotate-certificates \
@ -86,10 +87,11 @@ systemd:
[Unit] [Unit]
Description=Delete Kubernetes node on shutdown Description=Delete Kubernetes node on shutdown
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
ExecStart=/bin/true ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/podman run --volume /etc/kubernetes:/etc/kubernetes:ro,z --entrypoint /usr/local/bin/kubectl quay.io/poseidon/kubelet:v1.19.0 --kubeconfig=/etc/kubernetes/kubeconfig delete node $HOSTNAME' ExecStop=/bin/bash -c '/usr/bin/podman run --volume /var/lib/kubelet:/var/lib/kubelet:ro,z --entrypoint /usr/local/bin/kubectl $${KUBELET_IMAGE} --kubeconfig=/var/lib/kubelet/kubeconfig delete node $HOSTNAME'
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
storage: storage:

View File

@ -108,3 +108,17 @@ variable "node_labels" {
description = "List of initial node labels" description = "List of initial node labels"
default = [] default = []
} }
variable "node_taints" {
type = list(string)
description = "List of initial node taints"
default = []
}
# unofficial, undocumented, unsupported
variable "arch" {
type = string
description = "Container architecture (amd64 or arm64)"
default = "amd64"
}

View File

@ -44,7 +44,7 @@ resource "aws_autoscaling_group" "workers" {
# Worker template # Worker template
resource "aws_launch_configuration" "worker" { resource "aws_launch_configuration" "worker" {
image_id = data.aws_ami.fedora-coreos.image_id image_id = var.arch == "arm64" ? data.aws_ami.fedora-coreos-arm.image_id : data.aws_ami.fedora-coreos.image_id
instance_type = var.instance_type instance_type = var.instance_type
spot_price = var.spot_price > 0 ? var.spot_price : null spot_price = var.spot_price > 0 ? var.spot_price : null
enable_monitoring = false enable_monitoring = false
@ -86,6 +86,7 @@ data "template_file" "worker-config" {
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
node_labels = join(",", var.node_labels) node_labels = join(",", var.node_labels)
node_taints = join(",", var.node_taints)
} }
} }

View File

@ -11,13 +11,13 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/flatcar-linux/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs
Please see the [official docs](https://typhoon.psdn.io) and the AWS [tutorial](https://typhoon.psdn.io/cl/aws/). Please see the [official docs](https://typhoon.psdn.io) and the AWS [tutorial](https://typhoon.psdn.io/flatcar-linux/aws/).

View File

@ -0,0 +1,27 @@
locals {
# Pick a Flatcar Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = data.aws_ami.flatcar.image_id
channel = split("-", var.os_image)[1]
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.channel}-*"]
}
}

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=79343f02aea7c69bb03dab2051aa95248c0471d7" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=49216ab82c236520204c4c85c8e52edbd722e1f4"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)] api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = aws_route53_record.etcds.*.fqdn etcd_servers = aws_route53_record.etcds.*.fqdn
asset_dir = var.asset_dir
networking = var.networking networking = var.networking
network_mtu = var.network_mtu network_mtu = var.network_mtu
pod_cidr = var.pod_cidr pod_cidr = var.pod_cidr

View File

@ -3,30 +3,31 @@ systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
enabled: true enabled: true
dropins: contents: |
- name: 40-etcd-cluster.conf [Unit]
contents: | Description=etcd (System Container)
[Service] Documentation=https://github.com/etcd-io/etcd
Environment="ETCD_IMAGE_TAG=v3.4.12" Requires=docker.service
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd" After=docker.service
Environment="RKT_RUN_ARGS=--insecure-options=image" [Service]
Environment="ETCD_NAME=${etcd_name}" Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.12
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379" ExecStartPre=/usr/bin/docker run -d \
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380" --name etcd \
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379" --network host \
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380" --env-file /etc/etcd/etcd.env \
Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381" --user 232:232 \
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}" --volume /etc/ssl/etcd:/etc/ssl/certs:ro \
Environment="ETCD_STRICT_RECONFIG_CHECK=true" --volume /var/lib/etcd:/var/lib/etcd:rw \
Environment="ETCD_SSL_DIR=/etc/ssl/etcd" $${ETCD_IMAGE}
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt" ExecStart=docker logs -f etcd
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt" ExecStop=docker stop etcd
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key" ExecStopPost=docker rm etcd
Environment="ETCD_CLIENT_CERT_AUTH=true" Restart=always
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt" RestartSec=10s
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt" TimeoutStartSec=0
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key" LimitNOFILE=40000
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true" [Install]
WantedBy=multi-user.target
- name: docker.service - name: docker.service
enabled: true enabled: true
- name: locksmithd.service - name: locksmithd.service
@ -49,10 +50,12 @@ systemd:
enabled: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Kubelet Description=Kubelet (System Container)
Requires=docker.service
After=docker.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=docker://quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver} Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
@ -60,39 +63,24 @@ systemd:
ExecStartPre=/bin/mkdir -p /var/lib/calico ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt" ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid ExecStartPre=/usr/bin/docker run -d \
ExecStart=/usr/bin/rkt run \ --name kubelet \
--uuid-file-save=/var/cache/kubelet-pod.uuid \ --privileged \
--stage1-from-dir=stage1-fly.aci \ --pid host \
--hosts-entry host \ --network host \
--insecure-options=image \ -v /etc/kubernetes:/etc/kubernetes:ro \
--volume etc-kubernetes,kind=host,source=/etc/kubernetes,readOnly=true \ -v /etc/machine-id:/etc/machine-id:ro \
--mount volume=etc-kubernetes,target=/etc/kubernetes \ -v /usr/lib/os-release:/etc/os-release:ro \
--volume etc-machine-id,kind=host,source=/etc/machine-id,readOnly=true \ -v /lib/modules:/lib/modules:ro \
--mount volume=etc-machine-id,target=/etc/machine-id \ -v /run:/run \
--volume etc-os-release,kind=host,source=/usr/lib/os-release,readOnly=true \ -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--mount volume=etc-os-release,target=/etc/os-release \ -v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume=etc-resolv,kind=host,source=/etc/resolv.conf,readOnly=true \ -v /var/lib/calico:/var/lib/calico:ro \
--mount volume=etc-resolv,target=/etc/resolv.conf \ -v /var/lib/docker:/var/lib/docker \
--volume etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true \ -v /var/lib/kubelet:/var/lib/kubelet:rshared \
--mount volume=etc-ssl-certs,target=/etc/ssl/certs \ -v /var/log:/var/log \
--volume lib-modules,kind=host,source=/lib/modules,readOnly=true \ -v /opt/cni/bin:/opt/cni/bin \
--mount volume=lib-modules,target=/lib/modules \ $${KUBELET_IMAGE} \
--volume run,kind=host,source=/run \
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
--volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,recursive=true \
--mount volume=var-lib-kubelet,target=/var/lib/kubelet \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
$${KUBELET_IMAGE} -- \
--anonymous-auth=false \ --anonymous-auth=false \
--authentication-token-webhook \ --authentication-token-webhook \
--authorization-mode=Webhook \ --authorization-mode=Webhook \
@ -111,7 +99,9 @@ systemd:
--register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \ --register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \
--rotate-certificates \ --rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins --volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always Restart=always
RestartSec=10 RestartSec=10
[Install] [Install]
@ -120,24 +110,20 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=Kubernetes control plane Description=Kubernetes control plane
Wants=docker.service
After=docker.service
ConditionPathExists=!/opt/bootstrap/bootstrap.done ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service] [Service]
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
WorkingDirectory=/opt/bootstrap WorkingDirectory=/opt/bootstrap
ExecStart=/usr/bin/rkt run \ Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
--trust-keys-from-https \ ExecStart=/usr/bin/docker run \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \ -v /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro \
--mount volume=config,target=/etc/kubernetes/secrets \ -v /opt/bootstrap/assets:/assets:ro \
--volume assets,kind=host,source=/opt/bootstrap/assets \ -v /opt/bootstrap/apply:/apply:ro \
--mount volume=assets,target=/assets \ --entrypoint=/apply \
--volume script,kind=host,source=/opt/bootstrap/apply \ $${KUBELET_IMAGE}
--mount volume=script,target=/apply \
--insecure-options=image \
docker://quay.io/poseidon/kubelet:v1.19.0 \
--net=host \
--dns=host \
--exec=/apply
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
@ -198,6 +184,28 @@ storage:
contents: contents:
inline: | inline: |
fs.inotify.max_user_watches=16184 fs.inotify.max_user_watches=16184
- path: /etc/etcd/etcd.env
filesystem: root
mode: 0644
contents:
inline: |
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
passwd: passwd:
users: users:
- name: core - name: core

View File

@ -67,7 +67,7 @@ data "template_file" "controller-configs" {
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}" etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,... # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered) etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs" cgroup_driver = local.channel == "edge" ? "systemd" : "cgroupfs"
kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet) kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet)
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)

View File

@ -52,3 +52,9 @@ output "worker_target_group_https" {
value = module.workers.target_group_https value = module.workers.target_group_https
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
}

View File

@ -43,7 +43,7 @@ variable "worker_type" {
variable "os_image" { variable "os_image" {
type = string type = string
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)" description = "AMI channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "flatcar-stable" default = "flatcar-stable"
} }
@ -149,12 +149,6 @@ variable "worker_node_labels" {
# unofficial, undocumented, unsupported # unofficial, undocumented, unsupported
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)" description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)"

View File

@ -0,0 +1,27 @@
locals {
# Pick a Flatcar Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = data.aws_ami.flatcar.image_id
channel = split("-", var.os_image)[1]
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.channel}-*"]
}
}

View File

@ -0,0 +1,117 @@
---
systemd:
units:
- name: docker.service
enabled: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet
Requires=docker.service
After=docker.service
Wants=rpc-statd.service
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
# Podman, rkt, or runc run container processes, whereas docker run
# is a client to a daemon and requires workarounds to use within a
# systemd unit. https://github.com/moby/moby/issues/6791
ExecStartPre=/usr/bin/docker run -d \
--name kubelet \
--privileged \
--pid host \
--network host \
-v /etc/kubernetes:/etc/kubernetes:ro \
-v /etc/machine-id:/etc/machine-id:ro \
-v /usr/lib/os-release:/etc/os-release:ro \
-v /lib/modules:/lib/modules:ro \
-v /run:/run \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
-v /var/lib/calico:/var/lib/calico:ro \
-v /var/lib/docker:/var/lib/docker \
-v /var/lib/kubelet:/var/lib/kubelet:rshared \
-v /var/log:/var/log \
-v /opt/cni/bin:/opt/cni/bin \
$${KUBELET_IMAGE} \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--healthz-port=0 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enabled: true
contents: |
[Unit]
Description=Delete Kubernetes node on shutdown
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/docker run -v /var/lib/kubelet:/var/lib/kubelet:ro --entrypoint /usr/local/bin/kubectl $${KUBELET_IMAGE} --kubeconfig=/var/lib/kubelet/kubeconfig delete node $HOSTNAME'
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
mode: 0644
contents:
inline: |
fs.inotify.max_user_watches=16184
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -36,7 +36,7 @@ variable "instance_type" {
variable "os_image" { variable "os_image" {
type = string type = string
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)" description = "AMI channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "flatcar-stable" default = "flatcar-stable"
} }

View File

@ -85,7 +85,7 @@ data "template_file" "worker-config" {
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs" cgroup_driver = local.channel == "edge" ? "systemd" : "cgroupfs"
node_labels = join(",", var.node_labels) node_labels = join(",", var.node_labels)
} }
} }

View File

@ -1,140 +0,0 @@
---
systemd:
units:
- name: docker.service
enabled: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet
Wants=rpc-statd.service
[Service]
Environment=KUBELET_IMAGE=docker://quay.io/poseidon/kubelet:v1.19.0
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/bin/rkt run \
--uuid-file-save=/var/cache/kubelet-pod.uuid \
--stage1-from-dir=stage1-fly.aci \
--hosts-entry host \
--insecure-options=image \
--volume etc-kubernetes,kind=host,source=/etc/kubernetes,readOnly=true \
--mount volume=etc-kubernetes,target=/etc/kubernetes \
--volume etc-machine-id,kind=host,source=/etc/machine-id,readOnly=true \
--mount volume=etc-machine-id,target=/etc/machine-id \
--volume etc-os-release,kind=host,source=/usr/lib/os-release,readOnly=true \
--mount volume=etc-os-release,target=/etc/os-release \
--volume=etc-resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
--mount volume=etc-resolv,target=/etc/resolv.conf \
--volume etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true \
--mount volume=etc-ssl-certs,target=/etc/ssl/certs \
--volume lib-modules,kind=host,source=/lib/modules,readOnly=true \
--mount volume=lib-modules,target=/lib/modules \
--volume run,kind=host,source=/run \
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
--volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,recursive=true \
--mount volume=var-lib-kubelet,target=/var/lib/kubelet \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
$${KUBELET_IMAGE} -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--healthz-port=0 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enabled: true
contents: |
[Unit]
Description=Waiting to delete Kubernetes node on shutdown
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/etc/kubernetes/delete-node
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
mode: 0644
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/kubernetes/delete-node
filesystem: root
mode: 0744
contents:
inline: |
#!/bin/bash
set -e
exec /usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://quay.io/poseidon/kubelet:v1.19.0 \
--net=host \
--dns=host \
--exec=/usr/local/bin/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot priority](https://typhoon.psdn.io/fedora-coreos/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot priority](https://typhoon.psdn.io/fedora-coreos/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=79343f02aea7c69bb03dab2051aa95248c0471d7" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=49216ab82c236520204c4c85c8e52edbd722e1f4"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)] api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = formatlist("%s.%s", azurerm_dns_a_record.etcds.*.name, var.dns_zone) etcd_servers = formatlist("%s.%s", azurerm_dns_a_record.etcds.*.name, var.dns_zone)
asset_dir = var.asset_dir
networking = var.networking networking = var.networking

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
@ -8,28 +8,25 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=etcd (System Container) Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd Documentation=https://github.com/etcd-io/etcd
Wants=network-online.target network.target Wants=network-online.target network.target
After=network-online.target After=network-online.target
[Service] [Service]
# https://github.com/opencontainers/runc/pull/1807 Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.12
# Type=notify
# NotifyAccess=exec
Type=exec Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \ ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \ --env-file /etc/etcd/etcd.env \
--network host \ --network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \ --volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \ --volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.12 $${ETCD_IMAGE}
ExecStop=/usr/bin/podman stop etcd ExecStop=/usr/bin/podman stop etcd
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
- name: docker.service - name: docker.service
@ -54,7 +51,7 @@ systemd:
Description=Kubelet (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -68,12 +65,10 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
@ -123,7 +118,7 @@ systemd:
--volume /opt/bootstrap/assets:/assets:ro,Z \ --volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \ --volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \ --entrypoint=/apply \
quay.io/poseidon/kubelet:v1.19.0 quay.io/poseidon/kubelet:v1.19.4
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap ExecStartPost=-/usr/bin/podman stop bootstrap
storage: storage:
@ -201,8 +196,6 @@ storage:
mode: 0644 mode: 0644
contents: contents:
inline: | inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name} ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379 ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379

View File

@ -57,3 +57,10 @@ output "backend_address_pool_id" {
description = "ID of the worker backend address pool" description = "ID of the worker backend address pool"
value = azurerm_lb_backend_address_pool.worker.id value = azurerm_lb_backend_address_pool.worker.id
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
}

View File

@ -129,12 +129,6 @@ variable "worker_node_labels" {
# unofficial, undocumented, unsupported # unofficial, undocumented, unsupported
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) " description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: docker.service - name: docker.service
@ -24,7 +24,7 @@ systemd:
Description=Kubelet (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -38,12 +38,10 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
@ -85,10 +83,11 @@ systemd:
[Unit] [Unit]
Description=Delete Kubernetes node on shutdown Description=Delete Kubernetes node on shutdown
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
ExecStart=/bin/true ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/podman run --volume /etc/kubernetes:/etc/kubernetes:ro,z --entrypoint /usr/local/bin/kubectl quay.io/poseidon/kubelet:v1.19.0 --kubeconfig=/etc/kubernetes/kubeconfig delete node $HOSTNAME' ExecStop=/bin/bash -c '/usr/bin/podman run --volume /var/lib/kubelet:/var/lib/kubelet:ro,z --entrypoint /usr/local/bin/kubectl $${KUBELET_IMAGE} --kubeconfig=/var/lib/kubelet/kubeconfig delete node $HOSTNAME'
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
storage: storage:

View File

@ -11,13 +11,13 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/cl/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/flatcar-linux/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs
Please see the [official docs](https://typhoon.psdn.io) and the Azure [tutorial](https://typhoon.psdn.io/cl/azure/). Please see the [official docs](https://typhoon.psdn.io) and the Azure [tutorial](https://typhoon.psdn.io/flatcar-linux/azure/).

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=79343f02aea7c69bb03dab2051aa95248c0471d7" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=49216ab82c236520204c4c85c8e52edbd722e1f4"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)] api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = formatlist("%s.%s", azurerm_dns_a_record.etcds.*.name, var.dns_zone) etcd_servers = formatlist("%s.%s", azurerm_dns_a_record.etcds.*.name, var.dns_zone)
asset_dir = var.asset_dir
networking = var.networking networking = var.networking

View File

@ -3,30 +3,31 @@ systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
enabled: true enabled: true
dropins: contents: |
- name: 40-etcd-cluster.conf [Unit]
contents: | Description=etcd (System Container)
[Service] Documentation=https://github.com/etcd-io/etcd
Environment="ETCD_IMAGE_TAG=v3.4.12" Requires=docker.service
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd" After=docker.service
Environment="RKT_RUN_ARGS=--insecure-options=image" [Service]
Environment="ETCD_NAME=${etcd_name}" Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.12
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379" ExecStartPre=/usr/bin/docker run -d \
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380" --name etcd \
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379" --network host \
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380" --env-file /etc/etcd/etcd.env \
Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381" --user 232:232 \
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}" --volume /etc/ssl/etcd:/etc/ssl/certs:ro \
Environment="ETCD_STRICT_RECONFIG_CHECK=true" --volume /var/lib/etcd:/var/lib/etcd:rw \
Environment="ETCD_SSL_DIR=/etc/ssl/etcd" $${ETCD_IMAGE}
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt" ExecStart=docker logs -f etcd
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt" ExecStop=docker stop etcd
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key" ExecStopPost=docker rm etcd
Environment="ETCD_CLIENT_CERT_AUTH=true" Restart=always
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt" RestartSec=10s
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt" TimeoutStartSec=0
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key" LimitNOFILE=40000
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true" [Install]
WantedBy=multi-user.target
- name: docker.service - name: docker.service
enabled: true enabled: true
- name: locksmithd.service - name: locksmithd.service
@ -49,10 +50,12 @@ systemd:
enabled: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Kubelet Description=Kubelet (System Container)
Requires=docker.service
After=docker.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=docker://quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver} Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
@ -60,39 +63,24 @@ systemd:
ExecStartPre=/bin/mkdir -p /var/lib/calico ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt" ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid ExecStartPre=/usr/bin/docker run -d \
ExecStart=/usr/bin/rkt run \ --name kubelet \
--uuid-file-save=/var/cache/kubelet-pod.uuid \ --privileged \
--stage1-from-dir=stage1-fly.aci \ --pid host \
--hosts-entry host \ --network host \
--insecure-options=image \ -v /etc/kubernetes:/etc/kubernetes:ro \
--volume etc-kubernetes,kind=host,source=/etc/kubernetes,readOnly=true \ -v /etc/machine-id:/etc/machine-id:ro \
--mount volume=etc-kubernetes,target=/etc/kubernetes \ -v /usr/lib/os-release:/etc/os-release:ro \
--volume etc-machine-id,kind=host,source=/etc/machine-id,readOnly=true \ -v /lib/modules:/lib/modules:ro \
--mount volume=etc-machine-id,target=/etc/machine-id \ -v /run:/run \
--volume etc-os-release,kind=host,source=/usr/lib/os-release,readOnly=true \ -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--mount volume=etc-os-release,target=/etc/os-release \ -v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume=etc-resolv,kind=host,source=/etc/resolv.conf,readOnly=true \ -v /var/lib/calico:/var/lib/calico:ro \
--mount volume=etc-resolv,target=/etc/resolv.conf \ -v /var/lib/docker:/var/lib/docker \
--volume etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true \ -v /var/lib/kubelet:/var/lib/kubelet:rshared \
--mount volume=etc-ssl-certs,target=/etc/ssl/certs \ -v /var/log:/var/log \
--volume lib-modules,kind=host,source=/lib/modules,readOnly=true \ -v /opt/cni/bin:/opt/cni/bin \
--mount volume=lib-modules,target=/lib/modules \ $${KUBELET_IMAGE} \
--volume run,kind=host,source=/run \
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
--volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,recursive=true \
--mount volume=var-lib-kubelet,target=/var/lib/kubelet \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
$${KUBELET_IMAGE} -- \
--anonymous-auth=false \ --anonymous-auth=false \
--authentication-token-webhook \ --authentication-token-webhook \
--authorization-mode=Webhook \ --authorization-mode=Webhook \
@ -111,7 +99,9 @@ systemd:
--register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \ --register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \
--rotate-certificates \ --rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins --volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always Restart=always
RestartSec=10 RestartSec=10
[Install] [Install]
@ -120,24 +110,20 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=Kubernetes control plane Description=Kubernetes control plane
Wants=docker.service
After=docker.service
ConditionPathExists=!/opt/bootstrap/bootstrap.done ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service] [Service]
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
WorkingDirectory=/opt/bootstrap WorkingDirectory=/opt/bootstrap
ExecStart=/usr/bin/rkt run \ Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
--trust-keys-from-https \ ExecStart=/usr/bin/docker run \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \ -v /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro \
--mount volume=config,target=/etc/kubernetes/secrets \ -v /opt/bootstrap/assets:/assets:ro \
--volume assets,kind=host,source=/opt/bootstrap/assets \ -v /opt/bootstrap/apply:/apply:ro \
--mount volume=assets,target=/assets \ --entrypoint=/apply \
--volume script,kind=host,source=/opt/bootstrap/apply \ $${KUBELET_IMAGE}
--mount volume=script,target=/apply \
--insecure-options=image \
docker://quay.io/poseidon/kubelet:v1.19.0 \
--net=host \
--dns=host \
--exec=/apply
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
@ -198,6 +184,28 @@ storage:
contents: contents:
inline: | inline: |
fs.inotify.max_user_watches=16184 fs.inotify.max_user_watches=16184
- path: /etc/etcd/etcd.env
filesystem: root
mode: 0644
contents:
inline: |
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
passwd: passwd:
users: users:
- name: core - name: core

View File

@ -16,9 +16,7 @@ resource "azurerm_dns_a_record" "etcds" {
locals { locals {
# Container Linux derivative # Container Linux derivative
# coreos-stable -> Container Linux Stable
# flatcar-stable -> Flatcar Linux Stable # flatcar-stable -> Flatcar Linux Stable
flavor = split("-", var.os_image)[0]
channel = split("-", var.os_image)[1] channel = split("-", var.os_image)[1]
} }
@ -53,23 +51,18 @@ resource "azurerm_linux_virtual_machine" "controllers" {
storage_account_type = "Premium_LRS" storage_account_type = "Premium_LRS"
} }
# CoreOS Container Linux or Flatcar Container Linux # Flatcar Container Linux
source_image_reference { source_image_reference {
publisher = local.flavor == "flatcar" ? "Kinvolk" : "CoreOS" publisher = "Kinvolk"
offer = local.flavor == "flatcar" ? "flatcar-container-linux-free" : "CoreOS" offer = "flatcar-container-linux-free"
sku = local.channel sku = local.channel
version = "latest" version = "latest"
} }
# Gross hack for Flatcar Linux plan {
dynamic "plan" { name = local.channel
for_each = local.flavor == "flatcar" ? [1] : [] publisher = "kinvolk"
product = "flatcar-container-linux-free"
content {
name = local.channel
publisher = "kinvolk"
product = "flatcar-container-linux-free"
}
} }
# network # network
@ -157,7 +150,7 @@ data "template_file" "controller-configs" {
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}" etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,... # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered) etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs" cgroup_driver = local.channel == "edge" ? "systemd" : "cgroupfs"
kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet) kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet)
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)

View File

@ -57,3 +57,10 @@ output "backend_address_pool_id" {
description = "ID of the worker backend address pool" description = "ID of the worker backend address pool"
value = azurerm_lb_backend_address_pool.worker.id value = azurerm_lb_backend_address_pool.worker.id
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
}

View File

@ -48,7 +48,7 @@ variable "worker_type" {
variable "os_image" { variable "os_image" {
type = string type = string
description = "Channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge, coreos-stable, coreos-beta, coreos-alpha)" description = "Channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "flatcar-stable" default = "flatcar-stable"
} }
@ -130,12 +130,6 @@ variable "worker_node_labels" {
# unofficial, undocumented, unsupported # unofficial, undocumented, unsupported
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) " description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "

View File

@ -0,0 +1,117 @@
---
systemd:
units:
- name: docker.service
enabled: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet
Requires=docker.service
After=docker.service
Wants=rpc-statd.service
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
# Podman, rkt, or runc run container processes, whereas docker run
# is a client to a daemon and requires workarounds to use within a
# systemd unit. https://github.com/moby/moby/issues/6791
ExecStartPre=/usr/bin/docker run -d \
--name kubelet \
--privileged \
--pid host \
--network host \
-v /etc/kubernetes:/etc/kubernetes:ro \
-v /etc/machine-id:/etc/machine-id:ro \
-v /usr/lib/os-release:/etc/os-release:ro \
-v /lib/modules:/lib/modules:ro \
-v /run:/run \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
-v /var/lib/calico:/var/lib/calico:ro \
-v /var/lib/docker:/var/lib/docker \
-v /var/lib/kubelet:/var/lib/kubelet:rshared \
-v /var/log:/var/log \
-v /opt/cni/bin:/opt/cni/bin \
$${KUBELET_IMAGE} \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--healthz-port=0 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enabled: true
contents: |
[Unit]
Description=Delete Kubernetes node on shutdown
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/docker run -v /var/lib/kubelet:/var/lib/kubelet:ro --entrypoint /usr/local/bin/kubectl $${KUBELET_IMAGE} --kubeconfig=/var/lib/kubelet/kubeconfig delete node $HOSTNAME'
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
mode: 0644
contents:
inline: |
fs.inotify.max_user_watches=16184
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -46,7 +46,7 @@ variable "vm_type" {
variable "os_image" { variable "os_image" {
type = string type = string
description = "Channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge, coreos-stable, coreos-beta, coreos-alpha)" description = "Channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "flatcar-stable" default = "flatcar-stable"
} }

View File

@ -1,7 +1,5 @@
locals { locals {
# coreos-stable -> Container Linux Stable
# flatcar-stable -> Flatcar Linux Stable # flatcar-stable -> Flatcar Linux Stable
flavor = split("-", var.os_image)[0]
channel = split("-", var.os_image)[1] channel = split("-", var.os_image)[1]
} }
@ -24,23 +22,18 @@ resource "azurerm_linux_virtual_machine_scale_set" "workers" {
caching = "ReadWrite" caching = "ReadWrite"
} }
# CoreOS Container Linux or Flatcar Container Linux # Flatcar Container Linux
source_image_reference { source_image_reference {
publisher = local.flavor == "flatcar" ? "Kinvolk" : "CoreOS" publisher = "Kinvolk"
offer = local.flavor == "flatcar" ? "flatcar-container-linux-free" : "CoreOS" offer = "flatcar-container-linux-free"
sku = local.channel sku = local.channel
version = "latest" version = "latest"
} }
# Gross hack for Flatcar Linux plan {
dynamic "plan" { name = local.channel
for_each = local.flavor == "flatcar" ? [1] : [] publisher = "kinvolk"
product = "flatcar-container-linux-free"
content {
name = local.channel
publisher = "kinvolk"
product = "flatcar-container-linux-free"
}
} }
# Azure requires setting admin_ssh_key, though Ignition custom_data handles it too # Azure requires setting admin_ssh_key, though Ignition custom_data handles it too
@ -111,7 +104,7 @@ data "template_file" "worker-config" {
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs" cgroup_driver = local.channel == "edge" ? "systemd" : "cgroupfs"
node_labels = join(",", var.node_labels) node_labels = join(",", var.node_labels)
} }
} }

View File

@ -1,4 +0,0 @@
output "kubeconfig-admin" {
value = module.bootstrap.kubeconfig-admin
}

View File

@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=79343f02aea7c69bb03dab2051aa95248c0471d7" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=49216ab82c236520204c4c85c8e52edbd722e1f4"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [var.k8s_domain_name] api_servers = [var.k8s_domain_name]
etcd_servers = var.controllers.*.domain etcd_servers = var.controllers.*.domain
asset_dir = var.asset_dir
networking = var.networking networking = var.networking
network_mtu = var.network_mtu network_mtu = var.network_mtu
network_ip_autodetection_method = var.network_ip_autodetection_method network_ip_autodetection_method = var.network_ip_autodetection_method

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
@ -8,28 +8,25 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=etcd (System Container) Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd Documentation=https://github.com/etcd-io/etcd
Wants=network-online.target network.target Wants=network-online.target network.target
After=network-online.target After=network-online.target
[Service] [Service]
# https://github.com/opencontainers/runc/pull/1807 Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.12
# Type=notify
# NotifyAccess=exec
Type=exec Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \ ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \ --env-file /etc/etcd/etcd.env \
--network host \ --network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \ --volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \ --volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.12 $${ETCD_IMAGE}
ExecStop=/usr/bin/podman stop etcd ExecStop=/usr/bin/podman stop etcd
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
- name: docker.service - name: docker.service
@ -53,7 +50,7 @@ systemd:
Description=Kubelet (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -67,12 +64,10 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
@ -134,7 +129,7 @@ systemd:
--volume /opt/bootstrap/assets:/assets:ro,Z \ --volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \ --volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \ --entrypoint=/apply \
quay.io/poseidon/kubelet:v1.19.0 quay.io/poseidon/kubelet:v1.19.4
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap ExecStartPost=-/usr/bin/podman stop bootstrap
storage: storage:
@ -212,8 +207,6 @@ storage:
mode: 0644 mode: 0644
contents: contents:
inline: | inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name} ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379 ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: docker.service - name: docker.service
@ -23,7 +23,7 @@ systemd:
Description=Kubelet (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -37,12 +37,10 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \

View File

@ -2,3 +2,9 @@ output "kubeconfig-admin" {
value = module.bootstrap.kubeconfig-admin value = module.bootstrap.kubeconfig-admin
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
}

View File

@ -80,12 +80,6 @@ variable "ssh_authorized_key" {
description = "SSH public key for user 'core'" description = "SSH public key for user 'core'"
} }
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" { variable "networking" {
type = string type = string
description = "Choice of networking provider (flannel or calico)" description = "Choice of networking provider (flannel or calico)"

View File

@ -11,13 +11,13 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.19.0 (upstream) * Kubernetes v1.19.4 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs
Please see the [official docs](https://typhoon.psdn.io) and the bare-metal [tutorial](https://typhoon.psdn.io/cl/bare-metal/). Please see the [official docs](https://typhoon.psdn.io) and the bare-metal [tutorial](https://typhoon.psdn.io/flatcar-linux/bare-metal/).

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=79343f02aea7c69bb03dab2051aa95248c0471d7" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=49216ab82c236520204c4c85c8e52edbd722e1f4"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [var.k8s_domain_name] api_servers = [var.k8s_domain_name]
etcd_servers = var.controllers.*.domain etcd_servers = var.controllers.*.domain
asset_dir = var.asset_dir
networking = var.networking networking = var.networking
network_mtu = var.network_mtu network_mtu = var.network_mtu
network_ip_autodetection_method = var.network_ip_autodetection_method network_ip_autodetection_method = var.network_ip_autodetection_method

View File

@ -3,30 +3,31 @@ systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
enabled: true enabled: true
dropins: contents: |
- name: 40-etcd-cluster.conf [Unit]
contents: | Description=etcd (System Container)
[Service] Documentation=https://github.com/etcd-io/etcd
Environment="ETCD_IMAGE_TAG=v3.4.12" Requires=docker.service
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd" After=docker.service
Environment="RKT_RUN_ARGS=--insecure-options=image" [Service]
Environment="ETCD_NAME=${etcd_name}" Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.12
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379" ExecStartPre=/usr/bin/docker run -d \
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380" --name etcd \
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379" --network host \
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380" --env-file /etc/etcd/etcd.env \
Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381" --user 232:232 \
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}" --volume /etc/ssl/etcd:/etc/ssl/certs:ro \
Environment="ETCD_STRICT_RECONFIG_CHECK=true" --volume /var/lib/etcd:/var/lib/etcd:rw \
Environment="ETCD_SSL_DIR=/etc/ssl/etcd" $${ETCD_IMAGE}
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt" ExecStart=docker logs -f etcd
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt" ExecStop=docker stop etcd
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key" ExecStopPost=docker rm etcd
Environment="ETCD_CLIENT_CERT_AUTH=true" Restart=always
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt" RestartSec=10s
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt" TimeoutStartSec=0
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key" LimitNOFILE=40000
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true" [Install]
WantedBy=multi-user.target
- name: docker.service - name: docker.service
enabled: true enabled: true
- name: locksmithd.service - name: locksmithd.service
@ -57,10 +58,12 @@ systemd:
- name: kubelet.service - name: kubelet.service
contents: | contents: |
[Unit] [Unit]
Description=Kubelet Description=Kubelet (System Container)
Requires=docker.service
After=docker.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=docker://quay.io/poseidon/kubelet:v1.19.0 Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver} Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
@ -68,43 +71,26 @@ systemd:
ExecStartPre=/bin/mkdir -p /var/lib/calico ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt" ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid ExecStartPre=/usr/bin/docker run -d \
ExecStart=/usr/bin/rkt run \ --name kubelet \
--uuid-file-save=/var/cache/kubelet-pod.uuid \ --privileged \
--stage1-from-dir=stage1-fly.aci \ --pid host \
--hosts-entry host \ --network host \
--insecure-options=image \ -v /etc/kubernetes:/etc/kubernetes:ro \
--volume etc-kubernetes,kind=host,source=/etc/kubernetes,readOnly=true \ -v /etc/machine-id:/etc/machine-id:ro \
--mount volume=etc-kubernetes,target=/etc/kubernetes \ -v /usr/lib/os-release:/etc/os-release:ro \
--volume etc-machine-id,kind=host,source=/etc/machine-id,readOnly=true \ -v /lib/modules:/lib/modules:ro \
--mount volume=etc-machine-id,target=/etc/machine-id \ -v /run:/run \
--volume etc-os-release,kind=host,source=/usr/lib/os-release,readOnly=true \ -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--mount volume=etc-os-release,target=/etc/os-release \ -v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume=etc-resolv,kind=host,source=/etc/resolv.conf,readOnly=true \ -v /var/lib/calico:/var/lib/calico:ro \
--mount volume=etc-resolv,target=/etc/resolv.conf \ -v /var/lib/docker:/var/lib/docker \
--volume etc-ssl-certs,kind=host,source=/etc/ssl/certs,readOnly=true \ -v /var/lib/kubelet:/var/lib/kubelet:rshared \
--mount volume=etc-ssl-certs,target=/etc/ssl/certs \ -v /var/log:/var/log \
--volume lib-modules,kind=host,source=/lib/modules,readOnly=true \ -v /opt/cni/bin:/opt/cni/bin \
--mount volume=lib-modules,target=/lib/modules \ -v /etc/iscsi:/etc/iscsi \
--volume run,kind=host,source=/run \ -v /usr/sbin/iscsiadm:/usr/sbin/iscsiadm \
--mount volume=run,target=/run \ $${KUBELET_IMAGE} \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
--volume var-lib-kubelet,kind=host,source=/var/lib/kubelet,recursive=true \
--mount volume=var-lib-kubelet,target=/var/lib/kubelet \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume etc-iscsi,kind=host,source=/etc/iscsi \
--mount volume=etc-iscsi,target=/etc/iscsi \
--volume usr-sbin-iscsiadm,kind=host,source=/usr/sbin/iscsiadm \
--mount volume=usr-sbin-iscsiadm,target=/sbin/iscsiadm \
$${KUBELET_IMAGE} -- \
--anonymous-auth=false \ --anonymous-auth=false \
--authentication-token-webhook \ --authentication-token-webhook \
--authorization-mode=Webhook \ --authorization-mode=Webhook \
@ -124,7 +110,9 @@ systemd:
--register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \ --register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \
--rotate-certificates \ --rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins --volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always Restart=always
RestartSec=10 RestartSec=10
[Install] [Install]
@ -133,24 +121,20 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=Kubernetes control plane Description=Kubernetes control plane
Wants=docker.service
After=docker.service
ConditionPathExists=!/opt/bootstrap/bootstrap.done ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service] [Service]
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
WorkingDirectory=/opt/bootstrap WorkingDirectory=/opt/bootstrap
ExecStart=/usr/bin/rkt run \ Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.19.4
--trust-keys-from-https \ ExecStart=/usr/bin/docker run \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \ -v /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro \
--mount volume=config,target=/etc/kubernetes/secrets \ -v /opt/bootstrap/assets:/assets:ro \
--volume assets,kind=host,source=/opt/bootstrap/assets \ -v /opt/bootstrap/apply:/apply:ro \
--mount volume=assets,target=/assets \ --entrypoint=/apply \
--volume script,kind=host,source=/opt/bootstrap/apply \ $${KUBELET_IMAGE}
--mount volume=script,target=/apply \
--insecure-options=image \
docker://quay.io/poseidon/kubelet:v1.19.0 \
--net=host \
--dns=host \
--exec=/apply
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
@ -214,6 +198,28 @@ storage:
contents: contents:
inline: | inline: |
fs.inotify.max_user_watches=16184 fs.inotify.max_user_watches=16184
- path: /etc/etcd/etcd.env
filesystem: root
mode: 0644
contents:
inline: |
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
passwd: passwd:
users: users:
- name: core - name: core

View File

@ -31,7 +31,7 @@ storage:
inline: | inline: |
#!/bin/bash -ex #!/bin/bash -ex
curl --retry 10 "${ignition_endpoint}?{{.request.raw_query}}&os=installed" -o ignition.json curl --retry 10 "${ignition_endpoint}?{{.request.raw_query}}&os=installed" -o ignition.json
${os_flavor}-install \ flatcar-install \
-d ${install_disk} \ -d ${install_disk} \
-C ${os_channel} \ -C ${os_channel} \
-V ${os_version} \ -V ${os_version} \

Some files were not shown because too many files have changed in this diff Show More