Compare commits

...

328 Commits

Author SHA1 Message Date
c163fbbbcd Update docs and README for release 2020-12-12 12:31:35 -08:00
dc7be431e0 Remove iSCSI mounts from Kubelet
* Remove Kubelet `/etc/iscsi` and `iscsiadm` host mounts that
were added on bare-metal, since these no longer work on either
Fedora CoreOS or Flatcar Linux with newer `iscsiadm`
* These special mounts on bare-metal date back to #350 which
added them to provide a way to use iSCSI in Kubernetes v1.10
* Today, storage should be handled by external CSI providers
which handle different storage systems, which doesn't rely
on Kubelet storage utils

Close #907
2020-12-12 11:41:02 -08:00
86e0f806b3 Revert "Add support for Terraform v0.14.x"
This reverts commit 968febb050.
2020-12-11 00:47:57 -08:00
96172ad269 Update Grafana from v7.3.4 to v7.3.5
* https://github.com/grafana/grafana/releases/tag/v7.3.5
2020-12-11 00:24:43 -08:00
3eb20a1f4b Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-12-11 00:15:29 -08:00
ee9ce3d0ab Update Calico from v3.17.0 to v3.17.1
* https://github.com/projectcalico/calico/releases/tag/v3.17.1
2020-12-10 22:48:38 -08:00
a8b8a9b454 Update Kubernetes from v1.20.0-rc.0 to v1.20.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200
2020-12-08 18:28:13 -08:00
968febb050 Add support for Terraform v0.14.x
* Support Terraform v0.13.x and v0.14.x
2020-12-07 00:22:38 -08:00
bee455f83a Update Cilium from v1.9.0 to v1.9.1
* https://github.com/cilium/cilium/releases/tag/v1.9.1
2020-12-04 14:14:18 -08:00
3e89ea1b4a Promote Fedora CoreOS bare-metal to stable
* Fedora CoreOS is a good choice for use on bare-metal
2020-12-04 14:02:55 -08:00
e77dd6ecd4 Update Kubernetes from v1.19.4 to v1.20.0-rc.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200-rc0
2020-12-03 16:01:28 -08:00
4fd4a0f540 Move control plane static pod TLS assets to /etc/kubernetes/pki
* Change control plane static pods to mount `/etc/kubernetes/pki`,
instead of `/etc/kubernetes/bootstrap-secrets` to better reflect
their purpose and match some loose conventions upstream
* Place control plane and bootstrap TLS assets and kubeconfig's
in `/etc/kubernetes/pki`
* Mount to `/etc/kubernetes/pki` (rather than `/etc/kubernetes/secrets`)
to match the host location (less surprise)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/233
2020-12-02 23:26:42 -08:00
804dfea0f9 Add kubeconfig's for kube-scheduler and kube-controller-manager
* Generate TLS client certificates for `kube-scheduler` and
`kube-controller-manager` with `system:kube-scheduler` and
`system:kube-controller-manager` CNs
* Template separate kubeconfigs for kube-scheduler and
kube-controller manager (`scheduler.conf` and
`controller-manager.conf`). Rename admin for clarity
* Before v1.16.0, Typhoon scheduled a self-hosted control
plane, which allowed the steady-state kube-scheduler and
kube-controller-manager to use a scoped ServiceAccount.
With a static pod control plane, separate CN TLS client
certificates are the nearest equiv.
* https://kubernetes.io/docs/setup/best-practices/certificates/
* Remove unused Kubelet certificate, TLS bootstrap is used
instead
2020-12-01 22:02:15 -08:00
8ba23f364c Add TokenReview and TokenRequestProjection flags
* Add kube-apiserver flags for TokenReview and TokenRequestProjection
(beta, defaults on) to allow using Service Account Token Volume
Projection to create and mount service account tokens tied to a Pod's
lifecycle

Rel:

* https://github.com/poseidon/terraform-render-bootstrap/pull/231
* https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection
2020-12-01 20:02:33 -08:00
f6025666eb Update etcd from v3.4.12 to v3.4.14
* https://github.com/etcd-io/etcd/releases/tag/v3.4.14
2020-11-29 20:04:25 -08:00
85eb502f19 Update Prometheus from v2.23.0-rc.0 to v2.23.0
* https://github.com/prometheus/prometheus/releases/tag/v2.23.0
2020-11-29 19:59:27 -08:00
fa3184fb9c Relax terraform-provider-ct version constraint
* Allow terraform-provider-ct versions v0.6+ (e.g. v0.7.1)
Before, only v0.6.x point updates were allowed
* Update terraform-provider-ct to v0.7.1 in docs
* READ the docs before updating terraform-provider-ct,
as changing worker user-data is handled differently
by different cloud platforms
2020-11-29 19:51:26 -08:00
22565e57e0 Update kube-state-metrics from v2.0.0-alpha.2 to v2.0.0-alpha.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.3
2020-11-25 14:30:11 -08:00
026e1f3648 Update Grafana from v7.3.3 to v7.3.4
* https://github.com/grafana/grafana/releases/tag/v7.3.4
2020-11-25 14:25:15 -08:00
ae548ce213 Update Calico from v3.16.5 to v3.17.0
* Enable Calico MTU auto-detection
* Remove [workaround](https://github.com/poseidon/typhoon/pull/724) to
Calico cni-plugin [issue](https://github.com/projectcalico/cni-plugin/issues/874)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/230
2020-11-25 14:22:58 -08:00
e826b49648 Update Matchbox profile to use initramfs and rootfs images
* Fedora CoreOS stable (after Oct 6) ships separate initramfs
and rootfs images, used as initrd's
* Update profiles to match the Matchbox examples, which have
already switched to the new profile and to remove the unused
kernel args
* Requires Fedora CoreOS version which ships rootfs images
(e.g. stable 32.20200923.3.0 or later)

Rel:

* https://github.com/coreos/fedora-coreos-tracker/issues/390#issuecomment-661986987
* da0df01763 (diff-4541f7b7c174f6ae6270135942c1c65ed9e09ebe81239709f5a9fb34e858ddcf)

Supercedes https://github.com/poseidon/typhoon/pull/888
2020-11-25 14:13:39 -08:00
fa8f68f50e Fix Fedora CoreOS AWS AMI query in non-US regions
* A `aws_ami` data source will fail a Terraform plan
if no matching AMI is found, even if the AMI is not
used. ARM64 images are only published to a few US
regions, so the `aws_ami` data query could fail when
creating Fedora CoreOS AWS clusters in non-US regions
* Condition `aws_ami` on whether experimental arch
`arm64` is chosen
* Recent regression introduced in v1.19.4
https://github.com/poseidon/typhoon/pull/875

Closes https://github.com/poseidon/typhoon/issues/886
2020-11-25 11:32:05 -08:00
ba8d972c76 Update Prometheus from v2.22.2 to v2.23.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.23.0-rc.0
2020-11-24 10:54:42 -08:00
c0347ca0c6 Set kubeconfig and asset_dist as sensitive
* Mark `kubeconfig` and `asset_dist` as `sensitive` to
prevent the Terraform CLI displaying these values, esp.
for CI systems
* In particular, external tools or tfvars style uses (not
recommended) reportedly display all outputs and are improved
by setting sensitive
* For Terraform v0.14, outputs referencing sensitive fields
must also be annotated as sensitive

Closes https://github.com/poseidon/typhoon/issues/884
2020-11-23 11:41:55 -08:00
9f94ab6bcc Rerun terraform fmt for recent variables 2020-11-21 14:20:36 -08:00
5e4f5de271 Enable Network Load Balancer (NLB) dualstack
* NLB subnets assigned both IPv4 and IPv6 addresses
* NLB DNS name has both A and AAAA records
* NLB to target node traffic is IPv4 (no change),
no change to security groups needed
* Ingresses exposed through the recommended Nginx
Ingress Controller addon will be accessible via
IPv4 or IPv6. No change is needed to the app's
CNAME to NLB record

Related: https://aws.amazon.com/about-aws/whats-new/2020/11/network-load-balancer-supports-ipv6/
2020-11-21 14:16:24 -08:00
be28495d79 Update Prometheus from v2.22.1 to v2.22.2
* https://github.com/prometheus/prometheus/releases/tag/v2.22.2
2020-11-19 21:50:48 -08:00
f1356fec24 Update Grafana from v7.3.2 to v7.3.3
* https://github.com/grafana/grafana/releases/tag/v7.3.3
2020-11-19 21:49:11 -08:00
cc00afa4e1 Add Terraform v0.13 input variable validations
* Support for migrating from Terraform v0.12.x to v0.13.x
was added in v1.18.8
* Require Terraform v0.13+. Drop support for Terraform v0.12
2020-11-17 12:02:34 -08:00
5c3b5a20de Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-11-14 13:32:04 -08:00
f5a83667e8 Update Grafana from v7.3.1 to v7.3.2
* https://github.com/grafana/grafana/releases/tag/v7.3.2
2020-11-14 13:30:30 -08:00
a911367c2e Update nginx-ingress from v0.41.0 to v0.41.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2
2020-11-14 13:27:06 -08:00
f884de847e Discard Prometheus etcd gRPC failure alert
* Kubernetes watch expiry is not a gRPC code we care about
* Background: This rule is typically removed, but was added back in
2020-11-14 13:17:56 -08:00
1b3a0f6ebc Add experimental Fedora CoreOS arm64 support on AWS
* Add experimental `arch` variable to Fedora CoreOS AWS,
accepting amd64 (default) or arm64 to support native
arm64/aarch64 clusters or mixed/hybrid clusters with
a worker pool of arm64 workers
* Add `daemonset_tolerations` variable to cluster module
(experimental)
* Add `node_taints` variable to workers module
* Requires flannel CNI and experimental Poseidon-built
arm64 Fedora CoreOS AMIs (published to us-east-1, us-east-2,
and us-west-1)

WARN:

* Our AMIs are experimental, may be removed at any time, and
will be removed when Fedora CoreOS publishes official arm64
AMIs. Do NOT use in production

Related:

* https://github.com/poseidon/typhoon/pull/682
2020-11-14 13:09:24 -08:00
1113a22f61 Update Kubernetes from v1.19.3 to v1.19.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194
2020-11-11 22:56:27 -08:00
152c7d86bd Change bootstrap.service container from rkt to docker
* Use docker to run `bootstrap.service` container
* Background https://github.com/poseidon/typhoon/pull/855
2020-11-11 22:26:05 -08:00
79deb8a967 Update Cilium from v1.9.0-rc3 to v1.9.0
* https://github.com/cilium/cilium/releases/tag/v1.9.0
2020-11-10 23:42:41 -08:00
f412f0d9f2 Update Calico from v3.16.4 to v3.16.5
* https://github.com/projectcalico/calico/releases/tag/v3.16.5
2020-11-10 22:58:19 -08:00
eca6c4a1a1 Fix broken flatcar linux documentation links (#870)
* Fix old documentation links
2020-11-10 18:30:30 -08:00
133d325013 Update nginx-ingress from v0.40.2 to v0.41.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.0
2020-11-08 14:34:52 -08:00
4b05c0180e Update Grafana from v7.3.0 to v7.3.1
* https://github.com/grafana/grafana/releases/tag/v7.3.1
2020-11-08 14:13:39 -08:00
f49ab3a6ee Update Prometheus from v2.22.0 to v2.22.1
* https://github.com/prometheus/prometheus/releases/tag/v2.22.1
2020-11-08 14:12:24 -08:00
0eef16b274 Improve and tidy Fedora CoreOS etcd-member.service
* Allow a snippet with a systemd dropin to set an alternate
image via `ETCD_IMAGE`, for consistency across Fedora CoreOS
and Flatcar Linux
* Drop comments about integrating system containers with
systemd-notify
2020-11-08 11:49:56 -08:00
ad1f59ce91 Change Flatcar etcd-member.service container from rkt to docker
* Use docker to run the `etcd-member.service` container
* Use env-file `/etc/etcd/etcd.env` like podman on FCOS
* Background: https://github.com/poseidon/typhoon/pull/855
2020-11-03 16:42:18 -08:00
82e5ac3e7c Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/poseidon/terraform-render-bootstrap/pull/224
2020-11-03 10:29:07 -08:00
a8f7880511 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:50:18 -07:00
cda5b93b09 Update kube-state-metrics from v2.0.0-alpha.1 to v2.0.0-alpha.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2
2020-10-28 18:49:40 -07:00
3e9f5f34de Update Grafana from v7.2.2 to v7.3.0
* https://github.com/grafana/grafana/releases/tag/v7.3.0
2020-10-28 17:46:26 -07:00
893d139590 Update Calico from v3.16.3 to v3.16.4
* https://github.com/projectcalico/calico/releases/tag/v3.16.4
2020-10-26 00:50:40 -07:00
fc62e51b2a Update Grafana from v7.2.1 to v7.2.2
* https://github.com/grafana/grafana/releases/tag/v7.2.2
2020-10-22 00:14:04 -07:00
e5ba3329eb Remove bare-metal CoreOS Container Linux profiles
* Remove Matchbox profiles for CoreOS Container Linux
* Simplify the remaining Flatcat Linux profiles
2020-10-21 00:25:10 -07:00
7c3f3ab6d0 Rename container-linux modules to flatcar-linux
* CoreOS Container Linux was deprecated in v1.18.3
* Continue transitioning docs and modules from supporting
both CoreOS and Flatcar "variants" of Container Linux to
now supporting Flatcar Linux and equivalents

Action Required: Update the Flatcar Linux modules `source`
to replace `s/container-linux/flatcar-linux`. See docs for
examples
2020-10-20 22:47:19 -07:00
a99a990d49 Remove unused Kubelet tls mounts
* Kubelet trusts only the cluster CA certificate (and
certificates in the Kubelet debian base image), there
is no longer a need to mount the host's trusted certs
* Similar change on Flatcar Linux in
https://github.com/poseidon/typhoon/pull/855

Rel: https://github.com/poseidon/typhoon/pull/810
2020-10-18 23:48:21 -07:00
df17253e72 Fix delete node permission on Fedora CoreOS node shutdown
* On cloud platforms, `delete-node.service` tries to delete the
local node (not always possible depending on preemption time)
* Since v1.18.3, kubelet TLS bootstrap generates a kubeconfig
in `/var/lib/kubelet` which should be used with kubectl in
the delete-node oneshot
2020-10-18 23:38:11 -07:00
eda78db08e Change Flatcar kubelet.service container from rkt to docker
* Use docker to run the `kubelet.service` container
* Update Kubelet mounts to match Fedora CoreOS
* Remove unused `/etc/ssl/certs` mount (see
https://github.com/poseidon/typhoon/pull/810)
* Remove unused `/usr/share/ca-certificates` mount
* Remove `/etc/resolv.conf` mount, Docker default is ok
* Change `delete-node.service` to use docker instead of rkt
and inline ExecStart, as was done on Fedora CoreOS
* Fix permission denied on shutdown `delete-node`, caused
by the kubeconfig mount changing with the introduction of
node TLS bootstrap

Background

* podmand, rkt, and runc daemonless container process runners
provide advantages over the docker daemon for system containers.
Docker requires workarounds for use in systemd units where the
ExecStart must tail logs so systemd can monitor the daemonized
container. https://github.com/moby/moby/issues/6791
* Why switch then? On Flatcar Linux, podman isn't shipped. rkt
works, but isn't developing while container standards continue
to move forward. Typhoon has used runc for the Kubelet runner
before in Fedora Atomic, but its more low-level. So we're left
with Docker, which is less than ideal, but shipped in Flatcar
* Flatcar Linux appears to be shifting system components to
use docker, which does provide some limited guards against
breakages (e.g. Flatcar cannot enable docker live restore)
2020-10-18 23:24:45 -07:00
afac46e39a Remove asset_dir variable and optional asset writes
* Originally, poseidon/terraform-render-bootstrap generated
TLS certificates, manifests, and cluster "assets" written
to local disk (`asset_dir`) during terraform apply cluster
bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Add Terraform output `assets_dir` map
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. For those
who wish to write those assets to local files, this is possible
doing so explicitly.

```
resource local_file "assets" {
  for_each = module.yavin.assets_dist
  filename = "some-assets/${each.key}"
  content = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 15:00:15 -07:00
b1e680ac0c Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-17 13:56:24 -07:00
9fbfbdb854 Update Prometheus from v2.21.0 to v2.22.0
* https://github.com/prometheus/prometheus/releases/tag/v2.22.0
2020-10-17 12:38:25 -07:00
511f5272f4 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://github.com/poseidon/terraform-render-bootstrap/pull/212
2020-10-15 20:08:51 -07:00
46ca5e8813 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:47:49 -07:00
394e496cc7 Update Grafana from v7.2.0 to v7.2.1
* https://github.com/grafana/grafana/releases/tag/v7.2.1
2020-10-11 13:21:25 -07:00
a38ec1a856 Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-11 13:06:53 -07:00
7881f4bd86 Update kube-state-metrics from v1.9.7 to v2.0.0-alpha.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1
2020-10-11 12:35:43 -07:00
d5b5b7cb02 Update nginx-ingress from v0.40.0 to v0.40.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2
2020-10-06 23:52:15 -07:00
759a48be7c Update mkdocs-material from v5.5.12 to v6.0.1
* Update OS kernel, systemd, and docker verisons
2020-10-02 01:18:38 -07:00
b39a1d70da Update nginx-ingress from v0.35.0 to v0.40.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.0
2020-10-02 01:00:35 -07:00
901f7939b2 Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:24:26 -07:00
d65085ce14 Update Grafana from v7.1.5 to v7.2.0
* https://github.com/grafana/grafana/releases/tag/v7.2.0
2020-09-24 20:58:32 -07:00
343db5b578 Remove references to CoreOS Container Linux
* CoreOS Container Linux was deprecated in v1.18.3 (May 2020)
in favor of Fedora CoreOS and Flatcar Linux. CoreOS Container
Linux references were kept to give folks more time to migrate,
but AMIs have now been deleted. Time is up.

Rel: https://coreos.com/os/eol/
2020-09-24 20:51:02 -07:00
444363be2d Update Kubernetes from v1.19.1 to v1.19.2
* Update flannel from v0.12.0 to v0.13.0-rc2
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
2020-09-16 20:05:54 -07:00
bc7ad25c60 Update Grafana dashboard for Kubelet v1.19
* Fix Kubelet pod and container count metrics dashboard
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/499
2020-09-15 23:21:56 -07:00
e838d4dc3d Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules/alerts and Grafana dashboards
2020-09-13 15:03:27 -07:00
979c092ef6 Reduce apiserver metrics cardinality of non-core APIs
* Reduce `apiserver_request_duration_seconds_count` cardinality
by dropping series for non-core Kubernetes APIs. This is done
to match `apiserver_request_duration_seconds_count` relabeling
* These two relabels must be performed the same way to avoid
affecting new SLO calculations (upcoming)
* See https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/498

Related: https://github.com/poseidon/typhoon/pull/596
2020-09-13 14:47:49 -07:00
db8e94bb4b Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-09-12 19:41:15 -07:00
eb093af9ed Drop Kubelet labelmap relabel for node_name
* Originally, Kubelet and CAdvisor metrics used a labelmap
relabel to add Kubernetes SD node labels onto timeseries
* With https://github.com/poseidon/typhoon/pull/596 that
relabel was dropped since node labels aren't usually that
valuable. `__meta_kubernetes_node_name` was retained but
the field name is empty
* Favor just using Prometheus server-side `instance` in
queries that require some node identifier for aggregation
or debugging

Fix https://github.com/poseidon/typhoon/issues/823
2020-09-12 19:40:00 -07:00
36096f844d Promote Cilium from experimental to GA
* Cilium was added as an experimental CNI provider in June
* Since then, I've been choosing it for an increasing number
of clusters and scenarios.
2020-09-12 19:24:55 -07:00
d236628e53 Update Prometheus from v2.20.0 to v2.21.0
* https://github.com/prometheus/prometheus/releases/tag/v2.21.0
2020-09-12 19:20:54 -07:00
577b927a2b Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* No notable changes in the config spec, just house keeping
* Require any snippets customization to update to v1.1.0. Version
skew between the main config and snippets will show an err message
* https://github.com/coreos/fcct/blob/master/docs/configuration-v1_1.md
2020-09-10 23:38:40 -07:00
000c11edf6 Update IngressClass resources to networking.k8s.io/v1
* Kubernetes v1.19 graduated Ingress and IngressClass from
networking.k8s.io/v1beta1 to networking.k8s.io/v1
2020-09-10 23:25:53 -07:00
29b16c3fc0 Change seccomp annotations to seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support for
seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the profile,
you'd see `seccomp=unconfined`

Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215
2020-09-10 01:15:07 -07:00
0c7a879bc4 Update Kubernetes from v1.19.0 to v1.19.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191
2020-09-09 20:52:29 -07:00
1e654c9e4e Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
* Update mkdocs-material from v5.5.11 to v5.5.12
2020-09-07 21:18:47 -07:00
28ee693e6b Update Cilium from v1.8.2 to v1.8.3
* https://github.com/cilium/cilium/releases/tag/v1.8.3
2020-09-07 21:10:27 -07:00
8c7d95aefd Update mkdocs-material from v5.5.9 to v5.5.11 2020-08-29 13:52:16 -07:00
d45dfdbf91 Update nginx-ingress from v0.34.1 to v0.35.0
* Repo changed to k8s.gcr.io/ingress-nginx/controller
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.35.0
2020-08-29 13:38:28 -07:00
d7e0536838 Add code group blocks to improve worker pool docs
* Show Fedora CoreOS and Flatcar Linux examples in
separate tabs, rather than trying to show one
* Add copyright footer for the poseidon org
2020-08-28 00:25:12 -07:00
8dd221a57c Add fleetlock docs and links to addons
* Add links to fleetlock for Fedora CoreOS reboot coordination
* https://github.com/poseidon/fleetlock
2020-08-28 00:02:24 -07:00
f17bb4cf61 Update mkdocs-material from v5.5.6 to v5.5.9 2020-08-27 09:20:18 -07:00
44f1fe620a Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-08-27 09:18:39 -07:00
a504264e24 Update Grafana from v7.1.4 to v7.1.5
* https://github.com/grafana/grafana/releases/tag/v7.1.5
2020-08-27 08:52:07 -07:00
88cf7273dc Update Kubernetes from v1.18.8 to v1.19.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md
2020-08-27 08:50:01 -07:00
58def65a09 Update Grafana from v7.1.3 to v7.1.4
* https://github.com/grafana/grafana/releases/tag/v7.1.4
2020-08-22 15:40:09 -07:00
cd7fd29194 Update etcd from v3.4.10 to v3.4.12
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md
2020-08-19 21:25:41 -07:00
aafa38476a Fix SELinux race condition on non-bootstrap controllers in multi-controller (#808)
* Fix race condition for bootstrap-secrets SELinux context on non-bootstrap controllers in multi-controller FCOS clusters
* On first boot from disk on non-bootstrap controllers, adding bootstrap-secrets races with kubelet.service starting, which can cause the secrets assets to have the wrong label until kubelet.service restarts (service, reboot, auto-update)
* This can manifest as `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods crashlooping on spare controllers on first cluster creation
2020-08-19 21:18:10 -07:00
9a07f1d30b Update recommended Terraform provider versions
* Sync Terraform provider plugin versions to those used
internally
* Update mkdocs-material from v5.5.1 to v5.5.6
* Fix minor details in docs
2020-08-14 10:05:52 -07:00
c87db3ef37 Update Kubernetes from v1.18.6 to v1.18.8
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188
2020-08-13 20:47:43 -07:00
342380cfa4 Update Terraform migration guide SHA
* Mention the first master branch SHA that introduced Terraform
v0.13 forward compatibility
* Link the migration guide on Github until a release is available
and website docs are published
2020-08-13 00:36:47 -07:00
5e70d7e2c8 Migrate from Terraform v0.12.x to v0.13.x
* Recommend Terraform v0.13.x
* Support automatic install of poseidon's provider plugins
* Update tutorial docs for Terraform v0.13.x
* Add migration guide for Terraform v0.13.x (best-effort)
* Require Terraform v0.12.26+ (migration compatibility)
* Require `terraform-provider-ct` v0.6.1
* Require `terraform-provider-matchbox` v0.4.1
* Require `terraform-provider-digitalocean` v1.20+

Related:

* https://www.hashicorp.com/blog/announcing-hashicorp-terraform-0-13/
* https://www.terraform.io/upgrade-guides/0-13.html
* https://registry.terraform.io/providers/poseidon/ct/latest
* https://registry.terraform.io/providers/poseidon/matchbox/latest
2020-08-12 01:54:32 -07:00
aab071309f Update recommended Terraform provider versions
* Sync Terraform provider plugin versions to those used
internally
2020-08-09 12:40:22 -07:00
f6ce12766b Allow terraform-provider-aws v3.0+ plugin
* Typhoon AWS is compatible with terraform-provider-aws v3.x releases
* Continue to allow v2.23+, no v3.x specific features are used
* Set required provider versions in the worker module, since
it can be used independently

Related:

* https://github.com/terraform-providers/terraform-provider-aws/releases/tag/v3.0.0
2020-08-09 12:39:26 -07:00
e1d6ab2f24 Update Grafana from v7.1.1 to v7.1.3
* https://github.com/grafana/grafana/releases/tag/v7.1.3
* https://github.com/grafana/grafana/releases/tag/v7.1.2
2020-08-08 18:59:49 -07:00
8b3d41d6a0 Update mkdocs-material from v5.4.0 to v5.5.1 2020-08-02 15:22:10 -07:00
ccee5d3d89 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* https://github.com/poseidon/terraform-render-bootstrap/pull/205
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0 which was last
published by me in 2017 and is no longer accessible to me to maintain
or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:13:15 -07:00
8aefd4f082 Relex terraform-provider-matchbox version constraint
* Allow use of terraform-provider-matchbox v0.3+ (which
allows v0.3.0 <= version < v1.0) for any pre 1.0 release
* Before, the requirement was v0.3.0 <= version < v0.4.0
2020-08-02 01:09:28 -07:00
78e6409bd0 Fix flannel support on Fedora CoreOS
* Fedora CoreOS now ships systemd-udev's `default.link` while
Flannel relies on being able to pick its own MAC address for
the `flannel.1` link for tunneled traffic to reach cni0 on
the destination side, without being dropped
* This change first appeared in FCOS testing-devel 32.20200624.20.1
and is the behavior going forward in FCOS since it was added
to align FCOS network naming / configs with the rest of Fedora
and address issues related to the default being missing
* Flatcar Linux (and Container Linux) has a specific flannel.link
configuration builtin, so it was not affected
* https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296

Note: Typhoon's recommended and default CNI provider is Calico,
unless `networking` is set to flannel directly.
2020-08-01 21:22:08 -07:00
2aef42d4f6 Update Prometheus from v2.19.2 to v2.20.0
* https://github.com/prometheus/prometheus/releases/tag/v2.20.0
2020-07-25 16:37:28 -07:00
b7d67757de Update Grafana from v7.1.0 to v7.1.1
* https://github.com/grafana/grafana/releases/tag/v7.1.1
2020-07-25 16:33:40 -07:00
26f5d2d753 Fix some links in docs (#788) 2020-07-25 16:32:08 -07:00
cd0a28904e Update Cilium from v1.8.1 to v1.8.2
* https://github.com/cilium/cilium/releases/tag/v1.8.2
2020-07-25 16:06:27 -07:00
618f8b30fd Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
* Update Grafana dashboard with revised metrics names
2020-07-25 15:51:31 -07:00
264d23a1b5 Declare etcd data directory permissions
* Set etcd data directory /var/lib/etcd permissions to 700
* On Flatcar Linux, /var/lib/etcd is pre-existing and Ignition
v2 doesn't overwrite the directory. Update the Container Linux
config, but add the manual chmod workaround to bootstrap for
Flatcar Linux users
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v3410-2020-07-16
* https://github.com/etcd-io/etcd/pull/11798
2020-07-25 15:48:27 -07:00
f96e91f225 Update etcd from v3.4.9 to v3.4.10
* https://github.com/etcd-io/etcd/releases/tag/v3.4.10
2020-07-18 14:08:22 -07:00
efd4a0319d Update Grafana from v7.0.6 to v7.1.0
* https://github.com/grafana/grafana/releases/tag/v7.1.0
2020-07-18 13:54:56 -07:00
6df6bf904a Show Cilium as a CNI provider option in docs
* Start to show Cilium as a CNI option
* https://github.com/cilium/cilium
2020-07-18 13:27:56 -07:00
5fba20d358 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-07-18 13:19:25 -07:00
a8d3d3bb12 Update ingress-nginx from v0.33.0 to v0.34.1
* Switch to ingress-nginx controller images from us.grc.io (eu, asia
can also be used if desired)
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0
2020-07-15 22:43:49 -07:00
9ea6d2c245 Update Kubernetes from v1.18.5 to v1.18.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186
* https://github.com/poseidon/terraform-render-bootstrap/pull/201
2020-07-15 22:05:57 -07:00
507aac9b78 Update mkdocs-material from v5.3.3 to v5.4.0 2020-07-11 22:56:59 -07:00
dfd2a0ec23 Update Grafana from v7.0.5 to v7.0.6
* https://github.com/grafana/grafana/releases/tag/v7.0.6
2020-07-09 21:10:48 -07:00
e3bf7d8f9b Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-07-09 21:08:55 -07:00
49050320ce Update Cilium from v1.8.0 to v1.8.1
* https://github.com/cilium/cilium/releases/tag/v1.8.1
2020-07-05 16:00:00 -07:00
74e025c9e4 Update Grafana from v7.0.4 to v7.0.5
* https://github.com/grafana/grafana/releases/tag/v7.0.5
2020-07-05 15:49:34 -07:00
257a49ce37 Remove CoreOS Container Linux image names from docs
* Remove coreos-stable, coreos-beta, and coreos-alpha channel
references from docs
* CoreOS Container Linux is end of life (see changelog)
2020-06-30 01:36:53 -07:00
df3f40bcce Allow using Flatcar Linux edge on Azure
* Set Kubelet cgroup driver to systemd when Flatcar Linux edge
is chosen

Note: Typhoon module status assumes use of the stable variant of
an OS channel/stream. Its possible to use earlier variants and
those are sometimes tested or developed against, but stable is
the recommendation
2020-06-30 01:35:29 -07:00
32886cfba1 Promote Fedora CoreOS on Google Cloud to stable status 2020-06-29 23:09:11 -07:00
0ba2c1a4da Fix terraform fmt in firewall rules 2020-06-29 23:04:54 -07:00
430d139a5b Remove os_image variable on Google Cloud Fedora CoreOS
* In v1.18.3, the `os_stream` variable was added to select
a Fedora CoreOS image stream (stable, testing, next) on
AWS and Google Cloud (which publish official streams)
* Remove `os_image` variable deprecated in v1.18.3. Manually
uploaded images are no longer needed
2020-06-29 22:57:11 -07:00
7c6ab21b94 Isolate each DigitalOcean cluster in its own VPC
* DigitalOcean introduced Virtual Private Cloud (VPC) support
to match other clouds and enhance the prior "private networking"
feature. Before, droplet's belonging to different clusters (but
residing in the same region) could reach one another (although
Typhoon firewall rules prohibit this). Now, droplets in a VPC
reside in their own network
* https://www.digitalocean.com/docs/networking/vpc/
* Create droplet instances in a VPC per cluster. This matches the
design of Typhoon AWS, Azure, and GCP.
* Require `terraform-provider-digitalocean` v1.16.0+ (action required)
* Output `vpc_id` for use with an attached DigitalOcean
loadbalancer
2020-06-28 23:25:30 -07:00
21178868db Revert "Update Prometheus from v2.19.1 to v2.19.2"
* Prometheus has not published the v1.19.2
* This reverts commit 81b6f54169.
2020-06-27 14:53:58 -07:00
9dcf35e393 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-06-27 14:44:18 -07:00
81b6f54169 Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-06-27 14:34:30 -07:00
7bce15975c Update Kubernetes from v1.18.4 to v1.18.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185
2020-06-27 13:52:18 -07:00
1f83ae7dbb Update Calico from v3.14.1 to v3.15.0
* https://docs.projectcalico.org/v3.15/release-notes/
2020-06-26 02:40:12 -07:00
a10a1cee9f Update mkdocs-material from v5.3.0 to v5.3.3 2020-06-26 02:24:37 -07:00
a79ad34ba3 Update Grafana from v7.0.3 to v7.0.4
* https://github.com/grafana/grafana/releases/tag/v7.0.4
2020-06-26 02:06:38 -07:00
99a11442c7 Update Prometheus from v2.19.0 to v2.19.1
* https://github.com/prometheus/prometheus/releases/tag/v2.19.1
2020-06-26 02:01:58 -07:00
d27f367004 Update Cilium from v1.8.0-rc4 to v1.8.0
* https://github.com/cilium/cilium/releases/tag/v1.8.0
2020-06-22 22:26:49 -07:00
e9c8520359 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability
    (health)
  * Optional TCP 4240 between nodes for endpoint reachability (health)

Known Issues:

* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256

Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
2020-06-21 20:41:53 -07:00
37f00a3882 Reduce Calcio MTU on Fedora CoreOS Azure
* Change the Calico VXLAN interface for MTU from 1450 to 1410
* VXLAN on Azure should support MTU 1450. However, there is
history where performance measures have shown that 1410 is
needed to have expected performance. Flatcar Linux has the
same MTU 1410 override and note
* FCOS 31.20200323.3.2 was known to perform fine with 1450, but
now in 31.20200517.3.0 the right value seems to be 1410
2020-06-19 00:24:56 -07:00
4cfafeaa07 Fix Kubelet starting before hostname set on FCOS AWS
* Fedora CoreOS `kubelet.service` can start before the hostname
is set. Kubelet reads the hostname to determine the node name to
register. If the hostname was read as localhost, Kubelet will
continue trying to register as localhost (problem)
* This race manifests as a node that appears NotReady, the Kubelet
is trying to register as localhost, while the host itself (by then)
has an AWS provided hostname. Restarting kubelet.service is a
manual fix so Kubelet re-reads the hostname
* This race could only be shown on AWS, not on Google Cloud or
Azure despite attempts. Bare-metal and DigitalOcean differ and
use hostname-override (e.g. afterburn) so they're not affected
* Wait for nodes to have a non-localhost hostname in the oneshot
that awaits /etc/resolve.conf. Typhoon has no valid cases for a
node hostname being localhost (not even single-node clusters)

Related Openshift: https://github.com/openshift/machine-config-operator/pull/1813
Close https://github.com/poseidon/typhoon/issues/765
2020-06-19 00:19:54 -07:00
90e23f5822 Rename controller node label and NoSchedule taint
* Remove node label `node.kubernetes.io/master` from controller nodes
* Use `node.kubernetes.io/controller` (present since v1.9.5,
[#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to
`node-role.kubernetes.io/controller`
* Tolerate the new taint name for workloads that may run on controller nodes
and stop tolerating `node-role.kubernetes.io/master` taint
2020-06-19 00:12:13 -07:00
6234147948 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-06-18 01:03:37 -07:00
c25c59058c Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
2020-06-17 19:53:19 -07:00
bc9b808d44 Update nginx-ingress from v0.32.0 to v0.33.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-0.33.0
2020-06-16 18:44:40 -07:00
4b0203fdb2 Fix typo in DigitalOcean docs title 2020-06-16 18:33:56 -07:00
331566e1f7 Update mkdocs packages for website 2020-06-16 18:20:19 -07:00
04520e447c Update node-exporter from v1.0.0 to v1.0.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.1
2020-06-16 17:57:09 -07:00
413585681b Remove unused Kubelet lock-file and exit-on-lock-contention
* Kubelet `--lock-file` and `--exit-on-lock-contention` date
back to usage of bootkube and at one point running Kubelet
in a "self-hosted" style whereby an on-host Kubelet (rkt)
started pods, but then a Kubelet DaemonSet was scheduled
and able to take over (hence self-hosted). `lock-file` and
`exit-on-lock-contention` flags supported this pivot. The
pattern has been out of favor (in bootkube too) for years
because of dueling Kubelet complexity
* Typhoon runs Kubelet as a container via an on-host systemd
unit using podman (Fedora CoreOS) or rkt (Flatcar Linux). In
fact, Typhoon no longer uses bootkube or control plane pivot
(let alone Kubelet pivot) and uses static pods since v1.16.0
* https://github.com/poseidon/typhoon/pull/536
2020-06-12 00:06:41 -07:00
96711d7f17 Remove unused Kubelet cert / key Terraform state
* Generated Kubelet TLS certificate and key are not longer
used or distributed to machines since Kubelet TLS bootstrap
is used instead. Remove the certificate and key from state
2020-06-11 21:24:36 -07:00
c9059d3fe9 Update Prometheus from v2.19.0-rc.0 to v2.19.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0
2020-06-09 23:05:03 -07:00
a287920169 Use strict mode for Container Linux Configs
* Enable terraform-provider-ct `strict` mode for parsing
Container Linux Configs and snippets
* Fix Container Linux Config systemd unit syntax `enable`
(old) to `enabled`
* Align with Fedora CoreOS which uses strict mode already
2020-06-09 23:00:36 -07:00
8dc170b9d9 Update security disclosure contact email
* Use security@psdn.io across github.com/poseidon projects
2020-06-08 12:37:09 -07:00
aed1a5f33d Fix Fedora CoreOS docs for selecting a stream
* Fedora CoreOS image `os_stream` stable, testing, and next
have been configurable since v1.18.3
* Remove mention of outdated `os_image` variable
2020-06-08 12:25:57 -07:00
31d02b0221 Update Prometheus from v2.18.1 to v2.19.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0-rc.0
2020-06-05 00:16:45 -07:00
8f875f80f5 Update Grafana from v7.0.1 to v7.0.3
* https://github.com/grafana/grafana/releases/tag/v7.0.2
* https://github.com/grafana/grafana/releases/tag/v7.0.3
2020-06-03 12:31:58 -07:00
16c0b9152b Update kube-state-metrics from v1.9.6 to v1.9.7
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.7
2020-06-03 11:35:10 -07:00
99dbce67a3 Tweak minor style elements of issue templates 2020-05-31 16:19:33 -07:00
20bfd69780 Change Kubelet container image publishing
* Build Kubelet container images internally and publish
to Quay and Dockerhub (new) as an alternative in case of
registry outage or breach
* Use our infra to provide single and multi-arch (default)
Kublet images for possible future use
* Docs: Show how to use alternative Kubelet images via
snippets and a systemd dropin (builds on #737)

Changes:

* Update docs with changes to Kubelet image building
* If you prefer to trust images built by Quay/Dockerhub,
automated image builds are still available with unique
tags (albeit with some limitations):
  * Quay automated builds are tagged `build-{short_sha}`
  (limit: only amd64)
  * Dockerhub automated builts are tagged `build-{tag}`
  and `build-master` (limit: only amd64, no shas)

Links:

* Kubelet: https://github.com/poseidon/kubelet
* Docs: https://typhoon.psdn.io/topics/security/#container-images
* Registries:
  * quay.io/poseidon/kubelet
  * docker.io/psdn/kubelet
2020-05-30 23:34:23 -07:00
ba44408b76 Update Calico from v3.14.0 to v3.14.1
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-30 22:08:37 -07:00
455175d9e6 Update the fallback issue template
* Even "blank" issues need to fill out the fallback
template
2020-05-28 00:06:59 -07:00
d45804b1f6 Update Github issue template to use drop-downs (#747)
* Create a stricter bug report template
* Highlight topics that are not accepted in issues: operation, support, debugging, advice, or Kubernetes concepts
* Add a section to strongly suggest bug reports link a PR or describe a solution. This may be able to weed out topics that aren't focused bug reports
2020-05-27 23:49:25 -07:00
907a96916f Update mkdocs-material from v5.2.0 to v5.2.2
* https://github.com/squidfunk/mkdocs-material/releases/tag/5.2.2
2020-05-27 21:49:40 -07:00
187bb17d39 Update Grafana from v7.0.0 to v7.0.1
* https://github.com/grafana/grafana/releases/tag/v7.0.1
2020-05-27 21:35:24 -07:00
abc31c3711 Update node-exporter from v1.0.0-rc.1 to v1.0.0
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0
2020-05-27 21:33:03 -07:00
283e14f3e0 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions to those actively
used internally
* Fix terraform fmt
2020-05-22 01:12:53 -07:00
e72f916c8d Update etcd from v3.4.8 to v3.4.9
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v349-2020-05-20
2020-05-22 00:52:20 -07:00
c52f9f8d08 Upgrade docs packages and refresh content
* Promote DigitalOcean from alpha to beta for Fedora
CoreOS and Flatcar Linux
* Upgrade mkdocs-material and PyPI packages for docs
* Replace docs mentions of Container Linux with Flatcar
Linux and move docs/cl to docs/flatcar-linux
* Deprecate CoreOS Container Linux support. Its still
usable for some time, but start removing docs
2020-05-20 23:31:26 -07:00
ecae6679ff Update Kubernetes from v1.18.2 to v1.18.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-05-20 20:37:39 -07:00
4760543356 Set Kubelet image via kubelet.service KUBELET_IMAGE
* Write the systemd kubelet.service to use `KUBELET_IMAGE`
as the Kubelet. This provides a nice way to use systemd
dropins to temporarily override the image (e.g. during a
registry outage)

Note: Only Typhoon Kubelet images and registries are supported.
2020-05-19 22:39:53 -07:00
09eb208b4e Fix Fedora CoreOS on GCP proposing controller recreate
* With Fedora CoreOS image stream support (#727), the latest
resolved image will change over the lifecycle of a cluster.
* Fix issue where an image diff proposed replacing a Fedora
CoreOS controller on GCP, introduced in #727 (unreleased)
* Also ignore image diffs to the GCP managed instance group
of workers. This aligns with worker AMI diffs being ignored
on AWS and similar on Azure, since workers update themselves.

Background:

* Controller nodes should strictly not be recreated by Terraform,
they are stateful (etcd) and should not be replaced
* Across cloud platforms, OS image diffs are ignored since both
Flatcar Linux and Fedora CoreOS nodes update themselves. For
workers, user-data or disk size diffs (where relevant) are allowed
to recreate workers templates/configs since these are considered
to be user-initiated declarations that a reprovision should be done
2020-05-19 21:41:51 -07:00
8d024d22ad Update etcd from v3.4.7 to v3.4.8
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v348-2020-05-18
2020-05-18 23:50:46 -07:00
3bdddc452c Update Grafana from v7.0.0-beta2 to v7.0.0
* https://grafana.com/docs/grafana/latest/guides/whats-new-in-v7-0/
2020-05-18 23:42:32 -07:00
ff4187a1fb Use new Azure subnet to set address_prefixes list
* Update Azure subnet `address_prefix` to `azure_prefixes` list
* Fix warning that `address_prefix` is deprecated
* Require `terraform-provider-azurerm` v2.8.0+ (action required)

Rel: https://github.com/terraform-providers/terraform-provider-azurerm/pull/6493
2020-05-18 23:35:47 -07:00
2578be1f96 Rollback Grafana to v7.0.0-beta3, v7.0.0 image is missing
* Grafana hasn't published the v7.0.0 image yet
2020-05-16 12:32:10 -07:00
90edcd3d77 Update node-exporter from v1.0.0-rc.0 to v1.0.0-rc.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.1
2020-05-15 18:03:19 -07:00
a927c7c790 Update kube-state-metrics from v1.9.5 to v1.9.6
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.6
2020-05-15 17:42:24 -07:00
d952576d2f Update Grafana from v7.0.0-beta3 to v7.0.0
* https://github.com/grafana/grafana/releases/tag/7.0.0
2020-05-15 17:38:59 -07:00
70e389f37f Restore use of Flatcar Linux Azure Marketplace image
* Switch Flatcar Linux Azure to use the Marketplace image
from Kinvolk (offer `flatcar-container-linux-free`)
* Accepting Azure Marketplace terms is still neccessary,
update docs to show accepting the free offer rather than
BYOL

* Upstream Flatcar: https://github.com/flatcar-linux/Flatcar/issues/82
* Typhoon: https://github.com/poseidon/typhoon/issues/703
2020-05-13 22:50:24 -07:00
a18bd0a707 Highlight SELinux enforcing mode in features 2020-05-13 21:57:38 -07:00
01905b00bc Support Fedora CoreOS OS image streams on AWS
* Add `os_stream` variable to set the stream to stable (default),
testing, or next
* Remove unused os_image variable on Fedora CoreOS AWS
2020-05-13 21:45:12 -07:00
f4194cd57a Update Grafana from v7.0.0-beta2 to v7.0.0-beta.3
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta3
2020-05-09 17:50:40 -07:00
a2db4fa8c4 Update Calico from v3.13.3 to v3.14.0
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-09 16:05:30 -07:00
358854e712 Fix Calico install-cni crash loop on Pod restarts
* Set a consistent MCS level/range for Calico install-cni
* Note: Rebooting a node was a workaround, because Kubelet
relabels /etc/kubernetes(/cni/net.d)

Background:

* On SELinux enforcing systems, the Calico CNI install-cni
container ran with default SELinux context and a random MCS
pair. install-cni places CNI configs by first creating a
temporary file and then moving them into place, which means
the file MCS categories depend on the containers SELinux
context.
* calico-node Pod restarts creates a new install-cni container
with a different MCS pair that cannot access the earlier
written file (it places configs every time), causing the
init container to error and calico-node to crash loop
* https://github.com/projectcalico/cni-plugin/issues/874

```
mv: inter-device move failed: '/calico.conf.tmp' to
'/host/etc/cni/net.d/10-calico.conflist'; unable to remove target:
Permission denied
Failed to mv files. This may be caused by selinux configuration on
the
host, or something else.
```

Note, this isn't a host SELinux configuration issue.

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/186
2020-05-09 16:01:44 -07:00
b5dabcea31 Use Fedora CoreOS image streams on Google Cloud
* Add `os_stream` variable to set a Fedora CoreOS stream
to `stable` (default), `testing`, or `next`
* Deprecate `os_image` variable. Remove docs about uploading
Fedora CoreOS images manually, this is no longer needed
* https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/

Rel: https://github.com/coreos/fedora-coreos-docs/pull/70
2020-05-08 01:23:12 -07:00
3f0a5d2715 Update Grafana from v7.0.0-beta1 to v7.0.0-beta2
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta2
2020-05-07 23:04:44 -07:00
33173c0206 Update Prometheus from v2.18.0 to v2.18.1
* https://github.com/prometheus/prometheus/releases/tag/v2.18.1
2020-05-07 22:59:11 -07:00
70f30d9c07 Update Prometheus from v2.18.0-rc.1 to v2.18.0
* https://github.com/prometheus/prometheus/releases/tag/v2.18.0
2020-05-05 22:31:11 -07:00
6afc1643d9 Update nginx-ingress from v0.30.0 to v0.32.0
* Add support for IngressClass and RBAC authorization
* Since our nginx ingress controller example uses the flag
`--ingress-class=public`, add an IngressClass to go along
with it

Rel: https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class
2020-05-03 23:24:19 -07:00
e71e27e769 Update Prometheus from v2.17.2 to v2.18.0-rc.1
* https://github.com/prometheus/prometheus/releases/tag/v2.18.0-rc.1
2020-04-29 20:57:48 -07:00
64035005d4 Update Grafana from v6.7.2 to v7.0.0-beta1
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta1
2020-04-29 20:53:30 -07:00
317416b316 Use Terraform element wrap-around for AWS controllers subnet_id (#714)
* Fix Terraform plan error when controller_count exceeds available AWS zones (e.g. 5 controllers)
2020-04-29 20:41:08 -07:00
2c1af917ec Update recommended Terraform provider versions
* Sync the Terraform provider plugin versions to those
actively used and tested by the author
* Fix terraform fmt
2020-04-28 19:57:50 -07:00
4ac2d94999 Add Fedora CoreOS Azure docs to site navigation
* Fix missing Fedora CoreOS Azure docs
2020-04-28 19:54:37 -07:00
fd044ee117 Enable Kubelet TLS bootstrap and NodeRestriction
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR for
kube-controller-manager to automatically approve to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* Ability for a Kubelet to delete its Node object is retained
as preemptible nodes or those in auto-scaling instance groups
need to be able to remove themselves on shutdown. This need
continues to have precedence over any risk of a node deleting
itself maliciously

Security notes:

1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.

2. The bootstrap kubeconfig credential has the same sensitivity
as the previous generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig

3. Bootstrapping Kubelet kubeconfig's with a limited lifetime offers
a slight security improvement.
  * An attacker who obtains the kubeconfig can likely obtain the
  bootstrap kubeconfig as well, to obtain the ability to renew
  their access
  * A compromised bootstrap kubeconfig could plausibly be handled
  by replacing the bootstrap token Secret, distributing the token
  to new nodes, and expiration. Whereas a compromised TLS-client
  certificate kubeconfig can't be revoked (no CRL). However,
  replacing a bootstrap token can be impractical in real cluster
  environments, so the limited lifetime is mostly a theoretical
  benefit.
  * Cluster CSR objects are visible via kubectl which is nice

4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features

Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/185
2020-04-28 19:35:33 -07:00
38a6bddd06 Update Calico from v3.13.1 to v3.13.3
* https://docs.projectcalico.org/v3.13/release-notes/
2020-04-23 23:58:02 -07:00
d8966afdda Remove extraneous sudo from layout asset unpacking 2020-04-22 20:28:01 -07:00
84ed0a31c3 Update Prometheus from v2.17.1 to v2.17.2
* https://github.com/prometheus/prometheus/releases/tag/v2.17.2
2020-04-20 18:09:24 -07:00
fcbee12334 Fix race condition creating DigitalOcean firewall rules
* DigitalOcean firewall rules should reference Terraform tag
resources rather than using tag strings. Otherwise, terraform
apply can fail (neeeds rerun) if a tag has not yet been created
2020-04-19 16:55:02 -07:00
feac94605a Fix bootstrap mount to use shared volume SELinux label
* Race: During initial bootstrap, static control plane pods
could hang with Permission denied to bootstrap secrets. A
manual fix involved restarting Kubelet, which relabeled mounts
The race had no effect on subsequent reboots.
* bootstrap.service runs podman with a private unshared mount
of /etc/kubernetes/bootstrap-secrets which uses an SELinux MCS
label with a category pair. However, bootstrap-secrets should
be shared as its mounted by Docker pods kube-apiserver,
kube-scheduler, and kube-controller-manager. Restarting Kubelet
was a manual fix because Kubelet relabels all /etc/kubernetes
* Fix bootstrap Pod to use the shared volume label, which leaves
bootstrap-secrets files with SELinux level s0 without MCS
* Also allow failed bootstrap.service to be re-applied. This was
missing on bare-metal and AWS
2020-04-19 16:31:32 -07:00
2b1b918b43 Revert Flatcar Linux Azure to manual upload images
* Initial support for Flatcar Linux on Azure used the Flatcar
Linux Azure Marketplace images (e.g. `flatcar-stable`) in
https://github.com/poseidon/typhoon/pull/664
* Flatcar Linux Azure Marketplace images have some unresolved
items https://github.com/poseidon/typhoon/issues/703
* Until the Marketplace items are resolved, revert to requiring
Flatcar Linux's images be manually uploaded (like GCP and
DigitalOcean)
2020-04-18 15:40:57 -07:00
bf22222f7d Remove temporary workaround for v1.18.0 apply issue
* In v1.18.0, kubectl apply would fail to apply manifests if any
single manifest was unable to validate. For example, if a CRD and
CR were defined in the same directory, apply would fail since the
CR would be invalid as the CRD wouldn't exist
* Typhoon temporary workaround was to separate CNI CRD manifests
and explicitly apply them first. No longer needed in v1.18.1+
* Kubernetes v1.18.1 restored the prior behavior where kubectl apply
applies as many valid manifests as it can. In the example above, the
CRD would be applied and the CR could be applied if the kubectl apply
was re-run (allowing for apply loops).
* Upstream fix: https://github.com/kubernetes/kubernetes/pull/89864
2020-04-16 23:49:55 -07:00
671eacb86e Update Kubernetes from v1.18.1 to v1.18.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#changelog-since-v1181
2020-04-16 23:40:52 -07:00
e2d4af43be Fix Fedora CoreOS Azure MTU with Calico
* With Calico VXLAN on Fedora CoreOS the 1450 MTU should
be used
2020-04-12 23:20:04 -07:00
5c4a3f73d5 Add support for Fedora CoreOS on Azure
* Add `azure/fedora-coreos/kubernetes` module
2020-04-12 16:35:49 -07:00
76ab4c4c2a Change container-linux module preference to Flatcar Linux
* No change to Fedora CoreOS modules
* For Container Linx AWS and Azure, change the `os_image` default
from coreos-stable to flatcar-stable
* For Container Linux GCP and DigitalOcean, change `os_image` to
be required since users should upload a Flatcar Linux image and
set the variable
* For Container Linux bare-metal, recommend users change the
`os_channel` to Flatcar Linux. No actual module change.
2020-04-11 14:52:30 -07:00
1627ecaf27 Fix docs TOC to include Fedora CoreOS DigitalOcean 2020-04-11 14:07:09 -07:00
1420700bc0 Update CHANGES for v1.18.1 release
* Change order of modules in the README
2020-04-11 13:23:49 -07:00
80538e2953 Add support for Fedora CoreOS on DigitalOcean
* Add `digital-ocean/fedora-coreos/kubernetes` module
* DigitalOcean custom uploaded images do not permit
droplet IPv6 networking
2020-04-09 23:55:29 -07:00
73af2f3b7c Update Kubernetes from v1.18.0 to v1.18.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181
2020-04-08 19:41:48 -07:00
17ea547723 Update etcd from v3.4.5 to v3.4.7
* https://github.com/etcd-io/etcd/releases/tag/v3.4.7
* https://github.com/etcd-io/etcd/releases/tag/v3.4.6
2020-04-06 21:09:25 -07:00
2b5dfece93 Update Grafana from v6.7.1 to v6.7.2
* https://github.com/grafana/grafana/releases/tag/v6.7.2
2020-04-04 13:13:19 -07:00
d47d40b517 Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules and alerts and Grafana
dashboards
* All Loki recording rules for convenience
2020-03-31 21:53:01 -07:00
3c1be7b0e0 Fix terraform fmt 2020-03-31 21:42:51 -07:00
bbbaf949f9 Fix UDP outbound and clock sync timeouts on Azure workers
* Add "lb" outbound rule for worker TCP _and_ UDP traffic
* Fix Azure worker nodes clock synchronization being inactive
due to timeouts reaching the CoreOS / Flatcar NTP pool
* Fix Azure worker nodes not providing outbount UDP connectivity

Background:

Azure provides VMs outbound connectivity either by having a public
IP or via an SNAT masquerade feature bundled with their virtual
load balancing abstraction (in contrast with, say, a NAT gateway).

Azure worker nodes have only a private IP, but are associated with
the cluster load balancer's backend pool and ingress frontend IP.
Outbound traffic uses SNAT with this frontend IP. A subtle detail
with Azure SNAT seems to be that since both inbound lb_rule's are
TCP only, outbound UDP traffic isn't SNAT'd (highlights the reasons
Azure shouldn't have conflated inbound load balancing with outbound
SNAT concepts). However, adding a separate outbound rule and
disabling outbound SNAT on our ingress lb_rule's we can tell Azure
to continue load balancing as before, and support outbound SNAT for
worker traffic of both the TCP and UDP protocol.

Fixes clock synchronization timeouts:

```
systemd-timesyncd[786]: Timed out waiting for reply from
45.79.36.123:123 (3.flatcar.pool.ntp.org)
```

Azure controller nodes have their own public IP, so controllers (and
etcd) nodes have not had clock synchronization or outbound UDP issues
2020-03-31 21:00:16 -07:00
135c6182b8 Update flannel from v0.11.0 to v0.12.0
* https://github.com/coreos/flannel/releases/tag/v0.12.0
2020-03-31 18:31:59 -07:00
c53dc66d4a Rename Container Linux snippets variable for consistency
* Rename controller_clc_snippets to controller_snippets (cloud platforms)
* Rename worker_clc_snippets to worker_snippets (cloud platforms)
* Rename clc_snippets to snippets (bare-metal)
2020-03-31 18:25:51 -07:00
9960972726 Fix bootstrap regression when networking="flannel"
* Fix bootstrap error for missing `manifests-networking/crd*yaml`
when `networking = "flannel"`
* Cleanup manifest-networking directory left during bootstrap
* Regressed in v1.18.0 changes for Calico https://github.com/poseidon/typhoon/pull/675
2020-03-31 18:21:59 -07:00
bac5acb3bd Change default kube-system DaemonSet tolerations
* Change kube-proxy, flannel, and calico-node DaemonSet
tolerations to tolerate `node.kubernetes.io/not-ready`
and `node-role.kubernetes.io/master` (i.e. controllers)
explicitly, rather than tolerating all taints
* kube-system DaemonSets will no longer tolerate custom
node taints by default. Instead, custom node taints must
be enumerated to opt-in to scheduling/executing the
kube-system DaemonSets
* Consider setting the daemonset_tolerations variable
of terraform-render-bootstrap at a later date

Background: Tolerating all taints ruled out use-cases
where certain nodes might legitimately need to keep
kube-proxy or CNI networking disabled
Related: https://github.com/poseidon/terraform-render-bootstrap/pull/179
2020-03-31 01:00:45 -07:00
70bdc9ec94 Allow bootstrap re-apply for Fedora CoreOS GCP
* Problem: Fedora CoreOS images are manually uploaded to GCP. When a
cluster is created with a stale image, Zincati immediately checks
for the latest stable image, fetches, and reboots. In practice,
this can unfortunately occur exactly during the initial cluster
bootstrap phase.

* Recommended: Upload the latest Fedora CoreOS image regularly
* Mitigation: Allow a failed bootstrap.service run (which won't touch
the done ConditionalPathExists) to be re-run by running `terraforma apply`
again. Add a known issue to CHANGES
* Update docs to show the current Fedora CoreOS stable version to
reduce likelihood users see this issue

 Longer term ideas:

* Ideal: Fedora CoreOS publishes a stable channel. Instances will always
boot with the latest image in a channel. The problem disappears since
it works the same way AWS does
* Timer: Consider some timer-based approach to have zincati delay any
system reboots for the first ~30 min of a machine's life. Possibly just
configured on the controller node https://github.com/coreos/zincati/pull/251
* External coordination: For Container Linux, locksmith filled a similar
role and was disabled to allow CLUO to coordinate reboots. By running
atop Kubernetes, it was not possible for the reboot to occur before
cluster bootstrap
* Rely on https://github.com/coreos/zincati/issues/115 to delay the
reboot since bootstrap involves an SSH session
* Use path-based activation of zincati on controllers and set that
path at the end of the bootstrap process

Rel: https://github.com/coreos/fedora-coreos-tracker/issues/239
2020-03-28 18:12:31 -07:00
144bb9403c Add support for Fedora CoreOS snippets
* Refresh snippets customization docs
* Requires terraform-provider-ct v0.5+
2020-03-28 16:15:04 -07:00
5fca08064b Fix Fedora CoreOS AMI to filter for stable images
* Fix issue observed in us-east-1 where AMI filters chose the
latest testing channel release, rather than the stable chanel
* Fedora CoreOS AMI filter selects the latest image with a
matching name, x86_64, and hvm, excluding dev images. Add a
filter for "Fedora CoreOS stable", which seems to be the only
distinguishing metadata indicating the channel
2020-03-28 12:57:45 -07:00
fc686c8fc7 Fix delete-node.service kubectl service exec's
* Fix delete-node service that runs on worker (cloud-only)
shutdown to delete a Kubernetes node. Regressed in #669
(unreleased)
* Use rkt `--exec` to invoke kubectl binary in the kubelet
image
* Use podman `--entrypoint` to invoke the kubectl binary in
the kubelet image
2020-03-28 12:35:23 -07:00
a1a5da6bc2 Add CoreOS Container Linux EOL recommendation to CHANGES
* Recommend that users who have not yet tried Fedora CoreOS or
Flatcar Linux do so. Likely, Container Linux will reach EOL
and platform support / stability ratings will be in a mixed
state. Nevertheless, folks should migrate by September.
2020-03-26 23:41:54 -07:00
076b8e3c42 Update Prometheus from v2.17.0 to v2.17.1
* https://github.com/prometheus/prometheus/releases/tag/v2.17.1
2020-03-26 22:17:13 -07:00
ef5f953e04 Set docker log driver to journald on Fedora CoreOS
* Before Kubernetes v1.18.0, Kubelet only supported kubectl
`--limit-bytes` with the Docker `json-file` log driver so
the Fedora CoreOS default was overridden for conformance.
See https://github.com/poseidon/typhoon/pull/642
* Kubelet v1.18+ implemented support for other docker log
drivers, so the Fedora CoreOS default `journald` can be
used again

Rel: https://github.com/kubernetes/kubernetes/issues/86367
2020-03-26 22:06:45 -07:00
d25f23e675 Update docs from Kubernetes v1.17.4 to v1.18.0 2020-03-25 20:28:30 -07:00
f100a90d28 Update Kubernetes from v1.17.4 to v1.18.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-03-25 17:51:50 -07:00
c3bf8bcf96 Add Fedora CoreOS to issue template and docs
* Update several Container Linux references to start
referring to Flatcar Linux
* Update docs and mentions of Fedora CoreOS
2020-03-25 00:36:15 -07:00
5d1e4ad333 Deprecate asset_dir variable and remove docs
* Remove docs for the `asset_dir` variable and deprecate
it in CHANGES. It will be removed in an upcoming release
* Typhoon v1.17.0 introduced a new mechanism for managing
and distributing generated assets that stopped relying on
writing out to disk. `asset_dir` became optional and
defaulted to being unset / off (recommended)
2020-03-25 00:00:01 -07:00
9f702c72d2 Rename DigitalOcean image variable to os_image
* Rename variable `image` to `os_image` to match the naming
used for the same purpose on other supported platforms (e.g.
AWS, Azure, GCP)
2020-03-24 23:49:37 -07:00
e556bc2167 Update Prometheus from v2.17.0-rc.3 to v2.17.0
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0
2020-03-24 23:15:49 -07:00
1bf4f3b801 Fix image tag for Container Linux AWS workers
* #669 left one reference to the original SHA tagged image
before the v1.17.4 image tag was applied
2020-03-21 15:44:33 -07:00
590d941f50 Switch from upstream hyperkube image to individual images
* Kubernetes plans to stop releasing the hyperkube container image
* Upstream will continue to publish `kube-apiserver`, `kube-controller-manager`,
`kube-scheduler`, and `kube-proxy` container images to `k8s.gcr.io`
* Upstream will publish Kubelet only as a binary for distros to package,
either as a DEB/RPM on traditional distros or a container image on
container-optimized operating systems
* Typhoon will package the upstream Kubelet (checksummed) and its
dependencies as a container image for use on CoreOS Container Linux,
Flatcar Linux, and Fedora CoreOS
* Update the Typhoon container image security policy to list
`quay.io/poseidon/kubelet`as an official distributed artifact

Hyperkube: https://github.com/kubernetes/kubernetes/pull/88676
Kubelet Container Image: https://github.com/poseidon/kubelet
Kubelet Quay Repo: https://quay.io/repository/poseidon/kubelet
2020-03-21 15:43:05 -07:00
ddc1ff5348 Update Grafana from v6.6.2 to v6.7.1
* https://github.com/grafana/grafana/releases/tag/v6.7.1
2020-03-21 15:27:55 -07:00
61557e89a6 Update Prometheus from v2.16.0 to v2.17.0-rc.3
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0-rc.3
2020-03-19 22:38:05 -07:00
c3ef21dbf5 Update etcd from v3.4.4 to v3.4.5
* https://github.com/etcd-io/etcd/releases/tag/v3.4.5
2020-03-18 20:50:41 -07:00
2a5dddeb9d Promote Fedora CoreOS AWS and Google Cloud
* Promote Fedora CoreOS AWS to stable
* Promote Fedora CoreOS GCP to beta
2020-03-16 22:12:26 -07:00
75fb4e5d11 Remove Container Linux Update Operator (CLUO) addon
* Stop providing example manifests for the Container Linux
Update Operator (CLUO)
* CLUO requires patches to support Kubernetes v1.16+, but the
project and push access is rather unowned
* CLUO hasn't been in active use in our clusters and won't be
relevant beyond Container Linux. Not to say folks can't patch
it and run it on their own. Examples just aren't provided here

Related: https://github.com/coreos/container-linux-update-operator/pull/197
2020-03-16 22:05:17 -07:00
1a139ef6f1 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-03-16 21:40:52 -07:00
bc7902f40a Update Kubernetes from v1.17.3 to v1.17.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174
2020-03-13 00:06:41 -07:00
70bf39bb9a Update Calico from v3.12.0 to v3.13.1
* https://docs.projectcalico.org/v3.13/release-notes/
2020-03-12 23:00:38 -07:00
4e1b8f22df Add support for Flatcar Linux on Azure
* Accept `os_image` "flatcar-stable" and "flatcar-beta" to
use Kinvolk's Flatcar Linux images from the Azure Marketplace

Note: Flatcar Linux Azure Marketplace images require terms be
accepted before use
2020-03-12 22:52:48 -07:00
ab7913a061 Accept initial worker node labels and taints map on bare-metal
* Add `worker_node_labels` map from node name to a list of initial
node label strings
* Add `worker_node_taints` map from node name to a list of initial
node taint strings
* Unlike cloud platforms, bare-metal node labels and taints
are defined via a map from node name to list of labels/taints.
Bare-metal clusters may have heterogeneous hardware so per node
labels and taints are accepted
* Only worker node names are allowed. Workloads are not scheduled
on controller nodes so altering their labels/taints isn't suitable

```
module "mercury" {
  ...

  worker_node_labels = {
    "node2" = ["role=special"]
  }

  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"]
  }
}
```

Related: https://github.com/poseidon/typhoon/issues/429
2020-03-09 00:12:02 -07:00
7b0ea23cdc Upgrade terraform-provider-azurerm to v2.0+
* Add support for `terraform-provider-azurerm` v2.0+. Require
`terraform-provider-azurerm` v2.0+ and drop v1.x support since
the Azure provider major release is not backwards compatible
* Use Azure's new Linux VM and Linux VM Scale Set resources
* Change controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups
(aesthetic)
* If set, change `worker_priority` from `Low` to `Spot` (action required)

Related:

* https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html
2020-03-08 17:40:13 -07:00
c4683c5bad Refresh Prometheus alerts and Grafana dashboards
* Add 2 min wait before KubeNodeUnreachable to be less
noisy on premeptible clusters
* Add a BlackboxProbeFailure alert for any failing probes
for services annotated `prometheus.io/probe: true`
2020-03-02 20:08:37 -08:00
51cee6d5a4 Change Container Linux etcd-member to fetch with docker://
* Quay has historically generated ACI signatures for images to
facilitate rkt's notions of verification (it allowed authors to
actually sign images, though `--trust-keys-from-https` is in use
since etcd and most authors don't sign images). OCI standardization
didn't adopt verification ideas and checking signatures has fallen
out of favor.
* Fix an issue where Quay no longer seems to be generating ACI
signatures for new images (e.g. quay.io/coreos/etcd:v.3.4.4)
* Don't be alarmed by rkt `--insecure-options=image`. It refers
to disabling image signature checking (i.e. docker pull doesn't
check signatures either)
* System containers for Kubelet and bootstrap have transitioned
to the docker:// transport, so there is precedent and this brings
all the system containers on Container Linux controllers into
alignment
2020-03-02 19:57:45 -08:00
87f9a2fc35 Add automatic worker deletion on Fedora CoreOS clouds
* On clouds where workers can scale down or be preempted
(AWS, GCP, Azure), shutdown runs delete-node.service to
remove a node a prevent NotReady nodes from lingering
* Add the delete-node.service that wasn't carried over
from Container Linux and port it to use podman
2020-02-29 20:22:03 -08:00
6de5cf5a55 Update etcd from v3.4.3 to v3.4.4
* https://github.com/etcd-io/etcd/releases/tag/v3.4.4
2020-02-29 16:19:29 -08:00
3250994c95 Use a route table with separate (rather than inline) routes
* Allow users to extend the route table using a data reference
and adding route resources (e.g. unusual peering setups)
* Note: Internally connecting AWS clusters can reduce cross-cloud
flexibility and inhibits blue-green cluster patterns. It is not
recommended
2020-02-25 23:21:58 -08:00
f4d260645c Update node-exporter from v0.18.1 to v1.0.0-rc.0
* Update mdadm alert rule; node-exporter adds `state` label to
`node_md_disks` and removes `node_md_disks_active`
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0
2020-02-25 22:29:52 -08:00
d9219a6722 Update nginx-ingress from v0.29.0 to v0.30.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0
2020-02-25 22:11:59 -08:00
60c7eb85ee Update nginx-ingress from v0.28.0 to v0.29.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0
2020-02-22 15:57:59 -08:00
4c964b56a0 Update kube-state-metrics from v1.9.4 to v1.9.5
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5
2020-02-22 15:21:10 -08:00
1fbd6835f2 Update Grafana from v6.6.1 to v6.6.2
* https://github.com/grafana/grafana/releases/tag/v6.6.2
2020-02-22 15:19:24 -08:00
e4d977bfcd Fix worker_node_labels for initial Fedora CoreOS
* Add Terraform strip markers to consume beginning and
trailing whitespace in templated Kubelet arguments for
podman (Fedora CoreOS only)
* Fix initial `worker_node_labels` being quietly ignored
on Fedora CoreOS cloud platforms that offer the feature
* Close https://github.com/poseidon/typhoon/issues/650
2020-02-22 15:12:35 -08:00
947c2c1815 Update mkdocs-material from v4.6.2 to v4.6.3 2020-02-18 21:59:17 -08:00
4a38fb5927 Update CoreDNS from v1.6.6 to v1.6.7
* https://coredns.io/2020/01/28/coredns-1.6.7-release/
2020-02-18 21:46:19 -08:00
c4e64a9d1b Change Kubelet /var/lib/calico mount to read-only (#643)
* Kubelet only requires read access to /var/lib/calico

Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2020-02-18 21:40:58 -08:00
7ca03e5219 Update Prometheus from v1.15.2 to v1.16.0
* https://github.com/prometheus/prometheus/releases/tag/v2.16.0
2020-02-14 12:10:56 -08:00
362b3fac5c Add guide for Typhoon with Flatcar Linux on DigitalOcean
* Add docs on manually uploading a Flatcar Linux DigitalOcean
bin image as a custom image and using a data reference
* Set status of Flatcar Linux on DigitalOcean to alpha
* IPv6 is not supported for DigitalOcean custom images
2020-02-14 12:08:58 -08:00
32db59b9eb Update CHANGELOG sections and links 2020-02-14 12:05:51 -08:00
0c53ad52e4 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-02-13 14:39:48 -08:00
008817b0aa Promote Fedora CoreOS AWS/bare-metal to beta
* Remove alpha warnings from docs headers
2020-02-13 14:25:22 -08:00
49d3b9e6b3 Set docker log driver to json-file on Fedora CoreOS
* Fix the last minor issue for Fedora CoreOS clusters to pass CNCF's
Kubernetes conformance tests
* Kubelet supports a seldom used feature `kubectl logs --limit-bytes=N`
to trim a log stream to a desired length. Kubelet handles this in the
CRI driver. The Kubelet docker shim only supports the limit bytes
feature when Docker is configured with the default `json-file` logging
driver
* CNCF conformance tests started requiring limit-bytes be supported,
indirectly forcing the log driver choice until either the Kubelet or
the conformance tests are fixed
* Fedora CoreOS defaults Docker to use `journald` (desired). For now,
as a workaround to offer conformant clusters, the log driver can
be set back to `json-file`. RHEL CoreOS likely won't have noticed the
non-conformance since its using crio runtime
* https://github.com/kubernetes/kubernetes/issues/86367

Note: When upstream has a fix, the aim is to drop the docker config
override and use the journald default
2020-02-11 23:00:38 -08:00
1243f395d1 Update Kubernetes from v1.17.2 to v1.17.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173
2020-02-11 20:22:14 -08:00
846f11097f Update Fedora CoreOS kernel arguments to align with upstream
* Align bare-metal kernel arguments with upstream docs
* Add missing initrd argument which can cause issues if
not present. Fix #638
* Add tty0 and ttyS0 consoles (matches Container Linux)
* Remove unused coreos.inst=yes

Related: https://docs.fedoraproject.org/en-US/fedora-coreos/bare-metal/
2020-02-11 20:11:19 -08:00
ba84f86dc7 Add guide for Typhoon with Flatcar Linux on Google Cloud
* Add docs on manually uploading a Flatcar Linux GCE/GCP gzipped
tarball image as a Compute Engine image for use with the Typhoon
container-linux module
* Set status of Flatcar Linux on Google Cloud to alpha
2020-02-11 19:38:40 -08:00
b49a1d715d Update docs generation packages
* Update mkdocs-material from v4.6.0 to v4.6.2
2020-02-08 15:12:12 -08:00
34c3d7cc39 Update Grafana from v6.6.0 to v6.6.1
* https://github.com/grafana/grafana/releases/tag/v6.6.1
2020-02-08 14:50:33 -08:00
ca96a1335c Update Calico from v3.11.2 to v3.12.0
* https://docs.projectcalico.org/release-notes/#v3120
* Remove reverse packet filter override, since Calico no
longer relies on the setting
* https://github.com/coreos/fedora-coreos-tracker/issues/219
* https://github.com/projectcalico/felix/pull/2189
2020-02-06 00:43:33 -08:00
e339fbd2b6 Update kube-state-metrics from v1.9.3 to v1.9.4
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4
2020-02-04 21:33:34 -08:00
8cc303c9ac Add module for Fedora CoreOS on Google Cloud
* Add Typhoon Fedora CoreOS on Google Cloud as alpha
* Add docs on uploading the Fedora CoreOS GCP gzipped tarball to
Google Cloud storage to create a boot disk image
2020-02-01 15:21:40 -08:00
b19ba16afa Update nginx-ingress from v0.27.1 to v0.28.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0
2020-01-30 18:00:23 -08:00
d127a7345c Update Grafana from v6.5.3 to v6.6.0
* https://github.com/grafana/grafana/releases/tag/v6.6.0
2020-01-27 20:46:32 -08:00
02a470d2f2 Fix minor typo in announcement date 2020-01-23 08:57:01 -08:00
5643ad525f Promote Fedora CoreOS from preview to alpha in docs
* Add an announcement to the website as well
2020-01-23 08:47:18 -08:00
d5b7ce8f27 Update kube-state-metrics from v1.9.2 to v1.9.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3
2020-01-23 00:03:16 -08:00
1cda5bcd2a Update Kubernetes from v1.17.1 to v1.17.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172
2020-01-21 18:27:39 -08:00
bda73264f7 Update nginx-ingress from v0.26.1 to v0.27.1
* Change runAsUser from 33 to 101 for new alpine-based image
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1
2020-01-20 15:22:16 -08:00
dd930a2ff9 Update bare-metal Fedora CoreOS image location
* Use Fedora CoreOS production download streams (change)
* Use live PXE kernel and initramfs images
* https://getfedora.org/coreos/download/
* Update docs example to use public images (cache is still
recommended at large scale) and stable stream
2020-01-20 14:44:06 -08:00
03ff3a9cf3 Update kube-state-metrics from v1.9.1 to v1.9.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.2
2020-01-18 15:32:10 -08:00
48703f9906 Update Grafana from v6.5.2 to v6.5.3
* https://github.com/grafana/grafana/releases/tag/v6.5.3
2020-01-18 15:30:39 -08:00
7ddd3d096d Fix link in maintenance docs
* Also a fix version mention, since Terraform v0.12 was
added in Typhoon v1.15.0
2020-01-18 15:19:27 -08:00
7daabd28b5 Update Calico from v3.11.1 to v3.11.2
* https://docs.projectcalico.org/v3.11/release-notes/
2020-01-18 13:45:24 -08:00
b642e3b41b Update Kubernetes from v1.17.0 to v1.17.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171
2020-01-14 20:21:36 -08:00
ac786a2efc Update AWS Fedora CoreOS AMI filter for fedora-coreos-31
* Select the most recent fedora-coreos-31 AMI on AWS, instead
of the most recent fedora-coreos-30 AMI (Nov 27, 2019)
* Evaluated with fedora-coreos-31.20200108.2.0-hvm
2020-01-14 20:06:14 -08:00
073fcb7067 Fix bare-metal instruction for watching install to disk
* Original instructions were to watch install to disk by SSH'ing
via port 2222 following Typhoon v1.10.1. Restore that message,
since the version number in the instruction was incorrectly bumped
on each release
2020-01-12 14:16:00 -08:00
ce0569e03b Remove unneeded Kubelet /var/run mount on Fedora CoreOS
* /var/run symlinks to /run (already mounted)
2020-01-11 15:15:39 -08:00
0e2fc89f78 Update kube-state-metrics from v1.9.0 to v1.9.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.1
2020-01-11 14:15:55 -08:00
b1f521fc4a Allow terraform-provider-google v3.x plugin versions
* Typhoon Google Cloud is compatible with `terraform-provider-google`
v3.x releases
* No v3.x specific features are used, so v2.19+ provider versions are
still allowed, to ease migrations
2020-01-11 14:07:18 -08:00
73588cfad3 Update Prometheus from v2.15.1 to v2.15.2
* https://github.com/prometheus/prometheus/releases/tag/v2.15.2
2020-01-06 22:08:34 -08:00
0223b31e1a Ensure /etc/kubernetes exists following Kubelet inlining
* Inlining the Kubelet service removed the need for the
kubelet.env file declared in Ignition. However, on some
platforms, this removed the guarantee that /etc/kubernetes
exists. Bare-Metal and DigitalOcean distribute the kubelet
kubeconfig through Terraform file provisioner (scp) and
place it in (now missing) /etc/kubernetes
* https://github.com/poseidon/typhoon/pull/606
* Fix bare-metal and DigitalOcean Ignition to ensure the
desired directory exists following first boot from disk
* Cloud platforms with worker pools distribute the kubeconfig
through Ignition user data (no impact or need)
2020-01-06 21:38:20 -08:00
bb586b60da Reduce Prometheus addon's node-exporter tolerations
* Change node-exporter DaemonSet tolerations from tolerating
all possible NoSchedule taints to tolerating the master taint
and the not ready taint (we'd like metrics regardless)
* Users who add custom node taints must add their custom taints
to the addon node-exporter DaemonSet. As an addon, its expected
users copy and manipulate manifests out-of-band in their own
systems
2020-01-06 21:24:24 -08:00
43e05b9131 Enable kube-proxy metrics and allow Prometheus scrapes
* Configure kube-proxy --metrics-bind-address=0.0.0.0 (default
127.0.0.1) to serve metrics on 0.0.0.0:10249
* Add firewall rules to allow Prometheus (resides on a worker) to
scrape kube-proxy service endpoints on controllers or workers
* Add a clusterIP: None service for kube-proxy endpoint discovery
2020-01-06 21:11:18 -08:00
b2eb3e05d0 Disable Kubelet 127.0.0.1.10248 healthz endpoint
* Kubelet runs a healthz server listening on 127.0.0.1:10248
by default. Its unused by Typhoon and can be disabled
* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
2019-12-29 11:23:25 -08:00
f1f4cd6fc0 Inline Container Linux kubelet.service, deprecate kubelet-wrapper
* Change kubelet.service on Container Linux nodes to ExecStart Kubelet
inline to replace the use of the host OS kubelet-wrapper script
* Express rkt run flags and volume mounts in a clear, uniform way to
make the Kubelet service easier to audit, manage, and understand
* Eliminate reliance on a Container Linux kubelet-wrapper script
* Typhoon for Fedora CoreOS developed a kubelet.service that similarly
uses an inline ExecStart (except with podman instead of rkt) and a
more minimal set of volume mounts. Adopt the volume improvements:
  * Change Kubelet /etc/kubernetes volume to read-only
  * Change Kubelet /etc/resolv.conf volume to read-only
  * Remove unneeded /var/lib/cni volume mount

Background:

* kubelet-wrapper was added in CoreOS around the time of Kubernetes v1.0
to simplify running a CoreOS-built hyperkube ACI image via rkt-fly. The
script defaults are no longer ideal (e.g. rkt's notion of trust dates
back to quay.io ACI image serving and signing, which informed the OCI
standard images we use today, though they still lack rkt's signing ideas).
* Shipping kubelet-wrapper was regretted at CoreOS, but remains in the
distro for compatibility. The script is not updated to track hyperkube
changes, but it is stable and kubelet.env overrides bridge most gaps
* Typhoon Container Linux nodes have used kubelet-wrapper to rkt/rkt-fly
run the Kubelet via the official k8s.gcr.io hyperkube image using overrides
(new image registry, new image format, restart handling, new mounts, new
entrypoint in v1.17).
* Observation: Most of what it takes to run a Kubelet container is defined
in Typhoon, not in kubelet-wrapper. The wrapper's value is now undermined
by having to workaround its dated defaults. Typhoon may be better served
defining Kubelet.service explicitly
* Typhoon for Fedora CoreOS developed a kubelet.service without the use
of a host OS kubelet-wrapper which is both clearer and eliminated some
volume mounts
2019-12-29 11:17:26 -08:00
50db3d0231 Rename CLC files and favor Terraform list index syntax
* Rename Container Linux Config (CLC) files to *.yaml to align
with Fedora CoreOS Config (FCC) files and for syntax highlighting
* Replace common uses of Terraform `element` (which wraps around)
with `list[index]` syntax to surface index errors
2019-12-28 12:14:01 -08:00
11565ffa8a Update Calico from v3.10.2 to v3.11.1
* https://docs.projectcalico.org/v3.11/release-notes/
2019-12-28 11:08:03 -08:00
a4e843693f Update Prometheus from v2.15.0 to v2.15.1
* https://github.com/prometheus/prometheus/releases/tag/v2.15.1
2019-12-26 09:12:55 -05:00
f48e43c0b1 Update Prometheus from v2.14.0 to v2.15.0
* https://github.com/prometheus/prometheus/releases/tag/v2.15.0
2019-12-24 10:52:19 -05:00
daa8d9d9ec Update CoreDNS from v1.6.5 to v1.6.6
* https://coredns.io/2019/12/11/coredns-1.6.6-release/
2019-12-22 10:47:19 -05:00
52d11096dc Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-20 13:53:37 -08:00
00c431a9d2 Add Kubelet kubeconfig output for DigitalOcean
* Allow the raw kubelet kubeconfig to be consumed via
Terraform output
2019-12-18 23:20:55 -08:00
0ecb995890 Update kube-state-metrics from v1.8.0 to v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-14 17:20:49 -08:00
1b9fa2e688 Update Grafana from v6.5.1 to v6.5.2
* https://github.com/grafana/grafana/releases/tag/v6.5.2
2019-12-14 15:25:48 -08:00
2d8e367664 Update mkdocs-material from v4.5.1 to v4.6.0 2019-12-14 15:02:28 -08:00
c3e22f3d13 Fix minor example typo in README 2019-12-10 23:14:12 -08:00
f69dc2ea0f Update CHANGES and tutorial notes for release
* Update recommended Terraform and provider plugin versions
* Update the rough count of resources created per cluster
since its not been refreshed in a while (will vary based
on cluster options)
2019-12-10 23:03:39 -08:00
c0ce04e1de Update Calico from v3.10.1 to v3.10.2
* https://docs.projectcalico.org/v3.10/release-notes/
2019-12-09 21:03:00 -08:00
ed3550dce1 Update systemd services for the v0.17.x hyperkube
* Binary asset locations within the upstream hyperkube image
changed https://github.com/kubernetes/kubernetes/pull/84662
* Fix Container Linux and Flatcar Linux kubelet.service
(rkt-fly with fairly dated CoreOS kubelet-wrapper)
* Fix Fedora CoreOS kubelet.service (podman)
* Fix Fedora CoreOS bootstrap.service
* Fix delete-node kubectl usage for workers where nodes may
delete themselves on shutdown (e.g. preemptible instances)
2019-12-09 18:39:17 -08:00
de36d99afc Update Kubernetes from v1.16.3 to v1.17.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170
2019-12-09 18:31:58 -08:00
4fce9485c8 Reduce kube-controller-manager pod eviction timeout from 5m to 1m
* Reduce time to delete pods on unready nodes from 5m to 1m
* Present since v1.13.3, but mistakenly removed in v1.16.0 static
pod control plane migration

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* https://github.com/poseidon/terraform-render-bootstrap/pull/164
2019-12-08 22:58:31 -08:00
178afe4a9b Reduce apiserver metrics cardinality and extraneous labels
* Stop mapping node labels to targets discovered via Kubernetes
nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to
store node labels (e.g. kubernetes.io/os=linux) on these metrics
* kube-apiserver's apiserver_request_duration_seconds_bucket metric
has a high cardinality that includes labels for the API group, verb,
scope, resource, and component for each object type, including for
each CRD. This one metric has ~10k time series in a typical cluster
(btw 10-40% of total)
* Removing the apiserver request duration outright would make latency
alerts a NoOp and break a Grafana apiserver panel. Instead, drop series
that have a "group" label. Effectively, only request durations for
core Kubernetes APIs will be kept (e.g. cardinality won't grow with
each CRD added). This reduces the metric to ~2k unique series
2019-12-08 22:48:25 -08:00
d9c7a9e049 Add/update docs for asset_dir and kubeconfig usage
* Original tutorials favored including the platform (e.g.
google-cloud) in modules (e.g. google-cloud-yavin). Prefer
naming conventions where each module / cluster has a simple
name (e.g. yavin) since the platform is usually redundant
* Retain the example cluster naming themes per platform
2019-12-05 22:56:42 -08:00
2837275265 Introduce cluster creation without local writes to asset_dir
* Allow generated assets (TLS materials, manifests) to be
securely distributed to controller node(s) via file provisioner
(i.e. ssh-agent) as an assets bundle file, rather than relying
on assets being locally rendered to disk in an asset_dir and
then securely distributed
* Change `asset_dir` from required to optional. Left unset,
asset_dir defaults to "" and no assets will be written to
files on the machine that runs terraform apply
* Enhancement: Managed cluster assets are kept only in Terraform
state, which supports different backends (GCS, S3, etcd, etc) and
optional encryption. terraform apply accesses state, runs in-memory,
and distributes sensitive materials to controllers without making
use of local disk (simplifies use in CI systems)
* Enhancement: Improve asset unpack and layout process to position
etcd certificates and control plane certificates more cleanly,
without unneeded secret materials

Details:

* Terraform file provisioner support for distributing directories of
contents (with unknown structure) has been limited to reading from a
local directory, meaning local writes to asset_dir were required.
https://github.com/poseidon/typhoon/issues/585 discusses the problem
and newer or upcoming Terraform features that might help.
* Observation: Terraform provisioner support for single files works
well, but iteration isn't viable. We're also constrained to Terraform
language features on the apply side (no extra plugins, no shelling out)
and CoreOS / Fedora tools on the receive side.
* Take a map representation of the contents that would have been splayed
out in asset_dir and pack/encode them into a single file format devised
for easy unpacking. Use an awk one-liner on the receive side to unpack.
In pratice, this has worked well and its rather nice that a single
assets file is transferred by file provisioner (all or none)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162
2019-12-05 01:24:50 -08:00
5fa002f4f7 Update mkdocs-material from v4.5.0 to v4.5.1 2019-12-02 21:21:16 -08:00
aa275796cb Fix DigitalOcean controller and worker ipv4/ipv6 outputs (#594)
* Fix controller and worker ipv4/ipv4 outputs to be lists of strings
* With Terraform v0.11 syntax, an enclosing list was required to coerce the
output to be a list of strings
* With Terraform v0.12 syntax, the enclosing list shouldn't be needed
2019-12-02 21:20:47 -08:00
26674083b6 Update Grafana from v6.5.0 to v6.5.1
* https://github.com/grafana/grafana/releases/tag/v6.5.1
2019-11-28 14:11:25 -08:00
030a4cec19 Update Grafana from v6.4.4 to v6.5.0
* https://grafana.com/docs/guides/whats-new-in-v6-5/
2019-11-25 22:45:58 -08:00
ddea7dc452 Use new resource dashboards in Grafana deployment
* kubernetes-mixin pod resource dashboards were split into
two ConfigMap parts because they provide richer networking
details
* New dashboards have been used by the author at the global
level, but were missing in the per-cluster Grafana tracked
here
2019-11-25 22:27:11 -08:00
4b485a9bf2 Fix recent deletion of bootstrap module pinned SHA
* Fix deletion of bootstrap module pinned SHA, which was
introduced recently through an automation mistake creating
https://github.com/poseidon/typhoon/pull/589
2019-11-21 22:34:09 -08:00
4704b494f0 Update mkdocs-material from v4.4.3 to v4.4.0
* Upgrade dependency packages as well
2019-11-18 23:05:29 -08:00
525ae23305 Add node-exporter alerts and Grafana dashboard
* Add Prometheus alerts from node-exporter
* Add Grafana dashboard nodes.json, from node-exporter
* Not adding recording rules, since those are only used
by some node-exporter USE dashboards not being included
2019-11-16 13:47:20 -08:00
8a9e8595ae Fix terraform fmt formatting 2019-11-13 23:44:02 -08:00
19ee57dc04 Use GCP region_instance_group_manager version block format
* terraform-provider-google v2.19.0 deprecates `instance_template`
within `google_compute_region_instance_group_manager` in order to
support a scheme with multiple version blocks. Adapt our single
version to the new format to resolve deprecation warnings.
* Fixes: Warning: "instance_template": [DEPRECATED] This field
will be replaced by `version.instance_template` in 3.0.0
* Require terraform-provider-google v2.19.0+ (action required)
2019-11-13 17:41:13 -08:00
0e4ee5efc9 Add small CPU resource requests to static pods
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
below mechanisms were insufficient

Existing safeguards:

* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over say node-exporter)
and CPU is compressible too.

See: https://github.com/poseidon/terraform-render-bootstrap/pull/161
2019-11-13 17:18:45 -08:00
a271b9f340 Update CoreDNS from v1.6.2 to v1.6.5
* Add health `lameduck` option 5s. Before CoreDNS shuts down, it will
wait and report unhealthy for 5s to allow time for plugins to shutdown
cleanly
* Minor bug fixes over a few releases
* https://coredns.io/2019/08/31/coredns-1.6.3-release/
* https://coredns.io/2019/09/27/coredns-1.6.4-release/
* https://coredns.io/2019/11/05/coredns-1.6.5-release/
2019-11-13 16:47:44 -08:00
cb0598e275 Adopt Terraform v0.12 templatefile function
* Update terraform-render-bootstrap module to adopt the
Terrform v0.12 templatefile function feature to replace
the use of terraform-provider-template's `template_dir`
* Require Terraform v0.12.6+ which adds `for_each`

Background:

* `template_dir` was added to `terraform-provider-template`
to add support for template directory rendering in CoreOS
Tectonic Kubernetes distribution (~2017)
* Terraform v0.12 introduced a native `templatefile` function
and v0.12.6 introduced native `for_each` support (July 2019)
that makes it possible to replace `template_dir` usage
2019-11-13 16:33:36 -08:00
261 changed files with 15974 additions and 12542 deletions

View File

@ -1,33 +0,0 @@
<!-- Fill in either the 'Bug' or 'Feature Request' section -->
## Bug
### Environment
* Platform: aws, azure, bare-metal, google-cloud, digital-ocean
* OS: container-linux, flatcar-linux
* Release: Typhoon version or Git SHA (reporting latest is **not** helpful)
* Terraform: `terraform version` (reporting latest is **not** helpful)
* Plugins: Provider plugin versions (reporting latest is **not** helpful)
### Problem
Describe the problem.
### Desired Behavior
Describe the goal.
### Steps to Reproduce
Provide clear steps to reproduce the issue unless already covered.
## Feature Request
### Feature
Describe the feature and what problem it solves.
### Tradeoffs
What are the pros and cons of this feature? How will it be exercised and maintained?

39
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@ -0,0 +1,39 @@
---
name: Bug report
about: Report a bug to improve the project
title: ''
labels: ''
assignees: ''
---
<!-- READ: Issues are used to receive focused bug reports from users and to track planned future enhancements by the authors. Topics like cluster operation, support, debugging help, advice, and Kubernetes concepts are out of scope and should not use issues-->
**Description**
A clear and concise description of what the bug is.
**Steps to Reproduce**
Provide clear steps to reproduce the bug.
- [ ] Relevant error messages if appropriate (concise, not a dump of everything).
- [ ] Explored using a vanilla cluster from the [tutorials](https://typhoon.psdn.io/#documentation). Ruled out [customizations](https://typhoon.psdn.io/advanced/customization/).
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment**
* Platform: aws, azure, bare-metal, google-cloud, digital-ocean
* OS: fedora-coreos, flatcar-linux (include release version)
* Release: Typhoon version or Git SHA (reporting latest is **not** helpful)
* Terraform: `terraform version` (reporting latest is **not** helpful)
* Plugins: Provider plugin versions (reporting latest is **not** helpful)
**Possible Solution**
<!-- Most bug reports should have some inkling about solutions. Otherwise, your report may be less of a bug and more of a support request (see top).-->
Link to a PR or description.

5
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
- name: Security
url: https://typhoon.psdn.io/topics/security/
about: Report security vulnerabilities

15
.github/issue_template.md vendored Normal file
View File

@ -0,0 +1,15 @@
<!-- READ: Issues are used to receive focused bug reports from users and to track planned future enhancements by the authors. Topics like cluster operation, support, debugging help, advice, and Kubernetes concepts are out of scope and should not use issues-->
## Enhancement
### Overview
One paragraph explanation of the enhancement.
### Motivation
Describe the motivation and what problem this solves.
### Tradeoffs
What are the pros and cons of this feature? How will it be exercised and maintained?

View File

@ -2,7 +2,632 @@
Notable changes between versions. Notable changes between versions.
## Latest ## v1.20.0
* Kubernetes [v1.20.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200)
* Add input variable validations ([#880](https://github.com/poseidon/typhoon/pull/880))
* Require Terraform v0.13+ ([migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-versions))
* Set output sensitive to suppress console display for some cases ([#885](https://github.com/poseidon/typhoon/pull/885))
* Add service account token [volume projection](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection) ([#897](https://github.com/poseidon/typhoon/pull/897))
* Scope kube-scheduler and kube-controller-manager permissions ([#898](https://github.com/poseidon/typhoon/pull/898))
* Update etcd from v3.4.12 to [v3.4.14](https://github.com/etcd-io/etcd/releases/tag/v3.4.14)
* Update Calico from v3.16.5 to v3.17.1 ([#890](https://github.com/poseidon/typhoon/pull/890))
* Enable Calico MTU auto-detection
* Remove [workaround](https://github.com/poseidon/typhoon/pull/724) to Calico cni-plugin [issue](https://github.com/projectcalico/cni-plugin/issues/874)
* Update Cilium from v1.9.0 to [v1.9.1](https://github.com/cilium/cilium/releases/tag/v1.9.1)
* Relax `terraform-provider-ct` version constraint to v0.6+ ([#893](https://github.com/poseidon/typhoon/pull/893))
* Allow upgrading `terraform-provider-ct` to v0.7.x ([warn](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct))
### AWS
* Enable Network Load Balancer (NLB) dualstack ([#883](https://github.com/poseidon/typhoon/pull/883))
* NLB subnets assigned both IPv4 and IPv6 addresses
* NLB DNS name has both A and AAAA records
* NLB to target node traffic is IPv4 (no change)
### Bare-Metal
* Remove iSCSI `/etc/iscsi` and `iscsadm` mounts from Kubelet ([#912](https://github.com/poseidon/typhoon/pull/912))
### Fedora CoreOS
#### AWS
* Fix AMI query for which could fail in some regions ([#887](https://github.com/poseidon/typhoon/pull/887))
#### Bare-Metal
* Promote Fedora CoreOS to stable
* Use initramfs and rootfs images as initrd's ([#889](https://github.com/poseidon/typhoon/pull/889))
* Requires Fedora CoreOS version with rootfs images (e.g. 32.20200923.3.0+)
### Addons
* Update Prometheus from v2.22.2 to [v2.23.0](https://github.com/prometheus/prometheus/releases/tag/v2.23.0)
* Update kube-state-metrics from v2.0.0-alpha.2 to [v2.0.0-alpha.3](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.3)
* Update Grafana from v7.3.2 to [v7.3.5](https://github.com/grafana/grafana/releases/tag/v7.3.5)
## v1.19.4
* Kubernetes [v1.19.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194)
* Update Cilium from v1.8.4 to [v1.9.0](https://github.com/cilium/cilium/releases/tag/v1.9.0)
* Update Calico from v3.16.3 to [v3.16.5](https://github.com/projectcalico/calico/releases/tag/v3.16.5)
* Remove `asset_dir` variable (defaulted off in [v1.17.0](https://github.com/poseidon/typhoon/pull/595), deprecated in [v1.18.0](https://github.com/poseidon/typhoon/pull/678))
### Fedora CoreOS
* Improve `etcd-member.service` systemd unit ([#868](https://github.com/poseidon/typhoon/pull/868))
* Allow a snippet with a systemd dropin to set an alternate image (e.g. mirror)
* Fix local node delete oneshot on node shutdown ([#856](https://github.com/poseidon/typhoon/pull/855))
#### AWS
* Add experimental Fedora CoreOS arm64 support ([docs](https://typhoon.psdn.io/advanced/arm64/), [#875](https://github.com/poseidon/typhoon/pull/875))
* Allow arm64 full-cluster or mixed/hybrid cluster with worker pools
* Add `arch` variable to cluster module
* Add `daemonset_tolerations` variable to cluster module
* Add `node_taints` variable to workers module
* Requires flannel CNI provider and use of experimental AMI (see docs)
### Flatcar Linux
* Rename `container-linux` modules to `flatcar-linux` ([#858](https://github.com/poseidon/typhoon/issues/858)) (**action required**)
* Change on-host system containers from rkt to docker
* Change `etcd-member.service` container runnner from rkt to docker ([#867](https://github.com/poseidon/typhoon/pull/867))
* Change `kubelet.service` container runner from rkt-fly to docker ([#855](https://github.com/poseidon/typhoon/pull/855))
* Change `bootstrap.service` container runner from rkt to docker ([#873](https://github.com/poseidon/typhoon/pull/873))
* Change `delete-node.service` to use docker and an inline ExecStart ([#855](https://github.com/poseidon/typhoon/pull/855))
* Fix local node delete oneshot on node shutdown ([#855](https://github.com/poseidon/typhoon/pull/855))
* Remove CoreOS Container Linux Matchbox profiles ([#859](https://github.com/poseidon/typhoon/pull/858))
### Addons
* Update nginx-ingress from v0.40.2 to [v0.41.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2)
* Update Prometheus from v2.22.0 to [v2.22.1](https://github.com/prometheus/prometheus/releases/tag/v2.22.1)
* Update kube-state-metrics from v2.0.0-alpha.1 to [v2.0.0-alpha.2](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2)
* Update Grafana from v7.2.1 to [v7.3.2](https://github.com/grafana/grafana/releases/tag/v7.3.2)
## v1.19.3
* Kubernetes [v1.19.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193)
* Update Cilium from v1.8.3 to [v1.8.4](https://github.com/cilium/cilium/releases/tag/v1.8.4)
* Update Calico from v1.15.3 to [v1.16.3](https://github.com/projectcalico/calico/releases/tag/v3.16.3) ([#851](https://github.com/poseidon/typhoon/pull/851))
* Update flannel from v0.13.0-rc2 to v0.13.0 ([#219](https://github.com/poseidon/terraform-render-bootstrap/pull/219))
### Flatcar Linux
* Remove references to CoreOS Container Linux ([#839](https://github.com/poseidon/typhoon/pull/839))
* Fix error querying for coreos AMI on AWS ([#838](https://github.com/poseidon/typhoon/issues/838))
### Addons
* Update nginx-ingress from v0.35.0 to [v0.40.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2)
* Update Grafana from v7.1.5 to [v7.2.1](https://github.com/grafana/grafana/releases/tag/v7.2.1)
* Update Prometheus from v2.21.0 to [v2.22.0](https://github.com/prometheus/prometheus/releases/tag/v2.22.0)
* Update kube-state-metrics from v1.9.7 to [v2.0.0-alpha.1](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1)
## v1.19.2
* Kubernetes [v1.19.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1192)
* Update flannel from v0.12.0 to v0.13.0-rc2 ([#216](https://github.com/poseidon/terraform-render-bootstrap/pull/216))
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
### Addons
* Refresh Prometheus rules/alerts and Grafana dashboards ([#831](https://github.com/poseidon/typhoon/pull/831))
* Reduce apiserver metrics cardinality for non-core APIs ([#830](https://github.com/poseidon/typhoon/pull/830))
## v1.19.1
* Kubernetes [v1.19.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191)
* Change control plane seccomp annotations to GA `seccompProfile` ([#822](https://github.com/poseidon/typhoon/pull/822))
* Update Cilium from v1.8.2 to [v1.8.3](https://github.com/cilium/cilium/releases/tag/v1.8.3)
* Promote Cilium from experimental to general availability ([#827](https://github.com/poseidon/typhoon/pull/827))
* Update Calico from v1.15.2 to [v1.15.3](https://github.com/projectcalico/calico/releases/tag/v3.15.3)
### Fedora CoreOS
* Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customizations to update to v1.1.0
### Addons
* Update IngressClass resources to `networking.k8s.io/v1` ([#824](https://github.com/poseidon/typhoon/pull/824))
* Update Prometheus from v2.20.0 to [v2.21.0](https://github.com/prometheus/prometheus/releases/tag/v2.21.0)
* Remove Kubernetes node name labelmap `relabel_config` from etcd, Kubelet, and CAdvisor scrape config ([#828](https://github.com/poseidon/typhoon/pull/828))
## v1.19.0
* Kubernetes [v1.19.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1190)
* Update etcd from v3.4.10 to [v3.4.12](https://github.com/etcd-io/etcd/releases/tag/v3.4.12)
* Update Calico from v3.15.1 to [v3.15.2](https://docs.projectcalico.org/v3.15/release-notes/)
### Fedora CoreOS
* Fix race condition during bootstrap of multi-controller clusters ([#808](https://github.com/poseidon/typhoon/pull/808))
* Fix SELinux label of bootstrap-secrets on non-bootstrap controllers
### Addons
* Introduce [fleetlock](https://github.com/poseidon/fleetlock) for Fedora CoreOS reboot coordination ([#814](https://github.com/poseidon/typhoon/pull/814))
* Update nginx-ingress from v0.34.1 to [v0.35.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.35.0)
* Repository changed to `k8s.gcr.io/ingress-nginx/controller`
* Update Grafana from v7.1.3 to [v7.1.5](https://github.com/grafana/grafana/releases/tag/v7.1.5)
## v1.18.8
* Kubernetes [v1.18.8](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188)
* Migrate from Terraform v0.12.x to v0.13.x ([#804](https://github.com/poseidon/typhoon/pull/804)) (**action required**)
* Recommend Terraform v0.13.x ([migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-versions))
* Support automatic install of poseidon's provider plugins ([poseidon/ct](https://registry.terraform.io/providers/poseidon/ct/latest), [poseidon/matchbox](https://registry.terraform.io/providers/poseidon/matchbox/latest))
* Require Terraform v0.12.26+ (migration compatibility)
* Require `terraform-provider-ct` v0.6.1
* Require `terraform-provider-matchbox` v0.4.1
* Update etcd from v3.4.9 to [v3.4.10](https://github.com/etcd-io/etcd/releases/tag/v3.4.10)
* Update CoreDNS from v1.6.7 to [v1.7.0](https://coredns.io/2020/06/15/coredns-1.7.0-release/)
* Update Cilium from v1.8.1 to [v1.8.2](https://github.com/cilium/cilium/releases/tag/v1.8.2)
* Update [coreos/flannel-cni](https://github.com/coreos/flannel-cni) to [poseidon/flannel-cni](https://github.com/poseidon/flannel-cni) ([#798](https://github.com/poseidon/typhoon/pull/798))
* Update CNI plugins and fix CVEs with Flannel CNI (non-default)
* Transition to a poseidon maintained container image
### AWS
* Allow `terraform-provider-aws` v3.0+ ([#803](https://github.com/poseidon/typhoon/pull/803))
* Recommend updating `terraform-provider-aws` to v3.0+
* Continue to allow v2.23+, no v3.x specific features are used
### DigitalOcean
* Require `terraform-provider-digitalocean` v1.21+ for Terraform v0.13.x (unenforced)
* Require `terraform-provider-digitalocean` v1.20+ for Terraform v0.12.x
### Fedora CoreOS
* Fix support for Flannel with Fedora CoreOS ([#795](https://github.com/poseidon/typhoon/pull/795))
* Configure `flannel.1` link to select its own MAC address to solve flannel
pod-to-pod traffic drops starting with default link changes in Fedora CoreOS
32.20200629.3.0 ([details](https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296))
#### Addons
* Update Prometheus from v2.19.2 to [v2.20.0](https://github.com/prometheus/prometheus/releases/tag/v2.20.0)
* Update Grafana from v7.0.6 to [v7.1.3](https://github.com/grafana/grafana/releases/tag/v7.1.3)
## v1.18.6
* Kubernetes [v1.18.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186)
* Update Calico from v3.15.0 to [v3.15.1](https://docs.projectcalico.org/v3.15/release-notes/)
* Update Cilium from v1.8.0 to [v1.8.1](https://github.com/cilium/cilium/releases/tag/v1.8.1)
#### Addons
* Update nginx-ingress from v0.33.0 to [v0.34.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.34.1)
* [ingress-nginx](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0) will publish images only to gcr.io
* Update Prometheus from v2.19.1 to [v2.19.2](https://github.com/prometheus/prometheus/releases/tag/v2.19.2)
* Update Grafana from v7.0.4 to [v7.0.6](https://github.com/grafana/grafana/releases/tag/v7.0.6)
## v1.18.5
* Kubernetes [v1.18.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185)
* Add Cilium v1.8.0 as a (experimental) CNI provider option ([#760](https://github.com/poseidon/typhoon/pull/760))
* Set `networking` to "cilium" to enable
* Update Calico from v3.14.1 to [v3.15.0](https://docs.projectcalico.org/v3.15/release-notes/)
#### DigitalOcean
* Isolate each cluster in an independent DigitalOcean VPC ([#776](https://github.com/poseidon/typhoon/pull/776))
* Create droplets in a VPC per cluster (matches Typhoon AWS, Azure, and GCP)
* Require `terraform-provider-digitalocean` v1.16.0+ (action required)
* Output `vpc_id` for use with an attached DigitalOcean [loadbalancer](https://github.com/poseidon/typhoon/blob/v1.18.5/docs/architecture/digitalocean.md#custom-load-balancer)
### Fedora CoreOS
#### Google Cloud
* Promote Fedora CoreOS to stable
* Remove `os_image` variable deprecated in v1.18.3 ([#777](https://github.com/poseidon/typhoon/pull/777))
* Use `os_stream` to select a Fedora CoreOS image stream
### Flatcar Linux
#### Azure
* Allow using Flatcar Linux Edge by setting `os_image` to "flatcar-edge" ([#778](https://github.com/poseidon/typhoon/pull/778))
#### Addons
* Update Prometheus from v2.19.0 to [v2.19.1](https://github.com/prometheus/prometheus/releases/tag/v2.19.1)
* Update Grafana from v7.0.3 to [v7.0.4](https://github.com/grafana/grafana/releases/tag/v7.0.4)
## v1.18.4
* Kubernetes [v1.18.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184)
* Update Kubelet image publishing ([#749](https://github.com/poseidon/typhoon/pull/749))
* Build Kubelet images internally and publish to Quay and Dockerhub
* [quay.io/poseidon/kubelet](https://quay.io/repository/poseidon/kubelet) (official)
* [docker.io/psdn/kubelet](https://hub.docker.com/r/psdn/kubelet) (fallback)
* Continue offering automated image builds with an alternate tag strategy (see [docs](https://typhoon.psdn.io/topics/security/#container-images))
* [Document](https://typhoon.psdn.io/advanced/customization/#kubelet) use of alternate Kubelet images during registry incidents
* Update Calico from v3.14.0 to [v3.14.1](https://docs.projectcalico.org/v3.14/release-notes/)
* Fix [CVE-2020-13597](https://github.com/kubernetes/kubernetes/issues/91507)
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to `node-role.kubernetes.io/controller` ([#764](https://github.com/poseidon/typhoon/pull/764))
* Tolerate the new taint name for workloads that may run on controller nodes
* Remove node label `node.kubernetes.io/master` from controller nodes ([#764](https://github.com/poseidon/typhoon/pull/764))
* Use `node.kubernetes.io/controller` (present since v1.9.5, [#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Remove unused Kubelet `-lock-file` and `-exit-on-lock-contention` ([#758](https://github.com/poseidon/typhoon/pull/758))
### Fedora CoreOS
#### Azure
* Use `strict` Fedora CoreOS Config (FCC) snippet parsing ([#755](https://github.com/poseidon/typhoon/pull/755))
* Reduce Calico vxlan interface MTU to maintain performance ([#767](https://github.com/poseidon/typhoon/pull/766))
#### AWS
* Fix Kubelet service race with hostname update ([#766](https://github.com/poseidon/typhoon/pull/766))
* Wait for a hostname to avoid Kubelet trying to register as `localhost`
### Flatcar Linux
* Use `strict` Container Linux Config (CLC) snippet parsing ([#755](https://github.com/poseidon/typhoon/pull/755))
* Require `terraform-provider-ct` v0.4+, recommend v0.5+ (**action required**)
### Addons
* Update nginx-ingress from v0.32.0 to [v0.33.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.33.0)
* Update Prometheus from v2.18.1 to [v2.19.0](https://github.com/prometheus/prometheus/releases/tag/v2.19.0)
* Update node-exporter from v1.0.0-rc.1 to [v1.0.1](https://github.com/prometheus/node_exporter/releases/tag/v1.0.1)
* Update kube-state-metrics from v1.9.6 to v1.9.7
* Update Grafana from v7.0.0 to v7.0.3
## v1.18.3
* Kubernetes [v1.18.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1183)
* Use Kubelet [TLS bootstrap](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/) with bootstrap token authentication ([#713](https://github.com/poseidon/typhoon/pull/713))
* Enable Node [Authorization](https://kubernetes.io/docs/reference/access-authn-authz/node/) and [NodeRestriction](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction) to reduce authorization scope
* Renew Kubelet certificates every 72 hours
* Update etcd from v3.4.7 to [v3.4.9](https://github.com/etcd-io/etcd/releases/tag/v3.4.9)
* Update Calico from v3.13.1 to [v3.14.0](https://docs.projectcalico.org/v3.14/release-notes/)
* Add CoreDNS node affinity preference for controller nodes ([#188](https://github.com/poseidon/terraform-render-bootstrap/pull/188))
* Deprecate CoreOS Container Linux support (no OS [updates](https://coreos.com/os/eol/) after May 2020)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module for Flatcar Linux
### AWS
* Fix Terraform plan error when `controller_count` exceeds AWS zones (e.g. 5 controllers) ([#714](https://github.com/poseidon/typhoon/pull/714))
* Regressed in v1.17.1 ([#605](https://github.com/poseidon/typhoon/pull/605))
### Azure
* Update Azure subnets to set `address_prefixes` list ([#730](https://github.com/poseidon/typhoon/pull/730))
* Fix warning that `address_prefix` is deprecated
* Require `terraform-provider-azurerm` v2.8.0+ (action required)
### DigitalOcean
* Promote DigitalOcean to beta on both Fedora CoreOS and Flatcar Linux
### Fedora CoreOS
* Fix Calico `install-cni` crashloop on Pod restarts ([#724](https://github.com/poseidon/typhoon/pull/724))
* SELinux enforcement requires consistent file context MCS level
* Restarting a node resolved the issue as a previous workaround
#### AWS
* Support Fedora CoreOS [image streams](https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/) ([#727](https://github.com/poseidon/typhoon/pull/727))
* Add `os_stream` variable to set the stream to `stable` (default), `testing`, or `next`
* Remove unused `os_image` variable
#### Google
* Support Fedora CoreOS [image streams](https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/) ([#723](https://github.com/poseidon/typhoon/pull/723))
* Add `os_stream` variable to set the stream to `stable` (default), `testing`, or `next`
* Deprecate `os_image` variable. Manual image uploads are no longer needed
### Flatcar Linux
#### Azure
* Use the Flatcar Linux Azure Marketplace image
* Restore [#664](https://github.com/poseidon/typhoon/pull/664) (reverted in [#707](https://github.com/poseidon/typhoon/pull/707)) but use Flatcar Linux new free offer (not byol)
* Change `os_image` to use a `flatcar-stable` default
#### Google
* Promote Flatcar Linux to beta
### Addons
* Update nginx-ingress from v0.30.0 to [v0.32.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.32.0)
* Add support for [IngressClass](https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class)
* Update Prometheus from v2.17.1 to v2.18.1
* Update kube-state-metrics from v1.9.5 to [v1.9.6](https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.6)
* Update node-exporter from v1.0.0-rc.0 to [v1.0.0-rc.1](https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.1)
* Update Grafana from v6.7.2 to [v7.0.0](https://grafana.com/docs/grafana/latest/guides/whats-new-in-v7-0/)
## v1.18.2
* Kubernetes [v1.18.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1182)
* Choose Fedora CoreOS or Flatcar Linux (**action required**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module for Flatcar Linux
* Change Container Linux modules' defaults from CoreOS Container Linux to [Flatcar Container Linux](https://typhoon.psdn.io/architecture/operating-systems/) ([#702](https://github.com/poseidon/typhoon/pull/702))
* CoreOS Container Linux [won't receive updates](https://coreos.com/os/eol/) after May 2020
### Fedora CoreOS
* Fix bootstrap race condition from SELinux unshared content label ([#708](https://github.com/poseidon/typhoon/pull/708))
#### Azure
* Add support for Fedora CoreOS ([#704](https://github.com/poseidon/typhoon/pull/704))
#### DigitalOcean
* Fix race condition creating firewall allow rules ([#709](https://github.com/poseidon/typhoon/pull/709))
### Flatcar Linux
#### AWS
* Change `os_image` default from `coreos-stable` to `flatcar-stable` ([#702](https://github.com/poseidon/typhoon/pull/702))
#### Azure
* Change `os_image` to be required. Recommend uploading a Flatcar Linux image (**action required**) ([#702](https://github.com/poseidon/typhoon/pull/702))
* Disable Flatcar Linux Azure Marketplace image [support](https://github.com/poseidon/typhoon/pull/664) (**breaking**, [#707](https://github.com/poseidon/typhoon/pull/707))
* Revert to manual uploading until marketplace issue is closed ([#703](https://github.com/poseidon/typhoon/issues/703))
#### Bare-Metal
* Recommend changing [os_channel](https://typhoon.psdn.io/cl/bare-metal/#required) from `coreos-stable` to `flatcar-stable`
#### Google
* Change `os_image` to be required. Recommend uploading a Flatcar Linux image (**action required**) ([#702](https://github.com/poseidon/typhoon/pull/702))
#### DigitalOcean
* Change `os_image` to be required. Recommend uploading a Flatcar Linux image (**action required**) ([#702](https://github.com/poseidon/typhoon/pull/702))
* Fix race condition creating firewall allow rules ([#709](https://github.com/poseidon/typhoon/pull/709))
## v1.18.1
* Kubernetes [v1.18.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181)
* Choose Fedora CoreOS or Flatcar Linux (**action recommended**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module with OS set to Flatcar Linux
* Update etcd from v3.4.5 to [v3.4.7](https://github.com/etcd-io/etcd/releases/tag/v3.4.7)
* Change `kube-proxy` and `calico` or `flannel` to tolerate specific taints ([#682](https://github.com/poseidon/typhoon/pull/682))
* Tolerate master and not-ready taints, rather than tolerating all taints
* Update flannel from v0.11.0 to v0.12.0 ([#690](https://github.com/poseidon/typhoon/pull/690))
* Fix bootstrap when `networking` mode `flannel` (non-default) is chosen ([#689](https://github.com/poseidon/typhoon/pull/689))
* Regressed in v1.18.0 changes for Calico ([#675](https://github.com/poseidon/typhoon/pull/675))
* Rename Container Linux `controller_clc_snippets` to `controller_snippets` for consistency ([#688](https://github.com/poseidon/typhoon/pull/688))
* Rename Container Linux `worker_clc_snippets` to `worker_snippets` for consistency
* Rename Container Linux `clc_snippets` (bare-metal) to `snippets` for consistency
* Drop support for [gitRepo](https://kubernetes.io/docs/concepts/storage/volumes/#gitrepo) volumes ([kubelet#3](https://github.com/poseidon/kubelet/pull/3))
#### Azure
* Fix Azure worker UDP outbound connections ([#691](https://github.com/poseidon/typhoon/pull/691))
* Fix Azure worker clock sync timeouts
#### DigitalOcean
* Add support for Fedora CoreOS ([#699](https://github.com/poseidon/typhoon/pull/699))
#### Addons
* Refresh Prometheus rules/alerts and Grafana dashboards ([#692](https://github.com/poseidon/typhoon/pull/692))
* Update Grafana from v6.7.1 to v6.7.2
## v1.18.0
* Kubernetes [v1.18.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1180)
* Update etcd from v3.4.4 to [v3.4.5](https://github.com/etcd-io/etcd/releases/tag/v3.4.5)
* Switch from upstream hyperkube image to individual images ([#669](https://github.com/poseidon/typhoon/pull/669))
* Use upstream k8s.gcr.io `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`, and `kube-proxy` container images
* Use [poseidon/kubelet](https://github.com/poseidon/kubelet) to package the upstream Kubelet binary and dependencies as a container image (checksummed, automated build)
* Add [quay.io/poseidon/kubelet](https://quay.io/repository/poseidon/kubelet) as a Typhoon distributed artifact in the security policy
* Update base images from debian 9 to debian 10
* Background: Kubernetes will [stop releasing](https://github.com/kubernetes/kubernetes/pull/88676) the hyperkube container image and provide the Kubelet as a binary for packaging
* Choose Fedora CoreOS or Flatcar Linux (**action recommended**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module with OS set for Flatcar Linux (varies, see docs)
* CoreOS Container Linux [won't receive updates](https://coreos.com/os/eol/) after May 2020
* Add support for Fedora CoreOS snippets (`terraform-provider-ct` v0.5+) ([#686](https://github.com/poseidon/typhoon/pull/686))
* Recommend updating `terraform-provider-ct` plugin from v0.4.0 to [v0.5.0](https://github.com/poseidon/terraform-provider-ct/releases/tag/v0.5.0)
* Set Fedora CoreOS log driver back to the default `journald` ([#681](https://github.com/poseidon/typhoon/pull/681))
* Deprecate `asset_dir` variable and remove docs ([#678](https://github.com/poseidon/typhoon/pull/678))
* Deprecate support for [gitRepo](https://kubernetes.io/docs/concepts/storage/volumes/#gitrepo) volumes. A future release will drop support.
#### AWS
* Fix Fedora CoreOS AMI to filter for stable images ([#685](https://github.com/poseidon/typhoon/pull/685))
* Latest Fedora CoreOS `testing` or `bodhi-update` images could be chosen depending on the region
#### Bare-Metal
* Update Fedora CoreOS default `os_stream` from testing to stable
#### Google Cloud
* Known: Use of stale Fedora CoreOS image may require terraform re-apply during bootstrap ([#687](https://github.com/poseidon/typhoon/pull/687))
#### DigitalOcean
* Rename `image` variable to `os_image` for consistency ([#677](https://github.com/poseidon/typhoon/pull/677)) (action required)
#### Addons
* Update Prometheus from v2.16.0 to [v2.17.1](https://github.com/prometheus/prometheus/releases/tag/v2.17.1)
* Update Grafana from v6.6.2 to [v6.7.1](https://github.com/grafana/grafana/releases/tag/v6.7.1)
## v1.17.4
* Kubernetes [v1.17.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174)
* Update etcd from v3.4.3 to [v3.4.4](https://github.com/etcd-io/etcd/releases/tag/v3.4.4)
* On Container Linux, fetch using the docker transport format ([#659](https://github.com/poseidon/typhoon/pull/659))
* Update CoreDNS from v1.6.6 to v1.6.7 ([#648](https://github.com/poseidon/typhoon/pull/648))
* Update Calico from v3.12.0 to [v3.13.1](https://docs.projectcalico.org/v3.13/release-notes/)
#### AWS
* Promote Fedora CoreOS to stable ([#668](https://github.com/poseidon/typhoon/pull/668))
* Allow VPC route table extension via reference ([#654](https://github.com/poseidon/typhoon/pull/654))
* Fix `worker_node_labels` on Fedora CoreOS ([#651](https://github.com/poseidon/typhoon/pull/651))
* Fix automatic worker node delete on shutdown on Fedora CoreOS ([#657](https://github.com/poseidon/typhoon/pull/657))
#### Azure
* Upgrade to `terraform-provider-azurerm` [v2.0+](https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html) (action required)
* Change `worker_priority` from `Low` to `Spot` if used (action required)
* Switch to Azure's new Linux VM and Linux VM Scale Set resources
* Set controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups (aesthetic)
* Add support for Flatcar Container Linux ([#664](https://github.com/poseidon/typhoon/pull/664))
* Requires accepting Flatcar Linux Azure Marketplace terms
#### Bare-Metal
* Add `worker_node_labels` map variable for per-worker node labels ([#663](https://github.com/poseidon/typhoon/pull/663))
* Add `worker_node_taints` map variable for per-worker node taints ([#663](https://github.com/poseidon/typhoon/pull/663))
#### DigitalOcean
* Add support for Flatcar Container Linux ([#644](https://github.com/poseidon/typhoon/pull/644))
#### Google Cloud
* Promote Fedora CoreOS to beta ([#668](https://github.com/poseidon/typhoon/pull/668))
* Fix `worker_node_labels` on Fedora CoreOS ([#651](https://github.com/poseidon/typhoon/pull/651))
* Fix automatic worker node delete on shutdown on Fedora CoreOS ([#657](https://github.com/poseidon/typhoon/pull/657))
#### Addons
* Update nginx-ingress from v0.28.0 to [v0.30.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0)
* Update Prometheus from v2.15.2 to [v2.16.0](https://github.com/prometheus/prometheus/releases/tag/v2.16.0)
* Refresh Prometheus rules and alerts
* Add a BlackboxProbeFailure alert
* Update kube-state-metrics from v1.9.4 to v1.9.5
* Update node-exporter from v0.18.1 to [v1.0.0-rc.0](https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0)
* Update Grafana from v6.6.1 to v6.6.2
* Refresh Grafana dashboards
* Remove Container Linux Update Operator (CLUO) addon example ([#667](https://github.com/poseidon/typhoon/pull/667))
* CLUO hasn't been in active use in our clusters and won't be relevant
beyond Container Linux. Requires patches for use on Kubernetes v1.16+
## v1.17.3
* Kubernetes [v1.17.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173)
* Update Calico from v3.11.2 to v3.12.0
* Allow Fedora CoreOS clusters to pass CNCF conformance suite
* Set Docker log driver to `json-file` as a workaround
* Try Fedora CoreOS or Flatcar Linux alongside CoreOS [Container Linux](https://coreos.com/os/eol/) clusters (recommended)
#### AWS
* Promote Fedora CoreOS to beta ([#645](https://github.com/poseidon/typhoon/pull/645))
#### Bare-Metal
* Promote Fedora CoreOS to beta ([#645](https://github.com/poseidon/typhoon/pull/645))
* Add Fedora CoreOS kernel arguments initrd and console ([#640](https://github.com/poseidon/typhoon/pull/640))
#### Google Cloud
* Add Terraform module for Fedora CoreOS ([#632](https://github.com/poseidon/typhoon/pull/632))
* Add support for Flatcar Container Linux ([#639](https://github.com/poseidon/typhoon/pull/639))
#### Addons
* Update nginx-ingress from v0.27.1 to v0.28.0
* Update kube-state-metrics from v1.9.3 to v1.9.4
* Update Grafana from v6.5.3 to v6.6.1
## v1.17.2
* Kubernetes [v1.17.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172)
#### AWS
* Promote Fedora CoreOS from preview to alpha
#### Bare-Metal
* Promote Fedora CoreOS from preview to alpha
* Update Fedora CoreOS images location
* Use Fedora CoreOS production [download](https://getfedora.org/coreos/download/) streams
* Use live PXE kernel and initramfs images
#### Addons
* Update nginx-ingress from v0.26.1 to [v0.27.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1) ([#625](https://github.com/poseidon/typhoon/pull/625))
* Change runAsUser from 33 to 101 for alpine-based image
* Update kube-state-metrics from v1.9.2 to v1.9.3
## v1.17.1
* Kubernetes [v1.17.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171)
* Update CoreDNS from v1.6.5 to [v1.6.6](https://coredns.io/2019/12/11/coredns-1.6.6-release/) ([#602](https://github.com/poseidon/typhoon/pull/602))
* Update Calico from v3.10.2 to v3.11.2 ([#604](https://github.com/poseidon/typhoon/pull/604))
* Inline Kubelet service on Container Linux nodes ([#606](https://github.com/poseidon/typhoon/pull/606))
* Disable unused Kubelet `127.0.0.1:10248` healthz listener ([#607](https://github.com/poseidon/typhoon/pull/607))
* Enable kube-proxy metrics and allow Prometheus scrapes
* Allow TCP/10249 traffic with worker node sources
#### AWS
* Update Fedora CoreOS AMI filter for fedora-coreos-31 ([#620](https://github.com/poseidon/typhoon/pull/620))
#### Google
* Allow `terraform-provider-google` v3.0+ ([#617](https://github.com/poseidon/typhoon/pull/617))
* Only enforce `v2.19+` to ease migration, as no v3.x features are used
#### Addons
* Update Prometheus from v2.14.0 to [v2.15.2](https://github.com/prometheus/prometheus/releases/tag/v2.15.2)
* Add discovery for kube-proxy service endpoints
* Update kube-state-metrics from v1.8.0 to v1.9.2
* Reduce node-exporter DaemonSet tolerations ([#614](https://github.com/poseidon/typhoon/pull/614))
* Update Grafana from v6.5.1 to v6.5.3
## v1.17.0
* Kubernetes [v1.17.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1170)
* Manage clusters without using a local `asset_dir` ([#595](https://github.com/poseidon/typhoon/pull/595))
* Change `asset_dir` to be optional. Remove the variable to skip writing assets locally (**action recommended**)
* Allow keeping cluster assets only in Terraform state ([pluggable](https://www.terraform.io/docs/backends/types/remote.html), encryption) and allow `terraform apply` from stateless automation systems
* Improve asset unpacking on controllers
* Obtain kubeconfig from Terraform module outputs
* Replace usage of `template_dir` with `templatefile` function ([#587](https://github.com/poseidon/typhoon/pull/587))
* Require Terraform version v0.12.6+ (**action required**)
* Update CoreDNS from v1.6.2 to v1.6.5 ([#588](https://github.com/poseidon/typhoon/pull/588))
* Add health `lameduck` option to wait before shutdown
* Update Calico from v3.10.1 to v3.10.2 ([#599](https://github.com/poseidon/typhoon/pull/599))
* Reduce pod eviction timeout for deleting pods on unready nodes from 5m to 1m ([#597](https://github.com/poseidon/typhoon/pull/597))
* Present since [v1.13.3](#v1133), but mistakenly removed in v1.16.0
* Add CPU requests for control plane static pods ([#589](https://github.com/poseidon/typhoon/pull/589))
* May provide slight edge case benefits and aligns with upstream
#### Google
* Use new `google_compute_region_instance_group_manager` version block format
* Fixes warning that `instance_template` is deprecated
* Require `terraform-provider-google` v2.19.0+ (**action required**)
#### Addons
* Update Grafana from v6.4.4 to [v6.5.1](https://grafana.com/docs/guides/whats-new-in-v6-5/)
* Add pod networking details in dashboards ([#593](https://github.com/poseidon/typhoon/pull/593))
* Add node alerts and Grafana dashboard from node-exporter ([#591](https://github.com/poseidon/typhoon/pull/591))
* Reduce Prometheus high cardinality time series ([#596](https://github.com/poseidon/typhoon/pull/596))
## v1.16.3 ## v1.16.3
@ -195,7 +820,7 @@ Notable changes between versions.
* Require `terraform-provider-azurerm` v1.27+ to support Terraform v0.12 (action required) * Require `terraform-provider-azurerm` v1.27+ to support Terraform v0.12 (action required)
* Avoid unneeded rotations of Regular priority virtual machine scale sets * Avoid unneeded rotations of Regular priority virtual machine scale sets
* Azure only allows `eviction_policy` to be set for Low priority VMs. Supporting Low priority VMs meant when Regular VMs were used, each `terraform apply` rolled workers, to set eviction_policy to null. * Azure only allows `eviction_policy` to be set for Low priority VMs. Supporting Low priority VMs meant when Regular VMs were used, each `terraform apply` rolled workers, to set eviction_policy to null.
* Terraform v0.12 nullable variables fix the issue so plan does not produce a diff. * Terraform v0.12 nullable variables fix the issue so plan does not produce a diff.
#### Bare-Metal #### Bare-Metal
@ -250,7 +875,7 @@ Notable changes between versions.
* Update Grafana from v6.1.6 to v6.2.1 * Update Grafana from v6.1.6 to v6.2.1
## v1.14.2 ## v1.14.2
* Kubernetes [v1.14.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1142) * Kubernetes [v1.14.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1142)
* Update etcd from v3.3.12 to [v3.3.13](https://github.com/etcd-io/etcd/releases/tag/v3.3.13) * Update etcd from v3.3.12 to [v3.3.13](https://github.com/etcd-io/etcd/releases/tag/v3.3.13)
* Upgrade Calico from v3.6.1 to [v3.7.2](https://docs.projectcalico.org/v3.7/release-notes/) * Upgrade Calico from v3.6.1 to [v3.7.2](https://docs.projectcalico.org/v3.7/release-notes/)
@ -321,7 +946,7 @@ Notable changes between versions.
* Add ability to load balance TCP/UDP applications ([#442](https://github.com/poseidon/typhoon/pull/442)) * Add ability to load balance TCP/UDP applications ([#442](https://github.com/poseidon/typhoon/pull/442))
* Add worker instances to a target pool, output as `worker_target_pool` * Add worker instances to a target pool, output as `worker_target_pool`
* Health check for workers with Ingress controllers. Forward rules don't support differing internal/external ports, but some Ingress controllers support TCP/UDP proxy as a workaround * Health check for workers with Ingress controllers. Forward rules don't support differing internal/external ports, but some Ingress controllers support TCP/UDP proxy as a workaround
* Remove Haswell minimum CPU platform requirement ([#439](https://github.com/poseidon/typhoon/pull/439)) * Remove Haswell minimum CPU platform requirement ([#439](https://github.com/poseidon/typhoon/pull/439))
* Google Cloud API implements `min_cpu_platform` to mean "use exactly this CPU". Revert [#405](https://github.com/poseidon/typhoon/pull/405) added in v1.13.4. * Google Cloud API implements `min_cpu_platform` to mean "use exactly this CPU". Revert [#405](https://github.com/poseidon/typhoon/pull/405) added in v1.13.4.
* Fix error creating clusters in new regions without Haswell (e.g. europe-west2) ([#438](https://github.com/poseidon/typhoon/issues/438)) * Fix error creating clusters in new regions without Haswell (e.g. europe-west2) ([#438](https://github.com/poseidon/typhoon/issues/438))
@ -506,7 +1131,7 @@ Notable changes between versions.
* Update Calico from v3.3.0 to [v3.3.1](https://docs.projectcalico.org/v3.3/releases/) * Update Calico from v3.3.0 to [v3.3.1](https://docs.projectcalico.org/v3.3/releases/)
* Disable Felix usage reporting by default ([#345](https://github.com/poseidon/typhoon/pull/345)) * Disable Felix usage reporting by default ([#345](https://github.com/poseidon/typhoon/pull/345))
* Improve flannel manifests * Improve flannel manifests
* [Rename](https://github.com/poseidon/terraform-render-bootkube/commit/d045a8e6b8eccfbb9d69bb51953b5a93d23f67f7) `kube-flannel` DaemonSet to `flannel` and `kube-flannel-cfg` ConfigMap to `flannel-config` * [Rename](https://github.com/poseidon/terraform-render-bootkube/commit/d045a8e6b8eccfbb9d69bb51953b5a93d23f67f7) `kube-flannel` DaemonSet to `flannel` and `kube-flannel-cfg` ConfigMap to `flannel-config`
* [Drop](https://github.com/poseidon/terraform-render-bootkube/commit/39f9afb3360ec642e5b98457c8bd07eda35b6c96) unused mounts and add a CPU resource request * [Drop](https://github.com/poseidon/terraform-render-bootkube/commit/39f9afb3360ec642e5b98457c8bd07eda35b6c96) unused mounts and add a CPU resource request
* Update CoreDNS from v1.2.4 to [v1.2.6](https://coredns.io/2018/11/05/coredns-1.2.6-release/) * Update CoreDNS from v1.2.4 to [v1.2.6](https://coredns.io/2018/11/05/coredns-1.2.6-release/)
* Enable CoreDNS `loop` and `loadbalance` plugins ([#340](https://github.com/poseidon/typhoon/pull/340)) * Enable CoreDNS `loop` and `loadbalance` plugins ([#340](https://github.com/poseidon/typhoon/pull/340))
@ -668,7 +1293,7 @@ Notable changes between versions.
* Force apiserver to stop listening on `127.0.0.1:8080` * Force apiserver to stop listening on `127.0.0.1:8080`
* Replace `kube-dns` with [CoreDNS](https://coredns.io/) ([#261](https://github.com/poseidon/typhoon/pull/261)) * Replace `kube-dns` with [CoreDNS](https://coredns.io/) ([#261](https://github.com/poseidon/typhoon/pull/261))
* Edit the `coredns` ConfigMap to [customize](https://coredns.io/plugins/) * Edit the `coredns` ConfigMap to [customize](https://coredns.io/plugins/)
* CoreDNS doesn't use a resizer. For large clusters, scaling may be required. * CoreDNS doesn't use a resizer. For large clusters, scaling may be required.
#### AWS #### AWS
@ -713,7 +1338,7 @@ Notable changes between versions.
* Switch `kube-apiserver` port from 443 to 6443 ([#248](https://github.com/poseidon/typhoon/pull/248)) * Switch `kube-apiserver` port from 443 to 6443 ([#248](https://github.com/poseidon/typhoon/pull/248))
* Users who exposed kube-apiserver on a WAN via their router/load-balancer will need to adjust its configuration (e.g. DNAT 6443). Most apiservers are on a LAN (internal, VPN-only, etc) so if you didn't specially configure network gear for 443, no change is needed. (possible action required) * Users who exposed kube-apiserver on a WAN via their router/load-balancer will need to adjust its configuration (e.g. DNAT 6443). Most apiservers are on a LAN (internal, VPN-only, etc) so if you didn't specially configure network gear for 443, no change is needed. (possible action required)
* Fix possible deadlock when provisioning clusters larger than 10 nodes ([#244](https://github.com/poseidon/typhoon/pull/244)) * Fix possible deadlock when provisioning clusters larger than 10 nodes ([#244](https://github.com/poseidon/typhoon/pull/244))
#### DigitalOcean #### DigitalOcean
@ -781,7 +1406,7 @@ Notable changes between versions.
* Please change values stable, beta, or alpha to coreos-stable, coreos-beta, coreos-alpha (**action required!**) * Please change values stable, beta, or alpha to coreos-stable, coreos-beta, coreos-alpha (**action required!**)
* Replace `container_linux_version` variable with `os_version` * Replace `container_linux_version` variable with `os_version`
* Add `network_ip_autodetection_method` variable for Calico host IPv4 address detection * Add `network_ip_autodetection_method` variable for Calico host IPv4 address detection
* Use Calico's default "first-found" to support single NIC and bonded NIC nodes * Use Calico's default "first-found" to support single NIC and bonded NIC nodes
* Allow [alternative](https://docs.projectcalico.org/v3.1/reference/node/configuration#ip-autodetection-methods) methods for multi NIC nodes, like can-reach=IP or interface=REGEX * Allow [alternative](https://docs.projectcalico.org/v3.1/reference/node/configuration#ip-autodetection-methods) methods for multi NIC nodes, like can-reach=IP or interface=REGEX
* Deprecate `container_linux_oem` variable * Deprecate `container_linux_oem` variable
@ -814,7 +1439,7 @@ Notable changes between versions.
#### Google Cloud #### Google Cloud
* Add support for multi-controller clusters (i.e. multi-master) ([#54](https://github.com/poseidon/typhoon/issues/54), [#190](https://github.com/poseidon/typhoon/pull/190)) * Add support for multi-controller clusters (i.e. multi-master) ([#54](https://github.com/poseidon/typhoon/issues/54), [#190](https://github.com/poseidon/typhoon/pull/190))
* Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a [bug](https://issuetracker.google.com/issues/67366622) in Google network load balancers that limited clusters to only bootstrapping one controller node. * Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a [bug](https://issuetracker.google.com/issues/67366622) in Google network load balancers that limited clusters to only bootstrapping one controller node.
* Add TCP health check for apiserver pods on controllers. Replace kubelet check approximation. * Add TCP health check for apiserver pods on controllers. Replace kubelet check approximation.
#### Addons #### Addons
@ -1045,7 +1670,7 @@ Notable changes between versions.
* Container Linux stable, beta, and alpha now provide Docker 17.09 (instead * Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
of 1.12) of 1.12)
* Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09 * Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69)) * Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
* Add optional `cluster_domain_suffix` variable (#74) * Add optional `cluster_domain_suffix` variable (#74)
* Use kubernetes-incubator/bootkube v0.9.1 * Use kubernetes-incubator/bootkube v0.9.1

View File

@ -11,44 +11,50 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.3 (upstream) * Kubernetes v1.20.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/flatcar-linux/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, CSI, or other [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, CSI, or other [addons](https://typhoon.psdn.io/addons/overview/)
## Modules ## Modules
Typhoon provides a Terraform Module for each supported operating system and platform. Typhoon provides a Terraform Module for each supported operating system and platform.
| Platform | Operating System | Terraform Module | Status | Typhoon is available for [Fedora CoreOS](https://getfedora.org/coreos/).
|---------------|------------------|------------------|--------|
| AWS | Container Linux / Flatcar Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
| Azure | Container Linux | [azure/container-linux/kubernetes](azure/container-linux/kubernetes) | alpha |
| Bare-Metal | Container Linux / Flatcar Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
| Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
| Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | stable |
A preview of Typhoon for [Fedora CoreOS](https://getfedora.org/coreos/) is available for testing.
| Platform | Operating System | Terraform Module | Status | | Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------| |---------------|------------------|------------------|--------|
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | preview | | AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | stable |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | preview | | Azure | Fedora CoreOS | [azure/fedora-coreos/kubernetes](azure/fedora-coreos/kubernetes) | alpha |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | stable |
| DigitalOcean | Fedora CoreOS | [digital-ocean/fedora-coreos/kubernetes](digital-ocean/fedora-coreos/kubernetes) | beta |
| Google Cloud | Fedora CoreOS | [google-cloud/fedora-coreos/kubernetes](google-cloud/fedora-coreos/kubernetes) | stable |
Typhoon is available for [Flatcar Linux](https://www.flatcar-linux.org/releases/).
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Flatcar Linux | [aws/flatcar-linux/kubernetes](aws/flatcar-linux/kubernetes) | stable |
| Azure | Flatcar Linux | [azure/flatcar-linux/kubernetes](azure/flatcar-linux/kubernetes) | alpha |
| Bare-Metal | Flatcar Linux | [bare-metal/flatcar-linux/kubernetes](bare-metal/flatcar-linux/kubernetes) | stable |
| DigitalOcean | Flatcar Linux | [digital-ocean/flatcar-linux/kubernetes](digital-ocean/flatcar-linux/kubernetes) | beta |
| Google Cloud | Flatcar Linux | [google-cloud/flatcar-linux/kubernetes](google-cloud/flatcar-linux/kubernetes) | beta |
## Documentation ## Documentation
* [Docs](https://typhoon.psdn.io) * [Docs](https://typhoon.psdn.io)
* Architecture [concepts](https://typhoon.psdn.io/architecture/concepts/) and [operating systems](https://typhoon.psdn.io/architecture/operating-systems/) * Architecture [concepts](https://typhoon.psdn.io/architecture/concepts/) and [operating systems](https://typhoon.psdn.io/architecture/operating-systems/)
* Tutorials for [AWS](docs/cl/aws.md), [Azure](docs/cl/azure.md), [Bare-Metal](docs/cl/bare-metal.md), [Digital Ocean](docs/cl/digital-ocean.md), and [Google-Cloud](docs/cl/google-cloud.md) * Fedora CoreOS tutorials for [AWS](docs/fedora-coreos/aws.md), [Azure](docs/fedora-coreos/azure.md), [Bare-Metal](docs/fedora-coreos/bare-metal.md), [DigitalOcean](docs/fedora-coreos/digitalocean.md), and [Google Cloud](docs/fedora-coreos/google-cloud.md)
* Flatcar Linux tutorials for [AWS](docs/flatcar-linux/aws.md), [Azure](docs/flatcar-linux/azure.md), [Bare-Metal](docs/flatcar-linux/bare-metal.md), [DigitalOcean](docs/flatcar-linux/digitalocean.md), and [Google Cloud](docs/flatcar-linux/google-cloud.md)
## Usage ## Usage
Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. Here's a minimal example: Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. Here's a minimal example:
```tf ```tf
module "google-cloud-yavin" { module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.16.3" source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.20.0"
# Google Cloud # Google Cloud
cluster_name = "yavin" cluster_name = "yavin"
@ -58,12 +64,17 @@ module "google-cloud-yavin" {
# configuration # configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..." ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/yavin"
# optional # optional
worker_count = 2 worker_count = 2
worker_preemptible = true worker_preemptible = true
} }
# Obtain cluster kubeconfig
resource "local_file" "kubeconfig-yavin" {
content = module.yavin.kubeconfig-admin
filename = "/home/user/.kube/configs/yavin-config"
}
``` ```
Initialize modules, plan the changes to be made, and apply the changes. Initialize modules, plan the changes to be made, and apply the changes.
@ -71,20 +82,20 @@ Initialize modules, plan the changes to be made, and apply the changes.
```sh ```sh
$ terraform init $ terraform init
$ terraform plan $ terraform plan
Plan: 64 to add, 0 to change, 0 to destroy. Plan: 62 to add, 0 to change, 0 to destroy.
$ terraform apply $ terraform apply
Apply complete! Resources: 64 added, 0 changed, 0 destroyed. Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
``` ```
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes. In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
```sh ```sh
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig $ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes $ kubectl get nodes
NAME ROLES STATUS AGE VERSION NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.16.3 yavin-controller-0.c.example-com.internal <none> Ready 6m v1.20.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.16.3 yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.20.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.16.3 yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.20.0
``` ```
List the pods. List the pods.

View File

@ -1,4 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
name: reboot-coordinator

View File

@ -1,12 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: reboot-coordinator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: reboot-coordinator
subjects:
- kind: ServiceAccount
namespace: reboot-coordinator
name: default

View File

@ -1,45 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: reboot-coordinator
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- get
- update
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- watch
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- delete
- apiGroups:
- "extensions"
resources:
- daemonsets
verbs:
- get

View File

@ -1,68 +0,0 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: container-linux-update-agent
namespace: reboot-coordinator
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
name: container-linux-update-agent
template:
metadata:
labels:
name: container-linux-update-agent
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: update-agent
image: quay.io/coreos/container-linux-update-operator:v0.7.0
command:
- "/bin/update-agent"
env:
# read by update-agent as the node name to manage reboots for
- name: UPDATE_AGENT_NODE
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
cpu: 10m
memory: 20Mi
limits:
cpu: 20m
memory: 40Mi
volumeMounts:
- mountPath: /var/run/dbus
name: var-run-dbus
- mountPath: /etc/coreos
name: etc-coreos
- mountPath: /usr/share/coreos
name: usr-share-coreos
- mountPath: /etc/os-release
name: etc-os-release
volumes:
- name: var-run-dbus
hostPath:
path: /var/run/dbus
- name: etc-coreos
hostPath:
path: /etc/coreos
- name: usr-share-coreos
hostPath:
path: /usr/share/coreos
- name: etc-os-release
hostPath:
path: /etc/os-release

View File

@ -1,39 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: container-linux-update-operator
namespace: reboot-coordinator
spec:
replicas: 1
selector:
matchLabels:
name: container-linux-update-operator
template:
metadata:
labels:
name: container-linux-update-operator
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: update-operator
image: quay.io/coreos/container-linux-update-operator:v0.7.0
command:
- "/bin/update-operator"
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
cpu: 10m
memory: 20Mi
limits:
cpu: 20m
memory: 40Mi

View File

@ -49,6 +49,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -72,7 +73,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(coredns_dns_request_count_total{instance=~\"$instance\"}[5m])) by (proto)", "expr": "sum(rate(coredns_dns_requests_total{instance=~\"$instance\"}[5m])) by (proto)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{proto}}", "legendFormat": "{{proto}}",
@ -140,6 +141,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -163,7 +165,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(coredns_dns_request_type_count_total{instance=~\"$instance\"}[5m])) by (type)", "expr": "sum(rate(coredns_dns_requests_total{instance=~\"$instance\"}[5m])) by (type)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{type}}", "legendFormat": "{{type}}",
@ -231,6 +233,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -254,7 +257,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(coredns_dns_request_count_total{instance=~\"$instance\"}[5m])) by (zone)", "expr": "sum(rate(coredns_dns_requests_total{instance=~\"$instance\"}[5m])) by (zone)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{zone}}", "legendFormat": "{{zone}}",
@ -335,6 +338,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -440,6 +444,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -463,7 +468,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(coredns_dns_response_rcode_count_total{instance=~\"$instance\"}[5m])) by (rcode)", "expr": "sum(rate(coredns_dns_responses_total{instance=~\"$instance\"}[5m])) by (rcode)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{rcode}}", "legendFormat": "{{rcode}}",
@ -544,6 +549,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -649,6 +655,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -767,6 +774,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -790,7 +798,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(coredns_cache_size{instance=~\"$instance\"}) by (type)", "expr": "sum(coredns_cache_entries{instance=~\"$instance\"}) by (type)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{type}}", "legendFormat": "{{type}}",
@ -858,6 +866,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },

View File

@ -11,7 +11,6 @@ data:
"editable": true, "editable": true,
"gnetId": null, "gnetId": null,
"hideControls": false, "hideControls": false,
"id": 6,
"links": [ "links": [
], ],
@ -343,7 +342,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes{job=\"$cluster\"}", "expr": "etcd_mvcc_db_total_size_in_bytes{job=\"$cluster\"}",
"hide": false, "hide": false,
"interval": "", "interval": "",
"intervalFactor": 2, "intervalFactor": 2,

View File

@ -21,7 +21,7 @@ data:
"links": [ "links": [
], ],
"refresh": "", "refresh": "10s",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
@ -172,7 +172,7 @@ data:
"tableColumn": "", "tableColumn": "",
"targets": [ "targets": [
{ {
"expr": "sum(kubelet_running_pod_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})", "expr": "sum(kubelet_running_pods{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -256,7 +256,7 @@ data:
"tableColumn": "", "tableColumn": "",
"targets": [ "targets": [
{ {
"expr": "sum(kubelet_running_container_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})", "expr": "sum(kubelet_running_containers{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -558,15 +558,16 @@ data:
}, },
"id": 8, "id": 8,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -649,15 +650,16 @@ data:
}, },
"id": 9, "id": 9,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -753,15 +755,16 @@ data:
}, },
"id": 10, "id": 10,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -857,15 +860,16 @@ data:
}, },
"id": 11, "id": 11,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -955,15 +959,16 @@ data:
}, },
"id": 12, "id": 12,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1066,17 +1071,18 @@ data:
}, },
"id": 13, "id": 13,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"hideEmpty": "true", "hideEmpty": true,
"hideZero": "true", "hideZero": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1159,17 +1165,18 @@ data:
}, },
"id": 14, "id": 14,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"hideEmpty": "true", "hideEmpty": true,
"hideZero": "true", "hideZero": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1265,17 +1272,18 @@ data:
}, },
"id": 15, "id": 15,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"hideEmpty": "true", "hideEmpty": true,
"hideZero": "true", "hideZero": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1371,15 +1379,16 @@ data:
}, },
"id": 16, "id": 16,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1462,15 +1471,16 @@ data:
}, },
"id": 17, "id": 17,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1567,15 +1577,16 @@ data:
}, },
"id": 18, "id": 18,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1658,15 +1669,16 @@ data:
}, },
"id": 19, "id": 19,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1762,15 +1774,16 @@ data:
}, },
"id": 20, "id": 20,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1873,6 +1886,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1991,15 +2005,16 @@ data:
}, },
"id": 22, "id": 22,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -2021,7 +2036,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}} {{verb}} {{url}}", "legendFormat": "{{instance}} {{verb}} {{url}}",
@ -2102,6 +2117,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2193,6 +2209,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2284,6 +2301,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2373,8 +2391,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -2427,7 +2445,7 @@ data:
"options": [ "options": [
], ],
"query": "label_values(kubelet_runtime_operations{cluster=\"$cluster\", job=\"kubelet\"}, instance)", "query": "label_values(kubelet_runtime_operations_total{cluster=\"$cluster\", job=\"kubelet\"}, instance)",
"refresh": 2, "refresh": 2,
"regex": "", "regex": "",
"sort": 1, "sort": 1,
@ -2470,7 +2488,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Kubelet", "title": "Kubernetes / Kubelet",
"uid": "3138fa155d5915769fbded898ac09fd9", "uid": "3138fa155d5915769fbded898ac09fd9",
"version": 0 "version": 0
@ -2496,7 +2514,7 @@ data:
"links": [ "links": [
], ],
"refresh": "", "refresh": "10s",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
@ -2607,6 +2625,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2691,15 +2710,16 @@ data:
}, },
"id": 4, "id": 4,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -2802,6 +2822,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2886,15 +2907,16 @@ data:
}, },
"id": 6, "id": 6,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -2997,6 +3019,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3109,6 +3132,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3132,7 +3156,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-proxy\",instance=~\"$instance\",verb=\"POST\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-proxy\",instance=~\"$instance\",verb=\"POST\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -3206,15 +3230,16 @@ data:
}, },
"id": 9, "id": 9,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -3236,7 +3261,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -3317,6 +3342,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3408,6 +3434,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3499,6 +3526,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -3588,8 +3616,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -3659,7 +3687,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Proxy", "title": "Kubernetes / Proxy",
"uid": "632e265de029684c40b21cb76bca4f94", "uid": "632e265de029684c40b21cb76bca4f94",
"version": 0 "version": 0

View File

@ -31,6 +31,7 @@ data:
"fill": 1, "fill": 1,
"format": "percentunit", "format": "percentunit",
"id": 1, "id": 1,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -59,7 +60,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[1m]))", "expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[$__interval]))",
"format": "time_series", "format": "time_series",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -686,6 +687,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A", "pattern": "Value #A",
@ -704,6 +706,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to workloads", "linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B", "pattern": "Value #B",
@ -722,6 +725,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -740,6 +744,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -758,6 +763,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -776,6 +782,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -794,6 +801,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -812,6 +820,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace", "pattern": "namespace",
@ -839,7 +848,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)", "expr": "sum(kube_pod_owner{cluster=\"$cluster\"}) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -848,7 +857,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)", "expr": "count(avg(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1105,6 +1114,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A", "pattern": "Value #A",
@ -1123,6 +1133,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to workloads", "linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1", "linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B", "pattern": "Value #B",
@ -1141,6 +1152,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -1159,6 +1171,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -1177,6 +1190,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -1195,6 +1209,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -1213,6 +1228,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -1231,6 +1247,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace", "pattern": "namespace",
@ -1258,7 +1275,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)", "expr": "sum(kube_pod_owner{cluster=\"$cluster\"}) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1267,7 +1284,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)", "expr": "count(avg(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1384,6 +1401,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 11, "id": 11,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1426,6 +1444,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -1444,6 +1463,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -1462,6 +1482,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -1480,6 +1501,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -1498,6 +1520,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -1516,6 +1539,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -1534,6 +1558,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell", "linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace", "pattern": "namespace",
@ -1561,7 +1586,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1570,7 +1595,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1579,7 +1604,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1588,7 +1613,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1597,7 +1622,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1606,7 +1631,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1706,7 +1731,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -1804,7 +1829,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -1902,7 +1927,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -2000,7 +2025,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -2098,7 +2123,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -2196,7 +2221,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -2294,7 +2319,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -2392,7 +2417,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)", "expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{namespace}}", "legendFormat": "{{namespace}}",
@ -2458,8 +2483,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -2475,63 +2500,28 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 2, "hide": 2,
"includeAll": false, "includeAll": false,
"label": "cluster", "label": null,
"multi": false, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
], ],
"query": "label_values(node_cpu_seconds_total, cluster)", "query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2, "refresh": 2,
"regex": "", "regex": "",
"skipUrlSync": false,
"sort": 1, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
], ],
"tagsQuery": "", "tagsQuery": "",
"type": "interval", "type": "query",
"useTags": false "useTags": false
} }
] ]
@ -2565,7 +2555,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Cluster", "title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8", "uid": "efa86fd1d0c121a26444b636a3f509a8",
"version": 0 "version": 0
@ -2586,6 +2576,354 @@ data:
], ],
"refresh": "10s", "refresh": "10s",
"rows": [ "rows": [
{
"collapse": false,
"height": "100px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation (from requests)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation (from limits)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilization (from requests)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation (from limits)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Headlines",
"titleSize": "h6"
},
{ {
"collapse": false, "collapse": false,
"height": "250px", "height": "250px",
@ -2599,7 +2937,7 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 1, "id": 5,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -2620,7 +2958,26 @@ data:
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 12, "span": 12,
@ -2634,6 +2991,22 @@ data:
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -2697,7 +3070,7 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 2, "id": 6,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -2740,6 +3113,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -2758,6 +3132,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -2776,6 +3151,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -2794,6 +3170,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -2812,6 +3189,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -2830,6 +3208,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -2964,7 +3343,7 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 3, "id": 7,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -2985,7 +3364,26 @@ data:
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 12, "span": 12,
@ -2993,12 +3391,28 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -3062,7 +3476,7 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 4, "id": 8,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -3105,6 +3519,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3123,6 +3538,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3141,6 +3557,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3159,6 +3576,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3177,6 +3595,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -3195,6 +3614,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -3213,6 +3633,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -3231,6 +3652,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #H", "pattern": "Value #H",
@ -3249,6 +3671,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -3276,7 +3699,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3294,7 +3717,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3312,7 +3735,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\", image!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3410,7 +3833,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 5, "id": 9,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -3453,6 +3877,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3471,6 +3896,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3489,6 +3915,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3507,6 +3934,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3525,6 +3953,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -3543,6 +3972,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -3561,6 +3991,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -3588,7 +4019,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3597,7 +4028,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3606,7 +4037,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3615,7 +4046,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3624,7 +4055,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3633,7 +4064,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3698,398 +4129,6 @@ data:
{ {
"aliasColors": { "aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
}, },
"bars": false, "bars": false,
"dashLength": 10, "dashLength": 10,
@ -4125,7 +4164,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -4138,7 +4177,7 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "Rate of Received Packets Dropped", "title": "Receive Bandwidth",
"tooltip": { "tooltip": {
"shared": false, "shared": false,
"sort": 0, "sort": 0,
@ -4223,7 +4262,399 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 14,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 15,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -4289,8 +4720,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -4306,13 +4737,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 2, "hide": 2,
"includeAll": false, "includeAll": false,
"label": "cluster", "label": null,
"multi": false, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
@ -4321,7 +4752,7 @@ data:
"query": "label_values(kube_pod_info, cluster)", "query": "label_values(kube_pod_info, cluster)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -4333,13 +4764,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": "namespace", "label": null,
"multi": false, "multi": false,
"name": "namespace", "name": "namespace",
"options": [ "options": [
@ -4348,48 +4779,13 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)", "query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
], ],
"tagsQuery": "", "tagsQuery": "",
"type": "interval", "type": "query",
"useTags": false "useTags": false
} }
] ]
@ -4423,7 +4819,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Pods)", "title": "Kubernetes / Compute Resources / Namespace (Pods)",
"uid": "85a562078cdf77779eaa1add43ccec1e", "uid": "85a562078cdf77779eaa1add43ccec1e",
"version": 0 "version": 0
@ -4486,7 +4882,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -4598,6 +4994,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4616,6 +5013,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -4634,6 +5032,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -4652,6 +5051,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -4670,6 +5070,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -4688,6 +5089,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "pod", "pattern": "pod",
@ -4715,7 +5117,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4724,7 +5126,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4733,7 +5135,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4742,7 +5144,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4751,7 +5153,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=~\"$node\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4851,7 +5253,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\", container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\", container!=\"\"}) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -4963,6 +5365,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4981,6 +5384,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -4999,6 +5403,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -5017,6 +5422,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -5035,6 +5441,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -5053,6 +5460,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -5071,6 +5479,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -5089,6 +5498,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #H", "pattern": "Value #H",
@ -5107,6 +5517,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "pod", "pattern": "pod",
@ -5134,7 +5545,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5143,7 +5554,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5152,7 +5563,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5161,7 +5572,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)", "expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5170,7 +5581,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{node=\"$node\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{node=~\"$node\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5179,7 +5590,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_rss{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_rss{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5188,7 +5599,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_cache{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_cache{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5197,7 +5608,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(node_namespace_pod_container:container_memory_swap{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)", "expr": "sum(node_namespace_pod_container:container_memory_swap{cluster=\"$cluster\", node=~\"$node\",container!=\"\"}) by (pod)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -5265,8 +5676,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -5282,13 +5693,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 2, "hide": 2,
"includeAll": false, "includeAll": false,
"label": "cluster", "label": null,
"multi": false, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
@ -5297,7 +5708,7 @@ data:
"query": "label_values(kube_pod_info, cluster)", "query": "label_values(kube_pod_info, cluster)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -5309,14 +5720,14 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": "node", "label": null,
"multi": false, "multi": true,
"name": "node", "name": "node",
"options": [ "options": [
@ -5324,7 +5735,7 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, node)", "query": "label_values(kube_pod_info{cluster=\"$cluster\"}, node)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -5364,7 +5775,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Node (Pods)", "title": "Kubernetes / Compute Resources / Node (Pods)",
"uid": "200ac8fdbfbb74b39aff88118e4d1c2c", "uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
"version": 0 "version": 0

View File

@ -50,7 +50,24 @@ data:
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [
{
"alias": "requests",
"color": "#F2495C",
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
},
{
"alias": "limits",
"color": "#FF9830",
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
}
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 12, "span": 12,
@ -64,6 +81,22 @@ data:
"legendFormat": "{{container}}", "legendFormat": "{{container}}",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "requests",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limits",
"legendLink": null,
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -126,8 +159,113 @@ data:
"dashLength": 10, "dashLength": 10,
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 10,
"id": 2, "id": 2,
"legend": {
"avg": false,
"current": true,
"max": true,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", cluster=\"$cluster\"}[5m])) by (container) /sum(increase(container_cpu_cfs_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", cluster=\"$cluster\"}[5m])) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
{
"colorMode": "critical",
"fill": true,
"line": true,
"op": "gt",
"value": 0.80000000000000004,
"yaxis": "left"
}
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Throttling",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Throttling",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -170,6 +308,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -188,6 +327,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -206,6 +346,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -224,6 +365,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -242,6 +384,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -260,6 +403,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "container", "pattern": "container",
@ -394,7 +538,7 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 3, "id": 4,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -415,7 +559,26 @@ data:
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [
{
"alias": "requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 12, "span": 12,
@ -423,26 +586,26 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", image!=\"\"}) by (container)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{container}} (RSS)", "legendFormat": "{{container}}",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)", "expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{container}} (Cache)", "legendFormat": "requests",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)", "expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{container}} (Swap)", "legendFormat": "limits",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
} }
@ -508,7 +671,7 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 4, "id": 5,
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -551,6 +714,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -569,6 +733,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -587,6 +752,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -605,6 +771,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -623,6 +790,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -641,6 +809,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -659,6 +828,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #G", "pattern": "Value #G",
@ -677,6 +847,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #H", "pattern": "Value #H",
@ -695,6 +866,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "container", "pattern": "container",
@ -722,7 +894,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", image!=\"\"}) by (container)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -740,7 +912,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", image!=\"\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -758,7 +930,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)", "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -856,7 +1028,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 5, "id": 6,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -885,7 +1058,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -954,7 +1127,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 6, "id": 7,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -983,7 +1157,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_transmit_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1052,7 +1226,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 7, "id": 8,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1081,7 +1256,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1150,7 +1325,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 8, "id": 9,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1179,7 +1355,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_transmit_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1248,7 +1424,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 9, "id": 10,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1277,7 +1454,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_receive_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_receive_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1346,7 +1523,8 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 10, "fill": 10,
"id": 10, "id": 11,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -1375,7 +1553,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(irate(container_network_transmit_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)", "expr": "sum(irate(container_network_transmit_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1441,8 +1619,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -1458,13 +1636,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 2, "hide": 2,
"includeAll": false, "includeAll": false,
"label": "cluster", "label": null,
"multi": false, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
@ -1473,7 +1651,7 @@ data:
"query": "label_values(kube_pod_info, cluster)", "query": "label_values(kube_pod_info, cluster)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -1485,13 +1663,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": "namespace", "label": null,
"multi": false, "multi": false,
"name": "namespace", "name": "namespace",
"options": [ "options": [
@ -1500,75 +1678,40 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)", "query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
], ],
"tagsQuery": "", "tagsQuery": "",
"type": "interval", "type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false "useTags": false
} }
] ]
@ -1602,7 +1745,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Pod", "title": "Kubernetes / Compute Resources / Pod",
"uid": "6581e46e4e5c7ba40a07646395ef7b23", "uid": "6581e46e4e5c7ba40a07646395ef7b23",
"version": 0 "version": 0
@ -1665,7 +1808,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -1777,6 +1920,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -1795,6 +1939,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -1813,6 +1958,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -1831,6 +1977,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -1849,6 +1996,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -1867,6 +2015,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -1894,7 +2043,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1903,7 +2052,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1912,7 +2061,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1921,7 +2070,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -1930,7 +2079,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2030,7 +2179,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -2142,6 +2291,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -2160,6 +2310,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -2178,6 +2329,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -2196,6 +2348,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -2214,6 +2367,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -2232,6 +2386,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -2259,7 +2414,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2268,7 +2423,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2277,7 +2432,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2286,7 +2441,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2295,7 +2450,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2367,6 +2522,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 5, "id": 5,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -2409,6 +2565,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -2427,6 +2584,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -2445,6 +2603,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -2463,6 +2622,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -2481,6 +2641,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -2499,6 +2660,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -2517,6 +2679,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell", "linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod", "pattern": "pod",
@ -2544,7 +2707,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2553,7 +2716,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2562,7 +2725,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2571,7 +2734,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2580,7 +2743,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2589,7 +2752,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -2689,7 +2852,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -2787,7 +2950,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -2885,7 +3048,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -2983,7 +3146,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3081,7 +3244,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3179,7 +3342,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3277,7 +3440,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3375,7 +3538,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@ -3441,8 +3604,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -3458,13 +3621,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 2, "hide": 2,
"includeAll": false, "includeAll": false,
"label": "cluster", "label": null,
"multi": false, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
@ -3473,7 +3636,7 @@ data:
"query": "label_values(kube_pod_info, cluster)", "query": "label_values(kube_pod_info, cluster)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -3485,13 +3648,13 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"text": "prod", "text": "",
"value": "prod" "value": ""
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": "namespace", "label": null,
"multi": false, "multi": false,
"name": "namespace", "name": "namespace",
"options": [ "options": [
@ -3500,102 +3663,67 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)", "query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "workload",
"multi": false,
"name": "workload",
"options": [
],
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}, workload)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "type",
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\"}, workload_type)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1, "sort": 1,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
], ],
"tagsQuery": "", "tagsQuery": "",
"type": "interval", "type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "workload",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\"}, workload)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\"}, workload_type)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false "useTags": false
} }
] ]
@ -3629,7 +3757,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Workload", "title": "Kubernetes / Compute Resources / Workload",
"uid": "a164a7f0339f99e89cea5cb47e9be617", "uid": "a164a7f0339f99e89cea5cb47e9be617",
"version": 0 "version": 0
@ -3684,7 +3812,26 @@ data:
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 12, "span": 12,
@ -3692,12 +3839,28 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}", "legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -3804,6 +3967,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -3822,6 +3986,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -3840,6 +4005,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -3858,6 +4024,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -3876,6 +4043,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -3894,6 +4062,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -3912,6 +4081,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2", "linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload", "pattern": "workload",
@ -3930,6 +4100,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "workload_type", "pattern": "workload_type",
@ -3957,7 +4128,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)", "expr": "count(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3966,7 +4137,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3975,7 +4146,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3984,7 +4155,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -3993,7 +4164,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4002,7 +4173,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4094,7 +4265,26 @@ data:
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 12, "span": 12,
@ -4102,12 +4292,28 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}", "legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null, "legendLink": null,
"step": 10 "step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -4214,6 +4420,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0, "decimals": 0,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4232,6 +4439,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -4250,6 +4458,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -4268,6 +4477,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -4286,6 +4496,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -4304,6 +4515,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -4322,6 +4534,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2", "linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload", "pattern": "workload",
@ -4340,6 +4553,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "workload_type", "pattern": "workload_type",
@ -4367,7 +4581,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)", "expr": "count(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4376,7 +4590,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4385,7 +4599,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4394,7 +4608,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4403,7 +4617,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4412,7 +4626,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n", "expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4484,6 +4698,7 @@ data:
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 5, "id": 5,
"interval": "1m",
"legend": { "legend": {
"avg": false, "avg": false,
"current": false, "current": false,
@ -4526,6 +4741,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -4544,6 +4760,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -4562,6 +4779,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #C", "pattern": "Value #C",
@ -4580,6 +4798,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #D", "pattern": "Value #D",
@ -4598,6 +4817,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #E", "pattern": "Value #E",
@ -4616,6 +4836,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #F", "pattern": "Value #F",
@ -4634,6 +4855,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": true, "link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods", "linkTooltip": "Drill down to pods",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$type", "linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$type",
"pattern": "workload", "pattern": "workload",
@ -4652,6 +4874,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "workload_type", "pattern": "workload_type",
@ -4679,7 +4902,7 @@ data:
], ],
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4688,7 +4911,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4697,7 +4920,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4706,7 +4929,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4715,7 +4938,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4724,7 +4947,7 @@ data:
"step": 10 "step": 10
}, },
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@ -4824,7 +5047,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -4922,7 +5145,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5020,7 +5243,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5118,7 +5341,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5216,7 +5439,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5314,7 +5537,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5412,7 +5635,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5510,7 +5733,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n", "expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{workload}}", "legendFormat": "{{workload}}",
@ -5576,8 +5799,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -5590,95 +5813,6 @@ data:
"regex": "", "regex": "",
"type": "datasource" "type": "datasource"
}, },
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{ {
"allValue": null, "allValue": null,
"auto": false, "auto": false,
@ -5689,7 +5823,7 @@ data:
"value": "deployment" "value": "deployment"
}, },
"datasource": "$datasource", "datasource": "$datasource",
"definition": "label_values(mixin_pod_workload{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)", "definition": "label_values(namespace_workload_pod:kube_pod_owner:relabel{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)",
"hide": 0, "hide": 0,
"includeAll": false, "includeAll": false,
"label": null, "label": null,
@ -5698,7 +5832,7 @@ data:
"options": [ "options": [
], ],
"query": "label_values(mixin_pod_workload{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)", "query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"skipUrlSync": false, "skipUrlSync": false,
@ -5706,6 +5840,60 @@ data:
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
], ],
"tagsQuery": "", "tagsQuery": "",
"type": "query", "type": "query",
@ -5742,7 +5930,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Workloads)", "title": "Kubernetes / Compute Resources / Namespace (Workloads)",
"uid": "a87fb0d919ec0ea5f6543124e16c42a5", "uid": "a87fb0d919ec0ea5f6543124e16c42a5",
"version": 0 "version": 0

View File

@ -1,6193 +0,0 @@
apiVersion: v1
data:
k8s-resources-cluster.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "100px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[1m]))",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_cpu_cores{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_cpu_cores{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - sum(:node_memory_MemFreeCachedBuffers_bytes:sum{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Headlines",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workloads",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workloads",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace) / sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace) / sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Requests by Namespace",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Requests",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8",
"version": 0
}
k8s-resources-namespace.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Namespace (Pods)",
"uid": "85a562078cdf77779eaa1add43ccec1e",
"version": 0
}
k8s-resources-node.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\", container!=\"\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "node",
"multi": false,
"name": "node",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, node)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Node (Pods)",
"uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
"version": 0
}
k8s-resources-pod.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}} (RSS)",
"legendLink": null,
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}} (Cache)",
"legendLink": null,
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}} (Swap)",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container != \"\", container != \"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container != \"\", container != \"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container != \"\", container != \"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Pod",
"uid": "6581e46e4e5c7ba40a07646395ef7b23",
"version": 0
}
k8s-resources-workload.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "workload",
"multi": false,
"name": "workload",
"options": [
],
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}, workload)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "type",
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\"}, workload_type)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Workload",
"uid": "a164a7f0339f99e89cea5cb47e9be617",
"version": 0
}
k8s-resources-workloads-namespace.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Running Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workload Type",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "workload_type",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}) by (workload, workload_type)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Running Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workload Type",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "workload_type",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}) by (workload, workload_type)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Namespace (Workloads)",
"uid": "a87fb0d919ec0ea5f6543124e16c42a5",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-resources
namespace: monitoring

View File

@ -21,7 +21,25 @@ data:
"links": [ "links": [
], ],
"refresh": "", "panels": [
{
"content": "The SLO (service level objective) and other metrics displayed on this dashboard are for informational purposes only.",
"datasource": null,
"description": "The SLO (service level objective) and other metrics displayed on this dashboard are for informational purposes only.",
"gridPos": {
"h": 2,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"mode": "markdown",
"span": 12,
"title": "Notice",
"type": "text"
}
],
"refresh": "10s",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
@ -37,7 +55,9 @@ data:
"#d44a3a" "#d44a3a"
], ],
"datasource": "$datasource", "datasource": "$datasource",
"format": "none", "decimals": 3,
"description": "How many percent of requests (both read and write) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": { "gauge": {
"maxValue": 100, "maxValue": 100,
"minValue": 0, "minValue": 0,
@ -48,7 +68,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 2, "id": 3,
"interval": null, "interval": null,
"links": [ "links": [
@ -78,7 +98,7 @@ data:
"to": "null" "to": "null"
} }
], ],
"span": 2, "span": 4,
"sparkline": { "sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)", "fillColor": "rgba(31, 118, 189, 0.18)",
"full": false, "full": false,
@ -88,7 +108,7 @@ data:
"tableColumn": "", "tableColumn": "",
"targets": [ "targets": [
{ {
"expr": "sum(up{job=\"apiserver\"})", "expr": "apiserver_request:availability30d{verb=\"all\", cluster=\"$cluster\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "", "legendFormat": "",
@ -96,7 +116,7 @@ data:
} }
], ],
"thresholds": "", "thresholds": "",
"title": "Up", "title": "Availability (30d) > 99.000%",
"tooltip": { "tooltip": {
"shared": false "shared": false
}, },
@ -109,7 +129,7 @@ data:
"value": "null" "value": "null"
} }
], ],
"valueName": "min" "valueName": "avg"
}, },
{ {
"aliasColors": { "aliasColors": {
@ -119,11 +139,13 @@ data:
"dashLength": 10, "dashLength": 10,
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "decimals": 3,
"description": "How much error budget is left looking at our 0.990% availability gurantees?",
"fill": 10,
"gridPos": { "gridPos": {
}, },
"id": 3, "id": 4,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -132,6 +154,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -150,37 +173,16 @@ data:
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 5, "span": 8,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"2..\"}[5m]))", "expr": "100 * (apiserver_request:availability30d{verb=\"all\", cluster=\"$cluster\"} - 0.990000)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "2xx", "legendFormat": "errorbudget",
"refId": "A" "refId": "A"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
"refId": "D"
} }
], ],
"thresholds": [ "thresholds": [
@ -188,7 +190,7 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "RPC Rate", "title": "ErrorBudget (30d) > 99.000%",
"tooltip": { "tooltip": {
"shared": false, "shared": false,
"sort": 0, "sort": 0,
@ -206,7 +208,8 @@ data:
}, },
"yaxes": [ "yaxes": [
{ {
"format": "ops", "decimals": 3,
"format": "percentunit",
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
@ -214,7 +217,215 @@ data:
"show": true "show": true
}, },
{ {
"format": "ops", "decimals": 3,
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 3,
"description": "How many percent of read requests (LIST,GET) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "apiserver_request:availability30d{verb=\"read\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Read Availability (30d)",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many read requests (LIST,GET) per second do the apiservers get by code?",
"fill": 10,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/2../i",
"color": "#56A64B"
},
{
"alias": "/3../i",
"color": "#F2CC0C"
},
{
"alias": "/4../i",
"color": "#3274D9"
},
{
"alias": "/5../i",
"color": "#E02F44"
}
],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (code) (code_resource:apiserver_request_total:rate5m{verb=\"read\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ code }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Read SLI - Requests",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "reqps",
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
@ -231,21 +442,23 @@ data:
"dashLength": 10, "dashLength": 10,
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"description": "How many percent of read requests (LIST,GET) per second are returned with errors (5xx)?",
"fill": 1, "fill": 1,
"gridPos": { "gridPos": {
}, },
"id": 4, "id": 7,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": false,
"avg": false, "avg": false,
"current": "true", "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": false,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": false
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -262,15 +475,15 @@ data:
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 5, "span": 3,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (verb, le))", "expr": "sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"read\",code=~\"5..\", cluster=\"$cluster\"}) / sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"read\", cluster=\"$cluster\"})",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}}", "legendFormat": "{{ resource }}",
"refId": "A" "refId": "A"
} }
], ],
@ -279,7 +492,493 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "Request duration 99th quantile", "title": "Read SLI - Errors",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many seconds is the 99th percentile for reading (LIST|GET) a given resource?",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{verb=\"read\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Read SLI - Duration",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 3,
"description": "How many percent of write requests (POST|PUT|PATCH|DELETE) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 9,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "apiserver_request:availability30d{verb=\"write\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Write Availability (30d)",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many write requests (POST|PUT|PATCH|DELETE) per second do the apiservers get by code?",
"fill": 10,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/2../i",
"color": "#56A64B"
},
{
"alias": "/3../i",
"color": "#F2CC0C"
},
{
"alias": "/4../i",
"color": "#3274D9"
},
{
"alias": "/5../i",
"color": "#E02F44"
}
],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (code) (code_resource:apiserver_request_total:rate5m{verb=\"write\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ code }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Requests",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many percent of write requests (POST|PUT|PATCH|DELETE) per second are returned with errors (5xx)?",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"write\",code=~\"5..\", cluster=\"$cluster\"}) / sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"write\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Errors",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many seconds is the 99th percentile for writing (POST|PUT|PATCH|DELETE) a given resource?",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{verb=\"write\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Duration",
"tooltip": { "tooltip": {
"shared": false, "shared": false,
"sort": 0, "sort": 0,
@ -339,7 +1038,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 5, "id": 13,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -348,6 +1047,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": false, "show": false,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -371,7 +1071,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(workqueue_adds_total{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name)", "expr": "sum(rate(workqueue_adds_total{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, name)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}", "legendFormat": "{{instance}} {{name}}",
@ -430,7 +1130,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 6, "id": 14,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -439,6 +1139,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": false, "show": false,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -462,7 +1163,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(workqueue_depth{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name)", "expr": "sum(rate(workqueue_depth{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, name)",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}", "legendFormat": "{{instance}} {{name}}",
@ -521,17 +1222,18 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 7, "id": 15,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -553,7 +1255,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name, le))", "expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, name, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}", "legendFormat": "{{instance}} {{name}}",
@ -625,7 +1327,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 8, "id": 16,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -634,6 +1336,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -657,307 +1360,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "etcd_helper_cache_entry_total{job=\"apiserver\", instance=~\"$instance\"}", "expr": "process_resident_memory_bytes{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Entry Total",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_helper_cache_hit_total{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (intance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} hit",
"refId": "A"
},
{
"expr": "sum(rate(etcd_helper_cache_miss_total{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Hit/Miss Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_get_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} get",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_add_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Duration 99th Quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"apiserver\",instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -1016,7 +1419,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 12, "id": 17,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -1025,6 +1428,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1048,7 +1452,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "rate(process_cpu_seconds_total{job=\"apiserver\",instance=~\"$instance\"}[5m])", "expr": "rate(process_cpu_seconds_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -1107,7 +1511,7 @@ data:
"gridPos": { "gridPos": {
}, },
"id": 13, "id": 18,
"legend": { "legend": {
"alignAsTable": false, "alignAsTable": false,
"avg": false, "avg": false,
@ -1116,6 +1520,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1139,7 +1544,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "go_goroutines{job=\"apiserver\",instance=~\"$instance\"}", "expr": "go_goroutines{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
@ -1205,8 +1610,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -1223,6 +1628,32 @@ data:
"allValue": null, "allValue": null,
"current": { "current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(apiserver_request_total, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
@ -1233,7 +1664,7 @@ data:
"options": [ "options": [
], ],
"query": "label_values(apiserver_request_total{job=\"apiserver\"}, instance)", "query": "label_values(apiserver_request_total{job=\"apiserver\", cluster=\"$cluster\"}, instance)",
"refresh": 2, "refresh": 2,
"regex": "", "regex": "",
"sort": 1, "sort": 1,
@ -1276,7 +1707,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / API server", "title": "Kubernetes / API server",
"uid": "09ec8aa1e996d6ffcd6817bbaff4db1b", "uid": "09ec8aa1e996d6ffcd6817bbaff4db1b",
"version": 0 "version": 0
@ -1302,7 +1733,7 @@ data:
"links": [ "links": [
], ],
"refresh": "", "refresh": "10s",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
@ -1406,15 +1837,16 @@ data:
}, },
"id": 3, "id": 3,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1510,15 +1942,16 @@ data:
}, },
"id": 4, "id": 4,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1614,15 +2047,16 @@ data:
}, },
"id": 5, "id": 5,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1725,6 +2159,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1837,6 +2272,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -1860,7 +2296,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -1934,15 +2370,16 @@ data:
}, },
"id": 8, "id": 8,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -1964,7 +2401,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -2045,6 +2482,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2136,6 +2574,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2227,6 +2666,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -2316,8 +2756,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -2387,7 +2827,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Controller Manager", "title": "Kubernetes / Controller Manager",
"uid": "72e0e05bef5099e5f049b05fdc429ed4", "uid": "72e0e05bef5099e5f049b05fdc429ed4",
"version": 0 "version": 0
@ -2413,7 +2853,7 @@ data:
"links": [ "links": [
], ],
"refresh": "", "refresh": "10s",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
@ -2440,6 +2880,7 @@ data:
"min": true, "min": true,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2635,6 +3076,7 @@ data:
"min": true, "min": true,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": true "values": true
}, },
@ -2815,8 +3257,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -2938,669 +3380,11 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Persistent Volumes", "title": "Kubernetes / Persistent Volumes",
"uid": "919b92a8e8041bd567af9edab12c840c", "uid": "919b92a8e8041bd567af9edab12c840c",
"version": 0 "version": 0
} }
pods.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "$datasource",
"enable": true,
"expr": "time() == BOOL timestamp(rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[2m]) > 0)",
"hide": false,
"iconColor": "rgba(215, 44, 44, 1)",
"name": "Restarts",
"showIn": 0,
"tags": [
"restart"
],
"type": "rows"
}
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(container) (container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
},
{
"expr": "sum by(container) (container_memory_cache{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Cache: {{ container }}",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (container) (irate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", image!=\"\", pod=\"$pod\", container=~\"$container\", container!=\"POD\"}[4m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum by (pod) (irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[4m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RX: {{ pod }}",
"refId": "A"
},
{
"expr": "sort_desc(sum by (pod) (irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[4m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "TX: {{ pod }}",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 5,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by (container) (kube_pod_container_status_restarts_total{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Restarts: {{ container }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Total Restarts Per Container",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=~\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "Container",
"multi": false,
"name": "container",
"options": [
],
"query": "label_values(kube_pod_container_info{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}, container)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Pods",
"uid": "ab4f13a9892a76a4d21ce8c2445bf4ea",
"version": 0
}
scheduler.json: |- scheduler.json: |-
{ {
"__inputs": [ "__inputs": [
@ -3622,7 +3406,7 @@ data:
"links": [ "links": [
], ],
"refresh": "", "refresh": "10s",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
@ -3726,15 +3510,16 @@ data:
}, },
"id": 3, "id": 3,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -3838,15 +3623,16 @@ data:
}, },
"id": 4, "id": 4,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -3970,6 +3756,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -4082,6 +3869,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -4105,7 +3893,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -4179,15 +3967,16 @@ data:
}, },
"id": 7, "id": 7,
"legend": { "legend": {
"alignAsTable": "true", "alignAsTable": true,
"avg": false, "avg": false,
"current": "true", "current": true,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": "true", "rightSide": true,
"show": "true", "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
@ -4209,7 +3998,7 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))", "expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}", "legendFormat": "{{verb}} {{url}}",
@ -4290,6 +4079,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -4381,6 +4171,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -4472,6 +4263,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -4561,8 +4353,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -4632,7 +4424,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / Scheduler", "title": "Kubernetes / Scheduler",
"uid": "2e6b6a3b4bddf1427b3a55aa1311c656", "uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
"version": 0 "version": 0
@ -5297,6 +5089,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -5413,8 +5206,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -5536,7 +5329,7 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "", "timezone": "UTC",
"title": "Kubernetes / StatefulSets", "title": "Kubernetes / StatefulSets",
"uid": "a31c1f46e6f727cb37c0d731a7245005", "uid": "a31c1f46e6f727cb37c0d731a7245005",
"version": 0 "version": 0

View File

@ -308,6 +308,7 @@ data:
"min": false, "min": false,
"rightSide": "true", "rightSide": "true",
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -399,6 +400,7 @@ data:
"min": false, "min": false,
"rightSide": "true", "rightSide": "true",
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -503,6 +505,7 @@ data:
"min": false, "min": false,
"rightSide": "true", "rightSide": "true",
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -621,6 +624,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -719,6 +723,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },
@ -810,6 +815,7 @@ data:
"min": false, "min": false,
"rightSide": false, "rightSide": false,
"show": "true", "show": "true",
"sideWidth": null,
"total": false, "total": false,
"values": "true" "values": "true"
}, },

View File

@ -0,0 +1,975 @@
apiVersion: v1
data:
nodes.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(\n (1 - rate(node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\", instance=\"$instance\"}[$__interval]))\n/ ignoring(cpu) group_left\n count without (cpu)( node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\", instance=\"$instance\"})\n)\n",
"format": "time_series",
"interval": "1m",
"intervalFactor": 5,
"legendFormat": "{{cpu}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node_load1{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "1m load average",
"refId": "A"
},
{
"expr": "node_load5{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5m load average",
"refId": "B"
},
{
"expr": "node_load15{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "15m load average",
"refId": "C"
},
{
"expr": "count(node_cpu_seconds_total{job=\"node-exporter\", instance=\"$instance\", mode=\"idle\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "logical cores",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Load Average",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(\n node_memory_MemTotal_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_Buffers_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_Cached_bytes{job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory used",
"refId": "A"
},
{
"expr": "node_memory_Buffers_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"refId": "B"
},
{
"expr": "node_memory_Cached_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory cached",
"refId": "C"
},
{
"expr": "node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory free",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "100 -\n(\n node_memory_MemAvailable_bytes{job=\"node-exporter\", instance=\"$instance\"}\n/\n node_memory_MemTotal_bytes{job=\"node-exporter\", instance=\"$instance\"}\n* 100\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/ read| written/",
"yaxis": 1
},
{
"alias": "/ io time/",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_disk_read_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}} read",
"refId": "A"
},
{
"expr": "rate(node_disk_written_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}} written",
"refId": "B"
},
{
"expr": "rate(node_disk_io_time_seconds_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}} io time",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk I/O",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "used",
"color": "#E0B400"
},
{
"alias": "available",
"color": "#73BF69"
}
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n max by (device) (\n node_filesystem_size_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n -\n node_filesystem_avail_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n )\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "used",
"refId": "A"
},
{
"expr": "sum(\n max by (device) (\n node_filesystem_avail_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n )\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "available",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Space Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!=\"lo\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Received",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_transmit_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!=\"lo\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Transmitted",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(node_exporter_build_info{job=\"node-exporter\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Nodes",
"uid": "fa49a4706d07a042595b664c87fb33ea",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-node-exporter
namespace: monitoring

View File

@ -2,6 +2,12 @@ apiVersion: v1
data: data:
prometheus-remote-write.json: |- prometheus-remote-write.json: |-
{ {
"__inputs": [
],
"__requires": [
],
"annotations": { "annotations": {
"list": [ "list": [
@ -11,14 +17,15 @@ data:
"gnetId": null, "gnetId": null,
"graphTooltip": 0, "graphTooltip": 0,
"hideControls": false, "hideControls": false,
"id": null,
"links": [ "links": [
], ],
"refresh": "10s", "refresh": "",
"rows": [ "rows": [
{ {
"collapse": false, "collapse": false,
"height": "250px", "collapsed": false,
"panels": [ "panels": [
{ {
"aliasColors": { "aliasColors": {
@ -29,13 +36,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 1, "gridPos": {
},
"id": 2,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -44,11 +57,12 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -58,12 +72,11 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} - ignoring(queue) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "(\n prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} \n- \n ignoring(remote_name, url) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}\n)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -89,11 +102,11 @@ data:
}, },
"yaxes": [ "yaxes": [
{ {
"format": "s", "format": "short",
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -102,7 +115,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
}, },
@ -115,13 +128,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 2, "gridPos": {
},
"id": 3,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -130,11 +149,12 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -144,12 +164,11 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) - ignoring (queue) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "(\n rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n ignoring (remote_name, url) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n)\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -179,7 +198,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -188,7 +207,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
} }
@ -198,11 +217,12 @@ data:
"repeatRowId": null, "repeatRowId": null,
"showTitle": true, "showTitle": true,
"title": "Timestamps", "title": "Timestamps",
"titleSize": "h6" "titleSize": "h6",
"type": "row"
}, },
{ {
"collapse": false, "collapse": false,
"height": "250px", "collapsed": false,
"panels": [ "panels": [
{ {
"aliasColors": { "aliasColors": {
@ -213,13 +233,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 3, "gridPos": {
},
"id": 4,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -228,11 +254,12 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -242,12 +269,11 @@ data:
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "rate(prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])- ignoring(queue) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) - rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(\n prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n ignoring(remote_name, url) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -277,7 +303,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -286,7 +312,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
} }
@ -296,11 +322,12 @@ data:
"repeatRowId": null, "repeatRowId": null,
"showTitle": true, "showTitle": true,
"title": "Samples", "title": "Samples",
"titleSize": "h6" "titleSize": "h6",
"type": "row"
}, },
{ {
"collapse": false, "collapse": false,
"height": "250px", "collapsed": false,
"panels": [ "panels": [
{ {
"aliasColors": { "aliasColors": {
@ -311,13 +338,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 4, "gridPos": {
},
"id": 5,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -326,16 +359,18 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "minSpan": 6,
"nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 6, "span": 12,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
@ -343,9 +378,8 @@ data:
"expr": "prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -353,7 +387,7 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "Num. Shards", "title": "Current Shards",
"tooltip": { "tooltip": {
"shared": true, "shared": true,
"sort": 0, "sort": 0,
@ -375,7 +409,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -384,7 +418,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
}, },
@ -397,13 +431,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 5, "gridPos": {
},
"id": 6,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -412,26 +452,26 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
"spaceLength": 10, "spaceLength": 10,
"span": 6, "span": 4,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "prometheus_remote_storage_shard_capacity{cluster=~\"$cluster\", instance=~\"$instance\"}", "expr": "prometheus_remote_storage_shards_max{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -439,7 +479,7 @@ data:
], ],
"timeFrom": null, "timeFrom": null,
"timeShift": null, "timeShift": null,
"title": "Capacity", "title": "Max Shards",
"tooltip": { "tooltip": {
"shared": true, "shared": true,
"sort": 0, "sort": 0,
@ -461,7 +501,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -470,7 +510,191 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_min{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Min Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_desired{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Desired Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
} }
] ]
} }
@ -480,11 +704,12 @@ data:
"repeatRowId": null, "repeatRowId": null,
"showTitle": true, "showTitle": true,
"title": "Shards", "title": "Shards",
"titleSize": "h6" "titleSize": "h6",
"type": "row"
}, },
{ {
"collapse": false, "collapse": false,
"height": "250px", "collapsed": false,
"panels": [ "panels": [
{ {
"aliasColors": { "aliasColors": {
@ -495,13 +720,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 6, "gridPos": {
},
"id": 9,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -510,11 +741,406 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shard_capacity{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Shard Capacity",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_pending_samples{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pending Samples",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Shard Details",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_tsdb_wal_segment_current{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "TSDB Current Segment",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_wal_watcher_current_segment{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{consumer}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Remote Write Current Segment",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Segments",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -527,9 +1153,8 @@ data:
"expr": "rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -559,7 +1184,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -568,7 +1193,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
}, },
@ -581,13 +1206,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 7, "gridPos": {
},
"id": 14,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -596,11 +1227,12 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -613,9 +1245,8 @@ data:
"expr": "rate(prometheus_remote_storage_failed_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_failed_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -645,7 +1276,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -654,7 +1285,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
}, },
@ -667,13 +1298,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 8, "gridPos": {
},
"id": 15,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -682,11 +1319,12 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -699,9 +1337,8 @@ data:
"expr": "rate(prometheus_remote_storage_retried_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_retried_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -731,7 +1368,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -740,7 +1377,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
}, },
@ -753,13 +1390,19 @@ data:
"dashes": false, "dashes": false,
"datasource": "$datasource", "datasource": "$datasource",
"fill": 1, "fill": 1,
"id": 9, "gridPos": {
},
"id": 16,
"legend": { "legend": {
"alignAsTable": false,
"avg": false, "avg": false,
"current": false, "current": false,
"max": false, "max": false,
"min": false, "min": false,
"rightSide": false,
"show": true, "show": true,
"sideWidth": null,
"total": false, "total": false,
"values": false "values": false
}, },
@ -768,11 +1411,12 @@ data:
"links": [ "links": [
], ],
"nullPointMode": "null as zero", "nullPointMode": "null",
"percentage": false, "percentage": false,
"pointradius": 5, "pointradius": 5,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"repeat": null,
"seriesOverrides": [ "seriesOverrides": [
], ],
@ -785,9 +1429,8 @@ data:
"expr": "rate(prometheus_remote_storage_enqueue_retries_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])", "expr": "rate(prometheus_remote_storage_enqueue_retries_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series", "format": "time_series",
"intervalFactor": 2, "intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}", "legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"legendLink": null, "refId": "A"
"step": 10
} }
], ],
"thresholds": [ "thresholds": [
@ -817,7 +1460,7 @@ data:
"label": null, "label": null,
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": 0, "min": null,
"show": true "show": true
}, },
{ {
@ -826,7 +1469,7 @@ data:
"logBase": 1, "logBase": 1,
"max": null, "max": null,
"min": null, "min": null,
"show": false "show": true
} }
] ]
} }
@ -835,8 +1478,9 @@ data:
"repeatIteration": null, "repeatIteration": null,
"repeatRowId": null, "repeatRowId": null,
"showTitle": true, "showTitle": true,
"title": "Misc Rates.", "title": "Misc. Rates",
"titleSize": "h6" "titleSize": "h6",
"type": "row"
} }
], ],
"schemaVersion": 14, "schemaVersion": 14,
@ -847,10 +1491,6 @@ data:
"templating": { "templating": {
"list": [ "list": [
{ {
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0, "hide": 0,
"label": null, "label": null,
"name": "datasource", "name": "datasource",
@ -865,23 +1505,30 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"selected": true, "text": {
"text": "All", "selected": true,
"value": "$__all" "text": "All",
"value": "$__all"
},
"value": {
"selected": true,
"text": "All",
"value": "$__all"
}
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
"includeAll": true, "includeAll": true,
"label": "instance", "label": null,
"multi": true, "multi": false,
"name": "instance", "name": "instance",
"options": [ "options": [
], ],
"query": "label_values(prometheus_build_info, instance)", "query": "label_values(prometheus_build_info, instance)",
"refresh": 1, "refresh": 2,
"regex": "", "regex": "",
"sort": 2, "sort": 0,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -893,23 +1540,56 @@ data:
{ {
"allValue": null, "allValue": null,
"current": { "current": {
"selected": true, "text": {
"text": "All", "selected": true,
"value": "$__all" "text": "All",
"value": "$__all"
},
"value": {
"selected": true,
"text": "All",
"value": "$__all"
}
}, },
"datasource": "$datasource", "datasource": "$datasource",
"hide": 0, "hide": 0,
"includeAll": true, "includeAll": true,
"label": "cluster", "label": null,
"multi": true, "multi": false,
"name": "cluster", "name": "cluster",
"options": [ "options": [
], ],
"query": "label_values(kube_pod_container_info{image=~\".*prometheus.*\"}, cluster)", "query": "label_values(kube_pod_container_info{image=~\".*prometheus.*\"}, cluster)",
"refresh": 1, "refresh": 2,
"regex": "", "regex": "",
"sort": 2, "sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "url",
"options": [
],
"query": "label_values(prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}, url)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "", "tagValuesQuery": "",
"tags": [ "tags": [
@ -921,7 +1601,7 @@ data:
] ]
}, },
"time": { "time": {
"from": "now-1h", "from": "now-6h",
"to": "now" "to": "now"
}, },
"timepicker": { "timepicker": {
@ -949,9 +1629,8 @@ data:
"30d" "30d"
] ]
}, },
"timezone": "utc", "timezone": "browser",
"title": "Prometheus Remote Write", "title": "Prometheus Remote Write",
"uid": "",
"version": 0 "version": 0
} }
prometheus.json: |- prometheus.json: |-
@ -1026,6 +1705,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #A", "pattern": "Value #A",
@ -1044,6 +1724,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "Value #B", "pattern": "Value #B",
@ -1062,6 +1743,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "instance", "pattern": "instance",
@ -1080,6 +1762,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "job", "pattern": "job",
@ -1098,6 +1781,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss", "dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2, "decimals": 2,
"link": false, "link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down", "linkTooltip": "Drill down",
"linkUrl": "", "linkUrl": "",
"pattern": "version", "pattern": "version",
@ -2048,8 +2732,8 @@ data:
"list": [ "list": [
{ {
"current": { "current": {
"text": "Prometheus", "text": "default",
"value": "Prometheus" "value": "default"
}, },
"hide": 0, "hide": 0,
"label": null, "label": null,
@ -2150,7 +2834,7 @@ data:
] ]
}, },
"timezone": "utc", "timezone": "utc",
"title": "Prometheus", "title": "Prometheus Overview",
"uid": "", "uid": "",
"version": 0 "version": 0
} }

View File

@ -18,12 +18,13 @@ spec:
labels: labels:
name: grafana name: grafana
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: grafana - name: grafana
image: docker.io/grafana/grafana:6.4.4 image: docker.io/grafana/grafana:7.3.5
env: env:
- name: GF_PATHS_CONFIG - name: GF_PATHS_CONFIG
value: "/etc/grafana/custom.ini" value: "/etc/grafana/custom.ini"
@ -56,14 +57,18 @@ spec:
mountPath: /etc/grafana/provisioning/dashboards mountPath: /etc/grafana/provisioning/dashboards
- name: dashboards-etcd - name: dashboards-etcd
mountPath: /etc/grafana/dashboards/etcd mountPath: /etc/grafana/dashboards/etcd
- name: dashboards-node-exporter
mountPath: /etc/grafana/dashboards/node-exporter
- name: dashboards-prom - name: dashboards-prom
mountPath: /etc/grafana/dashboards/prom mountPath: /etc/grafana/dashboards/prom
- name: dashboards-k8s - name: dashboards-k8s
mountPath: /etc/grafana/dashboards/k8s mountPath: /etc/grafana/dashboards/k8s
- name: dashboards-k8s-nodes - name: dashboards-k8s-nodes
mountPath: /etc/grafana/dashboards/k8s-nodes mountPath: /etc/grafana/dashboards/k8s-nodes
- name: dashboards-k8s-resources - name: dashboards-k8s-resources-1
mountPath: /etc/grafana/dashboards/k8s-resources mountPath: /etc/grafana/dashboards/k8s-resources-1
- name: dashboards-k8s-resources-2
mountPath: /etc/grafana/dashboards/k8s-resources-2
- name: dashboards-coredns - name: dashboards-coredns
mountPath: /etc/grafana/dashboards/coredns mountPath: /etc/grafana/dashboards/coredns
- name: dashboards-nginx-ingress - name: dashboards-nginx-ingress
@ -81,6 +86,9 @@ spec:
- name: dashboards-etcd - name: dashboards-etcd
configMap: configMap:
name: grafana-dashboards-etcd name: grafana-dashboards-etcd
- name: dashboards-node-exporter
configMap:
name: grafana-dashboards-node-exporter
- name: dashboards-prom - name: dashboards-prom
configMap: configMap:
name: grafana-dashboards-prom name: grafana-dashboards-prom
@ -90,9 +98,12 @@ spec:
- name: dashboards-k8s-nodes - name: dashboards-k8s-nodes
configMap: configMap:
name: grafana-dashboards-k8s-nodes name: grafana-dashboards-k8s-nodes
- name: dashboards-k8s-resources - name: dashboards-k8s-resources-1
configMap: configMap:
name: grafana-dashboards-k8s-resources name: grafana-dashboards-k8s-resources-1
- name: dashboards-k8s-resources-2
configMap:
name: grafana-dashboards-k8s-resources-2
- name: dashboards-coredns - name: dashboards-coredns
configMap: configMap:
name: grafana-dashboards-coredns name: grafana-dashboards-coredns

View File

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/ingress-nginx

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public
@ -47,7 +48,6 @@ spec:
containerPort: 10254 containerPort: 10254
hostPort: 10254 hostPort: 10254
livenessProbe: livenessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
@ -55,15 +55,16 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
lifecycle: lifecycle:
preStop: preStop:
@ -76,6 +77,6 @@ spec:
- NET_BIND_SERVICE - NET_BIND_SERVICE
drop: drop:
- ALL - ALL
runAsUser: 33 # www-data runAsUser: 101 # www-data
restartPolicy: Always restartPolicy: Always
terminationGracePeriodSeconds: 300 terminationGracePeriodSeconds: 300

View File

@ -51,3 +51,12 @@ rules:
- ingresses/status - ingresses/status
verbs: verbs:
- update - update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch

View File

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/ingress-nginx

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public
@ -47,7 +48,6 @@ spec:
containerPort: 10254 containerPort: 10254
hostPort: 10254 hostPort: 10254
livenessProbe: livenessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
@ -55,15 +55,16 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
lifecycle: lifecycle:
preStop: preStop:
@ -76,6 +77,6 @@ spec:
- NET_BIND_SERVICE - NET_BIND_SERVICE
drop: drop:
- ALL - ALL
runAsUser: 33 # www-data runAsUser: 101 # www-data
restartPolicy: Always restartPolicy: Always
terminationGracePeriodSeconds: 300 terminationGracePeriodSeconds: 300

View File

@ -51,3 +51,12 @@ rules:
- ingresses/status - ingresses/status
verbs: verbs:
- update - update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch

View File

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/ingress-nginx

View File

@ -1,7 +1,7 @@
apiVersion: apps/v1 apiVersion: apps/v1
kind: Deployment kind: Deployment
metadata: metadata:
name: ingress-controller-public name: nginx-ingress-controller
namespace: ingress namespace: ingress
spec: spec:
replicas: 2 replicas: 2
@ -10,19 +10,20 @@ spec:
maxUnavailable: 1 maxUnavailable: 1
selector: selector:
matchLabels: matchLabels:
name: ingress-controller-public name: nginx-ingress-controller
phase: prod phase: prod
template: template:
metadata: metadata:
labels: labels:
name: ingress-controller-public name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public
@ -73,7 +74,6 @@ spec:
- NET_BIND_SERVICE - NET_BIND_SERVICE
drop: drop:
- ALL - ALL
runAsUser: 33 # www-data runAsUser: 101 # www-data
restartPolicy: Always restartPolicy: Always
terminationGracePeriodSeconds: 300 terminationGracePeriodSeconds: 300

View File

@ -51,3 +51,12 @@ rules:
- ingresses/status - ingresses/status
verbs: verbs:
- update - update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch

View File

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/ingress-nginx

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public
@ -47,7 +48,6 @@ spec:
containerPort: 10254 containerPort: 10254
hostPort: 10254 hostPort: 10254
livenessProbe: livenessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
@ -55,15 +55,16 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
lifecycle: lifecycle:
preStop: preStop:
@ -76,6 +77,6 @@ spec:
- NET_BIND_SERVICE - NET_BIND_SERVICE
drop: drop:
- ALL - ALL
runAsUser: 33 # www-data runAsUser: 101 # www-data
restartPolicy: Always restartPolicy: Always
terminationGracePeriodSeconds: 300 terminationGracePeriodSeconds: 300

View File

@ -51,3 +51,12 @@ rules:
- ingresses/status - ingresses/status
verbs: verbs:
- update - update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch

View File

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/ingress-nginx

View File

@ -17,12 +17,13 @@ spec:
labels: labels:
name: nginx-ingress-controller name: nginx-ingress-controller
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1 image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --ingress-class=public - --ingress-class=public
@ -47,7 +48,6 @@ spec:
containerPort: 10254 containerPort: 10254
hostPort: 10254 hostPort: 10254
livenessProbe: livenessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
@ -55,15 +55,16 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3
httpGet: httpGet:
path: /healthz path: /healthz
port: 10254 port: 10254
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5 timeoutSeconds: 5
lifecycle: lifecycle:
preStop: preStop:
@ -76,6 +77,6 @@ spec:
- NET_BIND_SERVICE - NET_BIND_SERVICE
drop: drop:
- ALL - ALL
runAsUser: 33 # www-data runAsUser: 101 # www-data
restartPolicy: Always restartPolicy: Always
terminationGracePeriodSeconds: 300 terminationGracePeriodSeconds: 300

View File

@ -51,3 +51,12 @@ rules:
- ingresses/status - ingresses/status
verbs: verbs:
- update - update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch

View File

@ -34,7 +34,7 @@ data:
- job_name: 'kubernetes-apiservers' - job_name: 'kubernetes-apiservers'
kubernetes_sd_configs: kubernetes_sd_configs:
- role: endpoints - role: endpoints
scheme: https scheme: https
tls_config: tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@ -65,13 +65,19 @@ data:
- source_labels: [__name__] - source_labels: [__name__]
action: drop action: drop
regex: apiserver_admission_step_admission_latencies_seconds_.* regex: apiserver_admission_step_admission_latencies_seconds_.*
- source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_bucket;.+
action: drop
- source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_count;.+
action: drop
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10250/metrics). # metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
- job_name: 'kubelet' - job_name: 'kubelet'
kubernetes_sd_configs: kubernetes_sd_configs:
- role: node - role: node
scheme: https scheme: https
tls_config: tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@ -79,10 +85,6 @@ data:
insecure_skip_verify: true insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
# Scrape config for Kubelet cAdvisor. Explore metrics from a node by # Scrape config for Kubelet cAdvisor. Explore metrics from a node by
# scraping kubelet (127.0.0.1:10250/metrics/cadvisor). # scraping kubelet (127.0.0.1:10250/metrics/cadvisor).
- job_name: 'kubernetes-cadvisor' - job_name: 'kubernetes-cadvisor'
@ -97,9 +99,6 @@ data:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
metric_relabel_configs: metric_relabel_configs:
- source_labels: [__name__, image] - source_labels: [__name__, image]
action: drop action: drop
@ -115,16 +114,14 @@ data:
- role: node - role: node
scheme: http scheme: http
relabel_configs: relabel_configs:
- source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_controller] - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_controller]
action: keep action: keep
regex: 'true' regex: 'true'
- action: labelmap - source_labels: [__meta_kubernetes_node_address_InternalIP]
regex: __meta_kubernetes_node_label_(.+) action: replace
- source_labels: [__meta_kubernetes_node_address_InternalIP] target_label: __address__
action: replace replacement: '${1}:2381'
target_label: __address__
replacement: '${1}:2381'
# Scrape config for service endpoints. # Scrape config for service endpoints.
# #
# The relabeling allows the actual service scrape endpoint to be configured # The relabeling allows the actual service scrape endpoint to be configured
@ -169,7 +166,7 @@ data:
- source_labels: [__meta_kubernetes_service_name] - source_labels: [__meta_kubernetes_service_name]
action: replace action: replace
target_label: job target_label: job
metric_relabel_configs: metric_relabel_configs:
- source_labels: [__name__] - source_labels: [__name__]
action: drop action: drop

View File

@ -14,13 +14,14 @@ spec:
labels: labels:
name: prometheus name: prometheus
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccountName: prometheus serviceAccountName: prometheus
containers: containers:
- name: prometheus - name: prometheus
image: quay.io/prometheus/prometheus:v2.14.0 image: quay.io/prometheus/prometheus:v2.23.0
args: args:
- --web.listen-address=0.0.0.0:9090 - --web.listen-address=0.0.0.0:9090
- --config.file=/etc/prometheus/prometheus.yaml - --config.file=/etc/prometheus/prometheus.yaml

View File

@ -1,3 +1,4 @@
# Allow Prometheus to scrape service endpoints
apiVersion: v1 apiVersion: v1
kind: Service kind: Service
metadata: metadata:
@ -7,7 +8,6 @@ metadata:
prometheus.io/scrape: 'true' prometheus.io/scrape: 'true'
spec: spec:
type: ClusterIP type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None clusterIP: None
selector: selector:
k8s-app: kube-controller-manager k8s-app: kube-controller-manager

View File

@ -0,0 +1,19 @@
# Allow Prometheus to scrape service endpoints
apiVersion: v1
kind: Service
metadata:
name: kube-proxy
namespace: kube-system
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '10249'
spec:
type: ClusterIP
clusterIP: None
selector:
k8s-app: kube-proxy
ports:
- name: metrics
protocol: TCP
port: 10249
targetPort: 10249

View File

@ -1,3 +1,4 @@
# Allow Prometheus to scrape service endpoints
apiVersion: v1 apiVersion: v1
kind: Service kind: Service
metadata: metadata:
@ -7,7 +8,6 @@ metadata:
prometheus.io/scrape: 'true' prometheus.io/scrape: 'true'
spec: spec:
type: ClusterIP type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None clusterIP: None
selector: selector:
k8s-app: kube-scheduler k8s-app: kube-scheduler

View File

@ -74,13 +74,30 @@ rules:
- storage.k8s.io - storage.k8s.io
resources: resources:
- storageclasses - storageclasses
- volumeattachments
verbs: verbs:
- list - list
- watch - watch
- apiGroups: - apiGroups:
- autoscaling.k8s.io - admissionregistration.k8s.io
resources: resources:
- verticalpodautoscalers - mutatingwebhookconfigurations
- validatingwebhookconfigurations
verbs:
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- networkpolicies
- ingresses
verbs:
- list
- watch
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs: verbs:
- list - list
- watch - watch

View File

@ -18,16 +18,19 @@ spec:
labels: labels:
name: kube-state-metrics name: kube-state-metrics
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccountName: kube-state-metrics serviceAccountName: kube-state-metrics
containers: containers:
- name: kube-state-metrics - name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.8.0 image: quay.io/coreos/kube-state-metrics:v2.0.0-alpha.3
ports: ports:
- name: metrics - name: metrics
containerPort: 8080 containerPort: 8080
- name: telemetry
containerPort: 8081
livenessProbe: livenessProbe:
httpGet: httpGet:
path: /healthz path: /healthz
@ -40,3 +43,5 @@ spec:
port: 8081 port: 8081
initialDelaySeconds: 5 initialDelaySeconds: 5
timeoutSeconds: 5 timeoutSeconds: 5
securityContext:
runAsUser: 65534

View File

@ -17,18 +17,18 @@ spec:
labels: labels:
name: node-exporter name: node-exporter
phase: prod phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
serviceAccountName: node-exporter serviceAccountName: node-exporter
securityContext: securityContext:
runAsNonRoot: true runAsNonRoot: true
runAsUser: 65534 runAsUser: 65534
seccompProfile:
type: RuntimeDefault
hostNetwork: true hostNetwork: true
hostPID: true hostPID: true
containers: containers:
- name: node-exporter - name: node-exporter
image: quay.io/prometheus/node-exporter:v0.18.1 image: quay.io/prometheus/node-exporter:v1.0.1
args: args:
- --path.procfs=/host/proc - --path.procfs=/host/proc
- --path.sysfs=/host/sys - --path.sysfs=/host/sys
@ -57,7 +57,9 @@ spec:
mountPath: /host/root mountPath: /host/root
readOnly: true readOnly: true
tolerations: tolerations:
- effect: NoSchedule - key: node-role.kubernetes.io/master
operator: Exists
- key: node.kubernetes.io/not-ready
operator: Exists operator: Exists
volumes: volumes:
- name: proc - name: proc

View File

@ -11,8 +11,8 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": members are down ({{ $value }})." "message": "etcd cluster \"{{ $labels.job }}\": members are down ({{ $value }})."
}, },
"expr": "max by (job) (\n sum by (job) (up{job=~\".*etcd.*\"} == bool 0)\nor\n count by (job,endpoint) (\n sum by (job,endpoint,To) (rate(etcd_network_peer_sent_failures_total{job=~\".*etcd.*\"}[3m])) > 0.01\n )\n)\n> 0\n", "expr": "max without (endpoint) (\n sum without (instance) (up{job=~\".*etcd.*\"} == bool 0)\nor\n count without (To) (\n sum without (instance) (rate(etcd_network_peer_sent_failures_total{job=~\".*etcd.*\"}[120s])) > 0.01\n )\n)\n> 0\n",
"for": "3m", "for": "10m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
} }
@ -22,7 +22,7 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }})." "message": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }})."
}, },
"expr": "sum(up{job=~\".*etcd.*\"} == bool 1) by (job) < ((count(up{job=~\".*etcd.*\"}) by (job) + 1) / 2)\n", "expr": "sum(up{job=~\".*etcd.*\"} == bool 1) without (instance) < ((count(up{job=~\".*etcd.*\"}) without (instance) + 1) / 2)\n",
"for": "3m", "for": "3m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
@ -42,10 +42,10 @@ data:
{ {
"alert": "etcdHighNumberOfLeaderChanges", "alert": "etcdHighNumberOfLeaderChanges",
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": instance {{ $labels.instance }} has seen {{ $value }} leader changes within the last 30 minutes." "message": "etcd cluster \"{{ $labels.job }}\": {{ $value }} leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated."
}, },
"expr": "rate(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}[15m]) > 3\n", "expr": "increase((max without (instance) (etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}) or 0*absent(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}))[15m:1m]) >= 4\n",
"for": "15m", "for": "5m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -55,7 +55,7 @@ data:
"annotations": { "annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}." "message": "etcd cluster \"{{ $labels.job }}\": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}."
}, },
"expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_type=\"unary\"}[5m])) by (job, instance, grpc_service, grpc_method, le))\n> 0.15\n", "expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_type=\"unary\"}[5m])) without(grpc_type))\n> 0.15\n",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
@ -110,7 +110,7 @@ data:
"annotations": { "annotations": {
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}" "message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}"
}, },
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.01\n", "expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) without (code) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nwithout (code) > 0.01\n",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -121,7 +121,7 @@ data:
"annotations": { "annotations": {
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}." "message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}."
}, },
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.05\n", "expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) without (code) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nwithout (code) > 0.05\n",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "critical" "severity": "critical"
@ -149,21 +149,153 @@ data:
"name": "kube-apiserver.rules", "name": "kube-apiserver.rules",
"rules": [ "rules": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n", "expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[1d]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[1d]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[1d]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[1d]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[1d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[1d]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate1d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[1h]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[1h]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[1h]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[1h]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[1h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[1h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate1h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[2h]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[2h]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[2h]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[2h]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[2h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[2h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate2h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[30m]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[30m]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[30m]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[30m]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[30m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[30m]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate30m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[3d]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[3d]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[3d]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[3d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate3d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[5m]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[5m]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[5m]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[5m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate5m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\n -\n (\n (\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[6h]))\n or\n vector(0)\n )\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[6h]))\n +\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[6h]))\n )\n )\n +\n # errors\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[6h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate6h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[1d]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate1d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1h]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[1h]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate1h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[2h]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[2h]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[2h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[2h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate2h"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[30m]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[30m]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[30m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[30m]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate30m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[3d]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[3d]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[3d]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[3d]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate3d"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[5m]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[5m]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate5m"
},
{
"expr": "(\n (\n # too slow\n sum(rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n -\n sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[6h]))\n )\n +\n sum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[6h]))\n)\n/\nsum(rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate6h"
},
{
"expr": "sum by (code,resource) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n",
"labels": {
"verb": "read"
},
"record": "code_resource:apiserver_request_total:rate5m"
},
{
"expr": "sum by (code,resource) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n",
"labels": {
"verb": "write"
},
"record": "code_resource:apiserver_request_total:rate5m"
},
{
"expr": "histogram_quantile(0.99, sum by (le, resource) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))) > 0\n",
"labels": {
"quantile": "0.99",
"verb": "read"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.99, sum by (le, resource) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))) > 0\n",
"labels": {
"quantile": "0.99",
"verb": "write"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "sum(rate(apiserver_request_duration_seconds_sum{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n/\nsum(rate(apiserver_request_duration_seconds_count{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n",
"record": "cluster:apiserver_request_duration_seconds:mean5m"
},
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod))\n",
"labels": { "labels": {
"quantile": "0.99" "quantile": "0.99"
}, },
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile" "record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
}, },
{ {
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n", "expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod))\n",
"labels": { "labels": {
"quantile": "0.9" "quantile": "0.9"
}, },
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile" "record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
}, },
{ {
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n", "expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod))\n",
"labels": { "labels": {
"quantile": "0.5" "quantile": "0.5"
}, },
@ -171,6 +303,143 @@ data:
} }
] ]
}, },
{
"interval": "3m",
"name": "kube-apiserver-availability.rules",
"rules": [
{
"expr": "1 - (\n (\n # write too slow\n sum(increase(apiserver_request_duration_seconds_count{verb=~\"POST|PUT|PATCH|DELETE\"}[30d]))\n -\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[30d]))\n ) +\n (\n # read too slow\n sum(increase(apiserver_request_duration_seconds_count{verb=~\"LIST|GET\"}[30d]))\n -\n (\n (\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[30d]))\n or\n vector(0)\n )\n +\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[30d]))\n +\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[30d]))\n )\n ) +\n # errors\n sum(code:apiserver_request_total:increase30d{code=~\"5..\"} or vector(0))\n)\n/\nsum(code:apiserver_request_total:increase30d)\n",
"labels": {
"verb": "all"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "1 - (\n sum(increase(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\"}[30d]))\n -\n (\n # too slow\n (\n sum(increase(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=~\"resource|\",le=\"0.1\"}[30d]))\n or\n vector(0)\n )\n +\n sum(increase(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"namespace\",le=\"0.5\"}[30d]))\n +\n sum(increase(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",scope=\"cluster\",le=\"5\"}[30d]))\n )\n +\n # errors\n sum(code:apiserver_request_total:increase30d{verb=\"read\",code=~\"5..\"} or vector(0))\n)\n/\nsum(code:apiserver_request_total:increase30d{verb=\"read\"})\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "1 - (\n (\n # too slow\n sum(increase(apiserver_request_duration_seconds_count{verb=~\"POST|PUT|PATCH|DELETE\"}[30d]))\n -\n sum(increase(apiserver_request_duration_seconds_bucket{verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"}[30d]))\n )\n +\n # errors\n sum(code:apiserver_request_total:increase30d{verb=\"write\",code=~\"5..\"} or vector(0))\n)\n/\nsum(code:apiserver_request_total:increase30d{verb=\"write\"})\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"2..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"3..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"4..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"LIST\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"GET\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"POST\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PUT\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"PATCH\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=\"DELETE\",code=~\"5..\"}[30d]))\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code) (code_verb:apiserver_request_total:increase30d{verb=~\"LIST|GET\"})\n",
"labels": {
"verb": "read"
},
"record": "code:apiserver_request_total:increase30d"
},
{
"expr": "sum by (code) (code_verb:apiserver_request_total:increase30d{verb=~\"POST|PUT|PATCH|DELETE\"})\n",
"labels": {
"verb": "write"
},
"record": "code:apiserver_request_total:increase30d"
}
]
},
{ {
"name": "k8s.rules", "name": "k8s.rules",
"rules": [ "rules": [
@ -179,23 +448,23 @@ data:
"record": "namespace:container_cpu_usage_seconds_total:sum_rate" "record": "namespace:container_cpu_usage_seconds_total:sum_rate"
}, },
{ {
"expr": "sum by (namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n", "expr": "sum by (cluster, namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (\n 1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate" "record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
}, },
{ {
"expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n", "expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_working_set_bytes" "record": "node_namespace_pod_container:container_memory_working_set_bytes"
}, },
{ {
"expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n", "expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_rss" "record": "node_namespace_pod_container:container_memory_rss"
}, },
{ {
"expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n", "expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_cache" "record": "node_namespace_pod_container:container_memory_cache"
}, },
{ {
"expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n", "expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_swap" "record": "node_namespace_pod_container:container_memory_swap"
}, },
{ {
@ -203,33 +472,33 @@ data:
"record": "namespace:container_memory_usage_bytes:sum" "record": "namespace:container_memory_usage_bytes:sum"
}, },
{ {
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n", "expr": "sum by (namespace) (\n sum by (namespace, pod) (\n max by (namespace, pod, container) (\n kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"}\n ) * on(namespace, pod) group_left() max by (namespace, pod) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace:kube_pod_container_resource_requests_memory_bytes:sum" "record": "namespace:kube_pod_container_resource_requests_memory_bytes:sum"
}, },
{ {
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n", "expr": "sum by (namespace) (\n sum by (namespace, pod) (\n max by (namespace, pod, container) (\n kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n ) * on(namespace, pod) group_left() max by (namespace, pod) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace:kube_pod_container_resource_requests_cpu_cores:sum" "record": "namespace:kube_pod_container_resource_requests_cpu_cores:sum"
}, },
{ {
"expr": "sum(\n label_replace(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"ReplicaSet\"},\n \"replicaset\", \"$1\", \"owner_name\", \"(.*)\"\n ) * on(replicaset, namespace) group_left(owner_name) kube_replicaset_owner{job=\"kube-state-metrics\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n", "expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"ReplicaSet\"},\n \"replicaset\", \"$1\", \"owner_name\", \"(.*)\"\n ) * on(replicaset, namespace) group_left(owner_name) topk by(replicaset, namespace) (\n 1, max by (replicaset, namespace, owner_name) (\n kube_replicaset_owner{job=\"kube-state-metrics\"}\n )\n ),\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": { "labels": {
"workload_type": "deployment" "workload_type": "deployment"
}, },
"record": "mixin_pod_workload" "record": "namespace_workload_pod:kube_pod_owner:relabel"
}, },
{ {
"expr": "sum(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n", "expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": { "labels": {
"workload_type": "daemonset" "workload_type": "daemonset"
}, },
"record": "mixin_pod_workload" "record": "namespace_workload_pod:kube_pod_owner:relabel"
}, },
{ {
"expr": "sum(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n", "expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": { "labels": {
"workload_type": "statefulset" "workload_type": "statefulset"
}, },
"record": "mixin_pod_workload" "record": "namespace_workload_pod:kube_pod_owner:relabel"
} }
] ]
}, },
@ -305,127 +574,162 @@ data:
"name": "node.rules", "name": "node.rules",
"rules": [ "rules": [
{ {
"expr": "sum(min(kube_pod_info) by (node))", "expr": "sum(min(kube_pod_info{node!=\"\"}) by (cluster, node))\n",
"record": ":kube_pod_info_node_count:" "record": ":kube_pod_info_node_count:"
}, },
{ {
"expr": "max(label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")) by (node, namespace, pod)\n", "expr": "topk by(namespace, pod) (1,\n max by (node, namespace, pod) (\n label_replace(kube_pod_info{job=\"kube-state-metrics\",node!=\"\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")\n))\n",
"record": "node_namespace_pod:kube_pod_info:" "record": "node_namespace_pod:kube_pod_info:"
}, },
{ {
"expr": "count by (node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n", "expr": "count by (cluster, node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n",
"record": "node:node_num_cpu:sum" "record": "node:node_num_cpu:sum"
}, },
{ {
"expr": "sum(\n node_memory_MemAvailable_bytes{job=\"node-exporter\"} or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"} +\n node_memory_Cached_bytes{job=\"node-exporter\"} +\n node_memory_MemFree_bytes{job=\"node-exporter\"} +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n)\n", "expr": "sum(\n node_memory_MemAvailable_bytes{job=\"node-exporter\"} or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"} +\n node_memory_Cached_bytes{job=\"node-exporter\"} +\n node_memory_MemFree_bytes{job=\"node-exporter\"} +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n) by (cluster)\n",
"record": ":node_memory_MemAvailable_bytes:sum" "record": ":node_memory_MemAvailable_bytes:sum"
} }
] ]
}, },
{
"name": "kubelet.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"quantile": "0.99"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.9, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"quantile": "0.9"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.5, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"quantile": "0.5"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
}
]
},
{ {
"name": "kubernetes-apps", "name": "kubernetes-apps",
"rules": [ "rules": [
{ {
"alert": "KubePodCrashLooping", "alert": "KubePodCrashLooping",
"annotations": { "annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.", "description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping",
"summary": "Pod is crash looping."
}, },
"expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[15m]) * 60 * 5 > 0\n", "expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[5m]) * 60 * 5 > 0\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubePodNotReady", "alert": "KubePodNotReady",
"annotations": { "annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.", "description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready",
"summary": "Pod has been in a non-ready state for more than 15 minutes."
}, },
"expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Failed|Pending|Unknown\"} * on(namespace, pod) group_left(owner_kind) kube_pod_owner{owner_kind!=\"Job\"}) > 0\n", "expr": "sum by (namespace, pod) (\n max by(namespace, pod) (\n kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}\n ) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (\n 1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!=\"Job\"})\n )\n) > 0\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeDeploymentGenerationMismatch", "alert": "KubeDeploymentGenerationMismatch",
"annotations": { "annotations": {
"message": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.", "description": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch",
"summary": "Deployment generation mismatch due to possible roll-back"
}, },
"expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n", "expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeDeploymentReplicasMismatch", "alert": "KubeDeploymentReplicasMismatch",
"annotations": { "annotations": {
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.", "description": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch",
"summary": "Deployment has not matched the expected number of replicas."
}, },
"expr": "kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\nkube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n", "expr": "(\n kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\n kube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n) and (\n changes(kube_deployment_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeStatefulSetReplicasMismatch", "alert": "KubeStatefulSetReplicasMismatch",
"annotations": { "annotations": {
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.", "description": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch",
"summary": "Deployment has not matched the expected number of replicas."
}, },
"expr": "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n", "expr": "(\n kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeStatefulSetGenerationMismatch", "alert": "KubeStatefulSetGenerationMismatch",
"annotations": { "annotations": {
"message": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.", "description": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch",
"summary": "StatefulSet generation mismatch due to possible roll-back"
}, },
"expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n", "expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeStatefulSetUpdateNotRolledOut", "alert": "KubeStatefulSetUpdateNotRolledOut",
"annotations": { "annotations": {
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.", "description": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout",
"summary": "StatefulSet update has not been rolled out."
}, },
"expr": "max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n)\n *\n(\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n)\n", "expr": "(\n max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n )\n *\n (\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n )\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeDaemonSetRolloutStuck", "alert": "KubeDaemonSetRolloutStuck",
"annotations": { "annotations": {
"message": "Only {{ $value | humanizePercentage }} of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.", "description": "DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck",
"summary": "DaemonSet rollout is stuck."
}, },
"expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} < 1.00\n", "expr": "(\n (\n kube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n ) or (\n kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"}\n !=\n 0\n ) or (\n kube_daemonset_updated_number_scheduled{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n ) or (\n kube_daemonset_status_number_available{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n )\n) and (\n changes(kube_daemonset_updated_number_scheduled{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeContainerWaiting", "alert": "KubeContainerWaiting",
"annotations": { "annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container}} has been in waiting state for longer than 1 hour.", "description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container}} has been in waiting state for longer than 1 hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontainerwaiting" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontainerwaiting",
"summary": "Pod container waiting longer than 1 hour"
}, },
"expr": "sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\"}) > 0\n", "expr": "sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\"}) > 0\n",
"for": "1h", "for": "1h",
@ -436,8 +740,9 @@ data:
{ {
"alert": "KubeDaemonSetNotScheduled", "alert": "KubeDaemonSetNotScheduled",
"annotations": { "annotations": {
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.", "description": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled",
"summary": "DaemonSet pods are not scheduled."
}, },
"expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m", "for": "10m",
@ -448,23 +753,12 @@ data:
{ {
"alert": "KubeDaemonSetMisScheduled", "alert": "KubeDaemonSetMisScheduled",
"annotations": { "annotations": {
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.", "description": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled",
"summary": "DaemonSet pods are misscheduled."
}, },
"expr": "kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m", "for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCronJobRunning",
"annotations": {
"message": "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecronjobrunning"
},
"expr": "time() - kube_cronjob_next_schedule_time{job=\"kube-state-metrics\"} > 3600\n",
"for": "1h",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -472,11 +766,12 @@ data:
{ {
"alert": "KubeJobCompletion", "alert": "KubeJobCompletion",
"annotations": { "annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than one hour to complete.", "description": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than 12 hours to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion",
"summary": "Job did not complete in time"
}, },
"expr": "kube_job_spec_completions{job=\"kube-state-metrics\"} - kube_job_status_succeeded{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_job_spec_completions{job=\"kube-state-metrics\"} - kube_job_status_succeeded{job=\"kube-state-metrics\"} > 0\n",
"for": "1h", "for": "12h",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -484,8 +779,9 @@ data:
{ {
"alert": "KubeJobFailed", "alert": "KubeJobFailed",
"annotations": { "annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.", "description": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed",
"summary": "Job failed to complete."
}, },
"expr": "kube_job_failed{job=\"kube-state-metrics\"} > 0\n", "expr": "kube_job_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "15m", "for": "15m",
@ -496,8 +792,9 @@ data:
{ {
"alert": "KubeHpaReplicasMismatch", "alert": "KubeHpaReplicasMismatch",
"annotations": { "annotations": {
"message": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has not matched the desired number of replicas for longer than 15 minutes.", "description": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has not matched the desired number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpareplicasmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpareplicasmismatch",
"summary": "HPA has not matched descired number of replicas."
}, },
"expr": "(kube_hpa_status_desired_replicas{job=\"kube-state-metrics\"}\n !=\nkube_hpa_status_current_replicas{job=\"kube-state-metrics\"})\n and\nchanges(kube_hpa_status_current_replicas[15m]) == 0\n", "expr": "(kube_hpa_status_desired_replicas{job=\"kube-state-metrics\"}\n !=\nkube_hpa_status_current_replicas{job=\"kube-state-metrics\"})\n and\nchanges(kube_hpa_status_current_replicas[15m]) == 0\n",
"for": "15m", "for": "15m",
@ -508,8 +805,9 @@ data:
{ {
"alert": "KubeHpaMaxedOut", "alert": "KubeHpaMaxedOut",
"annotations": { "annotations": {
"message": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has been running at max replicas for longer than 15 minutes.", "description": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has been running at max replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpamaxedout" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpamaxedout",
"summary": "HPA is running at max replicas"
}, },
"expr": "kube_hpa_status_current_replicas{job=\"kube-state-metrics\"}\n ==\nkube_hpa_spec_max_replicas{job=\"kube-state-metrics\"}\n", "expr": "kube_hpa_status_current_replicas{job=\"kube-state-metrics\"}\n ==\nkube_hpa_spec_max_replicas{job=\"kube-state-metrics\"}\n",
"for": "15m", "for": "15m",
@ -525,32 +823,35 @@ data:
{ {
"alert": "KubeCPUOvercommit", "alert": "KubeCPUOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.", "description": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit",
"summary": "Cluster has overcommitted CPU resource requests."
}, },
"expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum)\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n", "expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum{})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n",
"for": "5m", "for": "5m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
}, },
{ {
"alert": "KubeMemOvercommit", "alert": "KubeMemoryOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.", "description": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryovercommit",
"summary": "Cluster has overcommitted memory resource requests."
}, },
"expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum)\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n", "expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum{})\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n",
"for": "5m", "for": "5m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
}, },
{ {
"alert": "KubeCPUOvercommit", "alert": "KubeCPUQuotaOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted CPU resource requests for Namespaces.", "description": "Cluster has overcommitted CPU resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuquotaovercommit",
"summary": "Cluster has overcommitted CPU resource requests."
}, },
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"cpu\"})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n > 1.5\n", "expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"cpu\"})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n > 1.5\n",
"for": "5m", "for": "5m",
@ -559,10 +860,11 @@ data:
} }
}, },
{ {
"alert": "KubeMemOvercommit", "alert": "KubeMemoryQuotaOvercommit",
"annotations": { "annotations": {
"message": "Cluster has overcommitted memory resource requests for Namespaces.", "description": "Cluster has overcommitted memory resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryquotaovercommit",
"summary": "Cluster has overcommitted memory resource requests."
}, },
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"memory\"})\n /\nsum(kube_node_status_allocatable_memory_bytes{job=\"node-exporter\"})\n > 1.5\n", "expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"memory\"})\n /\nsum(kube_node_status_allocatable_memory_bytes{job=\"node-exporter\"})\n > 1.5\n",
"for": "5m", "for": "5m",
@ -570,13 +872,40 @@ data:
"severity": "warning" "severity": "warning"
} }
}, },
{
"alert": "KubeQuotaAlmostFull",
"annotations": {
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaalmostfull",
"summary": "Namespace quota is going to be full."
},
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 0.9 < 1\n",
"for": "15m",
"labels": {
"severity": "info"
}
},
{
"alert": "KubeQuotaFullyUsed",
"annotations": {
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotafullyused",
"summary": "Namespace quota is fully used."
},
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n == 1\n",
"for": "15m",
"labels": {
"severity": "info"
}
},
{ {
"alert": "KubeQuotaExceeded", "alert": "KubeQuotaExceeded",
"annotations": { "annotations": {
"message": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.", "description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded",
"summary": "Namespace quota has exceeded the limits."
}, },
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 0.90\n", "expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 1\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -585,13 +914,14 @@ data:
{ {
"alert": "CPUThrottlingHigh", "alert": "CPUThrottlingHigh",
"annotations": { "annotations": {
"message": "{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.", "description": "{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh",
"summary": "Processes experience elevated CPU throttling."
}, },
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > ( 100 / 100 )\n", "expr": "sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > ( 80 / 100 )\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "info"
} }
} }
] ]
@ -600,10 +930,11 @@ data:
"name": "kubernetes-storage", "name": "kubernetes-storage",
"rules": [ "rules": [
{ {
"alert": "KubePersistentVolumeUsageCritical", "alert": "KubePersistentVolumeFillingUp",
"annotations": { "annotations": {
"message": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.", "description": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefillingup",
"summary": "PersistentVolume is filling up."
}, },
"expr": "kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 0.03\n", "expr": "kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 0.03\n",
"for": "1m", "for": "1m",
@ -612,22 +943,24 @@ data:
} }
}, },
{ {
"alert": "KubePersistentVolumeFullInFourDays", "alert": "KubePersistentVolumeFillingUp",
"annotations": { "annotations": {
"message": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value | humanizePercentage }} is available.", "description": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value | humanizePercentage }} is available.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefillingup",
"summary": "PersistentVolume is filling up."
}, },
"expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n", "expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"for": "5m", "for": "1h",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubePersistentVolumeErrors", "alert": "KubePersistentVolumeErrors",
"annotations": { "annotations": {
"message": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.", "description": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors",
"summary": "PersistentVolume is having issues with provisioning."
}, },
"expr": "kube_persistentvolume_status_phase{phase=~\"Failed|Pending\",job=\"kube-state-metrics\"} > 0\n", "expr": "kube_persistentvolume_status_phase{phase=~\"Failed|Pending\",job=\"kube-state-metrics\"} > 0\n",
"for": "5m", "for": "5m",
@ -643,10 +976,11 @@ data:
{ {
"alert": "KubeVersionMismatch", "alert": "KubeVersionMismatch",
"annotations": { "annotations": {
"message": "There are {{ $value }} different semantic versions of Kubernetes components running.", "description": "There are {{ $value }} different semantic versions of Kubernetes components running.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch",
"summary": "Different semantic versions of Kubernetes components running."
}, },
"expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!~\"kube-dns|coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*.[0-9]*).*\"))) > 1\n", "expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!~\"kube-dns|coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -655,8 +989,9 @@ data:
{ {
"alert": "KubeClientErrors", "alert": "KubeClientErrors",
"annotations": { "annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ $value | humanizePercentage }} errors.'", "description": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ $value | humanizePercentage }} errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors",
"summary": "Kubernetes API server client is experiencing errors."
}, },
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n> 0.01\n", "expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n> 0.01\n",
"for": "15m", "for": "15m",
@ -666,77 +1001,82 @@ data:
} }
] ]
}, },
{
"name": "kube-apiserver-slos",
"rules": [
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)\nand\nsum(apiserver_request:burnrate5m) > (14.40 * 0.01000)\n",
"for": "2m",
"labels": {
"long": "1h",
"severity": "critical",
"short": "5m"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)\nand\nsum(apiserver_request:burnrate30m) > (6.00 * 0.01000)\n",
"for": "15m",
"labels": {
"long": "6h",
"severity": "critical",
"short": "30m"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate1d) > (3.00 * 0.01000)\nand\nsum(apiserver_request:burnrate2h) > (3.00 * 0.01000)\n",
"for": "1h",
"labels": {
"long": "1d",
"severity": "warning",
"short": "2h"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate3d) > (1.00 * 0.01000)\nand\nsum(apiserver_request:burnrate6h) > (1.00 * 0.01000)\n",
"for": "3h",
"labels": {
"long": "3d",
"severity": "warning",
"short": "6h"
}
}
]
},
{ {
"name": "kubernetes-system-apiserver", "name": "kubernetes-system-apiserver",
"rules": [ "rules": [
{ {
"alert": "KubeAPILatencyHigh", "alert": "KubeClientCertificateExpiration",
"annotations": { "annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.", "description": "A client certificate used to authenticate to the apiserver is expiring in less than 1.0 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration",
"summary": "Client certificate is about to expire."
}, },
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|PROXY|CONNECT\"} > 1\n", "expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 3600\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|PROXY|CONNECT\"} > 4\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) > 0.03\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) > 0.01\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) > 0.10\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) > 0.05\n",
"for": "10m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -744,30 +1084,46 @@ data:
{ {
"alert": "KubeClientCertificateExpiration", "alert": "KubeClientCertificateExpiration",
"annotations": { "annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 7.0 days.", "description": "A client certificate used to authenticate to the apiserver is expiring in less than 0.1 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration",
"summary": "Client certificate is about to expire."
}, },
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n", "expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "AggregatedAPIErrors",
"annotations": {
"description": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has reported errors. The number of errors have increased for it in the past five minutes. High values indicate that the availability of the service changes too often.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapierrors",
"summary": "An aggregated API has reported errors."
},
"expr": "sum by(name, namespace)(increase(aggregator_unavailable_apiservice_count[5m])) > 2\n",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
}, },
{ {
"alert": "KubeClientCertificateExpiration", "alert": "AggregatedAPIDown",
"annotations": { "annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.", "description": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has been only {{ $value | humanize }}% available over the last 10m.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapidown",
"summary": "An aggregated API is down."
}, },
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n", "expr": "(1 - max by(name, namespace)(avg_over_time(aggregator_unavailable_apiservice[10m]))) * 100 < 85\n",
"for": "5m",
"labels": { "labels": {
"severity": "critical" "severity": "warning"
} }
}, },
{ {
"alert": "KubeAPIDown", "alert": "KubeAPIDown",
"annotations": { "annotations": {
"message": "KubeAPI has disappeared from Prometheus target discovery.", "description": "KubeAPI has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"apiserver\"} == 1)\n", "expr": "absent(up{job=\"apiserver\"} == 1)\n",
"for": "15m", "for": "15m",
@ -783,8 +1139,9 @@ data:
{ {
"alert": "KubeNodeNotReady", "alert": "KubeNodeNotReady",
"annotations": { "annotations": {
"message": "{{ $labels.node }} has been unready for more than 15 minutes.", "description": "{{ $labels.node }} has been unready for more than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready",
"summary": "Node is not ready."
}, },
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n", "expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "15m", "for": "15m",
@ -795,10 +1152,12 @@ data:
{ {
"alert": "KubeNodeUnreachable", "alert": "KubeNodeUnreachable",
"annotations": { "annotations": {
"message": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.", "description": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable",
"summary": "Node is unreachable."
}, },
"expr": "kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} == 1\n", "expr": "(kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} unless ignoring(key,value) kube_node_spec_taint{job=\"kube-state-metrics\",key=~\"ToBeDeletedByClusterAutoscaler|cloud.google.com/impending-node-termination|aws-node-termination-handler/spot-itn\"}) == 1\n",
"for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
@ -806,10 +1165,124 @@ data:
{ {
"alert": "KubeletTooManyPods", "alert": "KubeletTooManyPods",
"annotations": { "annotations": {
"message": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.", "description": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods",
"summary": "Kubelet is running at capacity."
}, },
"expr": "max(max(kubelet_running_pod_count{job=\"kubelet\"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"}) by(node) / max(kube_node_status_capacity_pods{job=\"kube-state-metrics\"}) by(node) > 0.95\n", "expr": "count by(node) (\n (kube_pod_status_phase{job=\"kube-state-metrics\",phase=\"Running\"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job=\"kube-state-metrics\"})\n)\n/\nmax by(node) (\n kube_node_status_capacity_pods{job=\"kube-state-metrics\"} != 1\n) > 0.95\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeNodeReadinessFlapping",
"annotations": {
"description": "The readiness status of node {{ $labels.node }} has changed {{ $value }} times in the last 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping",
"summary": "Node readiness status is flapping."
},
"expr": "sum(changes(kube_node_status_condition{status=\"true\",condition=\"Ready\"}[15m])) by (node) > 2\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletPlegDurationHigh",
"annotations": {
"description": "The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh",
"summary": "Kubelet Pod Lifecycle Event Generator is taking too long to relist."
},
"expr": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile=\"0.99\"} >= 10\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletPodStartUpLatencyHigh",
"annotations": {
"description": "Kubelet Pod startup 99th percentile latency is {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh",
"summary": "Kubelet Pod startup latency is too high."
},
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\"}[5m])) by (instance, le)) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"} > 60\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletClientCertificateExpiration",
"annotations": {
"description": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificateexpiration",
"summary": "Kubelet client certificate is about to expire."
},
"expr": "kubelet_certificate_manager_client_ttl_seconds < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletClientCertificateExpiration",
"annotations": {
"description": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificateexpiration",
"summary": "Kubelet client certificate is about to expire."
},
"expr": "kubelet_certificate_manager_client_ttl_seconds < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeletServerCertificateExpiration",
"annotations": {
"description": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificateexpiration",
"summary": "Kubelet server certificate is about to expire."
},
"expr": "kubelet_certificate_manager_server_ttl_seconds < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletServerCertificateExpiration",
"annotations": {
"description": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificateexpiration",
"summary": "Kubelet server certificate is about to expire."
},
"expr": "kubelet_certificate_manager_server_ttl_seconds < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeletClientCertificateRenewalErrors",
"annotations": {
"description": "Kubelet on node {{ $labels.node }} has failed to renew its client certificate ({{ $value | humanize }} errors in the last 5 minutes).",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificaterenewalerrors",
"summary": "Kubelet has failed to renew its client certificate."
},
"expr": "increase(kubelet_certificate_manager_client_expiration_renew_errors[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletServerCertificateRenewalErrors",
"annotations": {
"description": "Kubelet on node {{ $labels.node }} has failed to renew its server certificate ({{ $value | humanize }} errors in the last 5 minutes).",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificaterenewalerrors",
"summary": "Kubelet has failed to renew its server certificate."
},
"expr": "increase(kubelet_server_expiration_renew_errors[5m]) > 0\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -818,8 +1291,9 @@ data:
{ {
"alert": "KubeletDown", "alert": "KubeletDown",
"annotations": { "annotations": {
"message": "Kubelet has disappeared from Prometheus target discovery.", "description": "Kubelet has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"kubelet\"} == 1)\n", "expr": "absent(up{job=\"kubelet\"} == 1)\n",
"for": "15m", "for": "15m",
@ -835,8 +1309,9 @@ data:
{ {
"alert": "KubeSchedulerDown", "alert": "KubeSchedulerDown",
"annotations": { "annotations": {
"message": "KubeScheduler has disappeared from Prometheus target discovery.", "description": "KubeScheduler has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n", "expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
"for": "15m", "for": "15m",
@ -852,8 +1327,9 @@ data:
{ {
"alert": "KubeControllerManagerDown", "alert": "KubeControllerManagerDown",
"annotations": { "annotations": {
"message": "KubeControllerManager has disappeared from Prometheus target discovery.", "description": "KubeControllerManager has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown" "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown",
"summary": "Target disappeared from Prometheus target discovery."
}, },
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n", "expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m", "for": "15m",
@ -865,6 +1341,363 @@ data:
} }
] ]
} }
loki.yaml: |-
{
"groups": [
{
"name": "loki_rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job))",
"record": "job:loki_request_duration_seconds:99quantile"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job))",
"record": "job:loki_request_duration_seconds:50quantile"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job) / sum(rate(loki_request_duration_seconds_count[1m])) by (job)",
"record": "job:loki_request_duration_seconds:avg"
},
{
"expr": "sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)",
"record": "job:loki_request_duration_seconds_bucket:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job)",
"record": "job:loki_request_duration_seconds_sum:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_count[1m])) by (job)",
"record": "job:loki_request_duration_seconds_count:sum_rate"
},
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route))",
"record": "job_route:loki_request_duration_seconds:99quantile"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route))",
"record": "job_route:loki_request_duration_seconds:50quantile"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (job, route)",
"record": "job_route:loki_request_duration_seconds:avg"
},
{
"expr": "sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)",
"record": "job_route:loki_request_duration_seconds_bucket:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job, route)",
"record": "job_route:loki_request_duration_seconds_sum:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_count[1m])) by (job, route)",
"record": "job_route:loki_request_duration_seconds_count:sum_rate"
},
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, namespace, job, route))",
"record": "namespace_job_route:loki_request_duration_seconds:99quantile"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, namespace, job, route))",
"record": "namespace_job_route:loki_request_duration_seconds:50quantile"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds:avg"
},
{
"expr": "sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds_bucket:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds_sum:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds_count:sum_rate"
}
]
},
{
"name": "loki_alerts",
"rules": [
{
"alert": "LokiRequestErrors",
"annotations": {
"message": "{{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}% errors.\n"
},
"expr": "100 * sum(rate(loki_request_duration_seconds_count{status_code=~\"5..\"}[1m])) by (namespace, job, route)\n /\nsum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)\n > 10\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "LokiRequestLatency",
"annotations": {
"message": "{{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency.\n"
},
"expr": "namespace_job_route:loki_request_duration_seconds:99quantile{route!~\"(?i).*tail.*\"} > 1\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
}
]
}
node-exporter.yaml: |-
{
"groups": [
{
"name": "node-exporter.rules",
"rules": [
{
"expr": "count without (cpu) (\n count without (mode) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n )\n)\n",
"record": "instance:node_num_cpu:sum"
},
{
"expr": "1 - avg without (cpu, mode) (\n rate(node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\"}[1m])\n)\n",
"record": "instance:node_cpu_utilisation:rate1m"
},
{
"expr": "(\n node_load1{job=\"node-exporter\"}\n/\n instance:node_num_cpu:sum{job=\"node-exporter\"}\n)\n",
"record": "instance:node_load1_per_cpu:ratio"
},
{
"expr": "1 - (\n node_memory_MemAvailable_bytes{job=\"node-exporter\"}\n/\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n)\n",
"record": "instance:node_memory_utilisation:ratio"
},
{
"expr": "rate(node_vmstat_pgmajfault{job=\"node-exporter\"}[1m])\n",
"record": "instance:node_vmstat_pgmajfault:rate1m"
},
{
"expr": "rate(node_disk_io_time_seconds_total{job=\"node-exporter\", device!~\"dm.*\"}[1m])\n",
"record": "instance_device:node_disk_io_time_seconds:rate1m"
},
{
"expr": "rate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\", device!~\"dm.*\"}[1m])\n",
"record": "instance_device:node_disk_io_time_weighted_seconds:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_receive_bytes_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_receive_bytes_excluding_lo:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_transmit_bytes_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_transmit_bytes_excluding_lo:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_receive_drop_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_receive_drop_excluding_lo:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_transmit_drop_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_transmit_drop_excluding_lo:rate1m"
}
]
},
{
"name": "node-exporter",
"rules": [
{
"alert": "NodeFilesystemSpaceFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up.",
"summary": "Filesystem is predicted to run out of space within the next 24 hours."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 40\nand\n predict_linear(node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 24*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemSpaceFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up fast.",
"summary": "Filesystem is predicted to run out of space within the next 4 hours."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 20\nand\n predict_linear(node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 4*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemAlmostOutOfSpace",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left.",
"summary": "Filesystem has less than 5% space left."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 5\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemAlmostOutOfSpace",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left.",
"summary": "Filesystem has less than 3% space left."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 3\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemFilesFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left and is filling up.",
"summary": "Filesystem is predicted to run out of inodes within the next 24 hours."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 40\nand\n predict_linear(node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 24*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemFilesFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left and is filling up fast.",
"summary": "Filesystem is predicted to run out of inodes within the next 4 hours."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 20\nand\n predict_linear(node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 4*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemAlmostOutOfFiles",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left.",
"summary": "Filesystem has less than 5% inodes left."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 5\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemAlmostOutOfFiles",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left.",
"summary": "Filesystem has less than 3% inodes left."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 3\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeNetworkReceiveErrs",
"annotations": {
"description": "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} receive errors in the last two minutes.",
"summary": "Network interface is reporting many receive errors."
},
"expr": "increase(node_network_receive_errs_total[2m]) > 10\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeNetworkTransmitErrs",
"annotations": {
"description": "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} transmit errors in the last two minutes.",
"summary": "Network interface is reporting many transmit errors."
},
"expr": "increase(node_network_transmit_errs_total[2m]) > 10\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeHighNumberConntrackEntriesUsed",
"annotations": {
"description": "{{ $value | humanizePercentage }} of conntrack entries are used.",
"summary": "Number of conntrack are getting close to the limit."
},
"expr": "(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeTextFileCollectorScrapeError",
"annotations": {
"description": "Node Exporter text file collector failed to scrape.",
"summary": "Node Exporter text file collector failed to scrape."
},
"expr": "node_textfile_scrape_error{job=\"node-exporter\"} == 1\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeClockSkewDetected",
"annotations": {
"message": "Clock on {{ $labels.instance }} is out of sync by more than 300s. Ensure NTP is configured correctly on this host.",
"summary": "Clock skew detected."
},
"expr": "(\n node_timex_offset_seconds > 0.05\nand\n deriv(node_timex_offset_seconds[5m]) >= 0\n)\nor\n(\n node_timex_offset_seconds < -0.05\nand\n deriv(node_timex_offset_seconds[5m]) <= 0\n)\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeClockNotSynchronising",
"annotations": {
"message": "Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.",
"summary": "Clock not synchronising."
},
"expr": "min_over_time(node_timex_sync_status[5m]) == 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeRAIDDegraded",
"annotations": {
"description": "RAID array '{{ $labels.device }}' on {{ $labels.instance }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically.",
"summary": "RAID Array is degraded"
},
"expr": "node_md_disks_required - ignoring (state) (node_md_disks{state=\"active\"}) > 0\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeRAIDDiskFailure",
"annotations": {
"description": "At least one device in RAID array on {{ $labels.instance }} failed. Array '{{ $labels.device }}' needs attention and possibly a disk swap.",
"summary": "Failed device in RAID array"
},
"expr": "node_md_disks{state=\"fail\"} > 0\n",
"labels": {
"severity": "warning"
}
}
]
}
]
}
prom.yaml: |- prom.yaml: |-
{ {
"groups": [ "groups": [
@ -994,7 +1827,7 @@ data:
{ {
"alert": "PrometheusRemoteStorageFailures", "alert": "PrometheusRemoteStorageFailures",
"annotations": { "annotations": {
"description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to queue {{$labels.queue}}.", "description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to {{ $labels.remote_name}}:{{ $labels.url }}",
"summary": "Prometheus fails to send samples to remote storage." "summary": "Prometheus fails to send samples to remote storage."
}, },
"expr": "(\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n/\n (\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n +\n rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m])\n )\n)\n* 100\n> 1\n", "expr": "(\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n/\n (\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n +\n rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m])\n )\n)\n* 100\n> 1\n",
@ -1006,7 +1839,7 @@ data:
{ {
"alert": "PrometheusRemoteWriteBehind", "alert": "PrometheusRemoteWriteBehind",
"annotations": { "annotations": {
"description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for queue {{$labels.queue}}.", "description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for {{ $labels.remote_name}}:{{ $labels.url }}.",
"summary": "Prometheus remote write is behind." "summary": "Prometheus remote write is behind."
}, },
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- on(job, instance) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n", "expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- on(job, instance) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n",
@ -1018,10 +1851,10 @@ data:
{ {
"alert": "PrometheusRemoteWriteDesiredShards", "alert": "PrometheusRemoteWriteDesiredShards",
"annotations": { "annotations": {
"description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ printf $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.", "description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ $value }} shards for queue {{ $labels.remote_name}}:{{ $labels.url }}, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.",
"summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards." "summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards."
}, },
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n> on(job, instance) group_right\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n", "expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n>\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n",
"for": "15m", "for": "15m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"
@ -1050,6 +1883,18 @@ data:
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
},
{
"alert": "PrometheusTargetLimitHit",
"annotations": {
"description": "Prometheus {{$labels.instance}} has dropped {{ printf \"%.0f\" $value }} targets because the number of targets exceeded the configured target_limit.",
"summary": "Prometheus has dropped targets because some scrape configs have exceeded the targets limit."
},
"expr": "increase(prometheus_target_scrape_pool_exceeded_target_limit_total{job=\"prometheus\"}[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
} }
] ]
} }
@ -1071,6 +1916,17 @@ data:
"labels": { "labels": {
"severity": "warning" "severity": "warning"
} }
},
{
"alert": "BlackboxProbeFailure",
"annotations": {
"message": "Blackbox probe {{$labels.instance}} failed"
},
"expr": "probe_success == 0",
"for": "2m",
"labels": {
"severity": "critical"
}
} }
] ]
}, },
@ -1082,7 +1938,7 @@ data:
"annotations": { "annotations": {
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive." "message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
}, },
"expr": "node_md_disks - node_md_disks_active > 0", "expr": "node_md_disks{state=\"failed\"} > 0",
"for": "10m", "for": "10m",
"labels": { "labels": {
"severity": "warning" "severity": "warning"

View File

@ -1,50 +0,0 @@
locals {
# Pick a CoreOS Container Linux derivative
# coreos-stable -> Container Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = local.flavor == "flatcar" ? data.aws_ami.flatcar.image_id : data.aws_ami.coreos.image_id
flavor = element(split("-", var.os_image), 0)
channel = element(split("-", var.os_image), 1)
}
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${local.flavor == "coreos" ? local.channel : "stable"}-*"]
}
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.flavor == "flatcar" ? local.channel : "stable"}-*"]
}
}

View File

@ -1,162 +0,0 @@
---
systemd:
units:
- name: etcd-member.service
enable: true
dropins:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381"
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt"
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt"
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key"
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enable: true
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/master \
--node-labels=node.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootstrap.service
contents: |
[Unit]
Description=Kubernetes control plane
ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume assets,kind=host,source=/opt/bootstrap/assets \
--mount volume=assets,target=/assets \
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.3 \
--net=host \
--dns=host \
--exec=/apply
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.3
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -1,99 +0,0 @@
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
]
connection {
type = "ssh"
host = aws_instance.controllers.*.public_ip[count.index]
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests",
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/",
]
}
}
# Connect to a controller to perform one-time cluster bootstrap.
resource "null_resource" "bootstrap" {
depends_on = [
null_resource.copy-controller-secrets,
module.workers,
aws_route53_record.apiserver,
]
connection {
type = "ssh"
host = aws_instance.controllers[0].public_ip
user = "core"
timeout = "15m"
}
provisioner "remote-exec" {
inline = [
"sudo systemctl start bootstrap",
]
}
}

View File

@ -1,11 +0,0 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_providers {
aws = "~> 2.23"
ct = "~> 0.3"
template = "~> 2.1"
null = "~> 2.1"
}
}

View File

@ -1,50 +0,0 @@
locals {
# Pick a CoreOS Container Linux derivative
# coreos-stable -> Container Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = local.flavor == "flatcar" ? data.aws_ami.flatcar.image_id : data.aws_ami.coreos.image_id
flavor = element(split("-", var.os_image), 0)
channel = element(split("-", var.os_image), 1)
}
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${local.flavor == "coreos" ? local.channel : "stable"}-*"]
}
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.flavor == "flatcar" ? local.channel : "stable"}-*"]
}
}

View File

@ -1,4 +0,0 @@
terraform {
required_version = ">= 0.12"
}

View File

@ -11,11 +11,11 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.3 (upstream) * Kubernetes v1.20.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/fedora-coreos/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs

View File

@ -14,10 +14,33 @@ data "aws_ami" "fedora-coreos" {
} }
filter { filter {
name = "name" name = "description"
values = ["fedora-coreos-30.*.*-hvm"] values = ["Fedora CoreOS ${var.os_stream} *"]
}
}
# Experimental Fedora CoreOS arm64 / aarch64 AMIs from Poseidon
# WARNING: These AMIs will be removed when Fedora CoreOS publishes arm64 AMIs
# and may be removed for any reason before then as well. Do not use.
data "aws_ami" "fedora-coreos-arm" {
count = var.arch == "arm64" ? 1 : 0
most_recent = true
owners = ["099663496933"]
filter {
name = "architecture"
values = ["arm64"]
} }
# try to filter out dev images (AWS filters can't) filter {
name_regex = "^fedora-coreos-30.[0-9]*.[0-9]*-hvm*" name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["fedora-coreos-*"]
}
} }

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=0daa1276c633fea28e41b2c2c18831e2584deb24" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=4edd79dd0295e6ffa4c8ed04fd5914d5cb3f1b7c"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)] api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = aws_route53_record.etcds.*.fqdn etcd_servers = aws_route53_record.etcds.*.fqdn
asset_dir = var.asset_dir
networking = var.networking networking = var.networking
network_mtu = var.network_mtu network_mtu = var.network_mtu
pod_cidr = var.pod_cidr pod_cidr = var.pod_cidr
@ -13,6 +12,7 @@ module "bootstrap" {
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
enable_reporting = var.enable_reporting enable_reporting = var.enable_reporting
enable_aggregation = var.enable_aggregation enable_aggregation = var.enable_aggregation
daemonset_tolerations = var.daemonset_tolerations
trusted_certs_dir = "/etc/pki/tls/certs" trusted_certs_dir = "/etc/pki/tls/certs"
} }

View File

@ -22,9 +22,8 @@ resource "aws_instance" "controllers" {
} }
instance_type = var.controller_type instance_type = var.controller_type
ami = var.arch == "arm64" ? data.aws_ami.fedora-coreos-arm[0].image_id : data.aws_ami.fedora-coreos.image_id
ami = data.aws_ami.fedora-coreos.image_id user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
# storage # storage
root_block_device { root_block_device {
@ -36,7 +35,7 @@ resource "aws_instance" "controllers" {
# network # network
associate_public_ip_address = true associate_public_ip_address = true
subnet_id = aws_subnet.public.*.id[count.index] subnet_id = element(aws_subnet.public.*.id, count.index)
vpc_security_group_ids = [aws_security_group.controller.id] vpc_security_group_ids = [aws_security_group.controller.id]
lifecycle { lifecycle {
@ -63,6 +62,7 @@ data "template_file" "controller-configs" {
vars = { vars = {
# Cannot use cyclic dependencies on controllers or their DNS records # Cannot use cyclic dependencies on controllers or their DNS records
etcd_arch = var.arch == "arm64" ? "-arm64" : ""
etcd_name = "etcd${count.index}" etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}" etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,... # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: etcd-member.service - name: etcd-member.service
@ -8,28 +8,25 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=etcd (System Container) Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd Documentation=https://github.com/etcd-io/etcd
Wants=network-online.target network.target Wants=network-online.target network.target
After=network-online.target After=network-online.target
[Service] [Service]
# https://github.com/opencontainers/runc/pull/1807 Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.14${etcd_arch}
# Type=notify
# NotifyAccess=exec
Type=exec Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \ ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \ --env-file /etc/etcd/etcd.env \
--network host \ --network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \ --volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \ --volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.3 $${ETCD_IMAGE}
ExecStop=/usr/bin/podman stop etcd ExecStop=/usr/bin/podman stop etcd
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
- name: docker.service - name: docker.service
@ -38,11 +35,12 @@ systemd:
enabled: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Wait for DNS entries Description=Wait for DNS and hostname
Before=kubelet.service Before=kubelet.service
[Service] [Service]
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
ExecStartPre=/bin/sh -c 'while [ `hostname -s` == "localhost" ]; do sleep 1; done;'
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done' ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install] [Install]
RequiredBy=kubelet.service RequiredBy=kubelet.service
@ -51,9 +49,10 @@ systemd:
enabled: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Kubelet via Hyperkube (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -67,23 +66,21 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \ --volume /var/log:/var/log \
--volume /var/run:/var/run \
--volume /var/run/lock:/var/run/lock:z \ --volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \ --volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.16.3 /hyperkube kubelet \ $${KUBELET_IMAGE} \
--anonymous-auth=false \ --anonymous-auth=false \
--authentication-token-webhook \ --authentication-token-webhook \
--authorization-mode=Webhook \ --authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=systemd \ --cgroup-driver=systemd \
--cgroups-per-qos=true \ --cgroups-per-qos=true \
--enforce-node-allocatable=pods \ --enforce-node-allocatable=pods \
@ -91,15 +88,14 @@ systemd:
--cluster_dns=${cluster_dns_service_ip} \ --cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \ --cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \ --cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \ --healthz-port=0 \
--kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/var/lib/kubelet/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \ --network-plugin=cni \
--node-labels=node.kubernetes.io/master \
--node-labels=node.kubernetes.io/controller="true" \ --node-labels=node.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \ --pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \ --read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \ --register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins --volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes Delegate=yes
@ -116,17 +112,20 @@ systemd:
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
WorkingDirectory=/opt/bootstrap WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*' ExecStartPre=-/usr/bin/podman rm bootstrap
ExecStart=/usr/bin/podman run --name bootstrap \ ExecStart=/usr/bin/podman run --name bootstrap \
--network host \ --network host \
--volume /etc/kubernetes/pki:/etc/kubernetes/pki:ro,z \
--volume /opt/bootstrap/assets:/assets:ro,Z \ --volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \ --volume /opt/bootstrap/apply:/apply:ro,Z \
k8s.gcr.io/hyperkube:v1.16.3 \ --entrypoint=/apply \
/apply quay.io/poseidon/kubelet:v1.20.0
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap ExecStartPost=-/usr/bin/podman stop bootstrap
storage: storage:
directories: directories:
- path: /var/lib/etcd
mode: 0700
- path: /etc/kubernetes - path: /etc/kubernetes
- path: /opt/bootstrap - path: /opt/bootstrap
files: files:
@ -135,12 +134,34 @@ storage:
contents: contents:
inline: | inline: |
${kubeconfig} ${kubeconfig}
- path: /opt/bootstrap/layout
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/pki
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/pki/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/* /etc/kubernetes/pki/
mv tls/k8s/* /etc/kubernetes/pki/
mkdir -p /etc/kubernetes/manifests
mv static-manifests/* /etc/kubernetes/manifests/
mkdir -p /opt/bootstrap/assets
mv manifests /opt/bootstrap/assets/manifests
mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
chcon -R -u system_u -t container_file_t /etc/kubernetes/pki
- path: /opt/bootstrap/apply - path: /opt/bootstrap/apply
mode: 0544 mode: 0544
contents: contents:
inline: | inline: |
#!/bin/bash -e #!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig export KUBECONFIG=/etc/kubernetes/pki/admin.conf
until kubectl version; do until kubectl version; do
echo "Waiting for static pod control plane" echo "Waiting for static pod control plane"
sleep 5 sleep 5
@ -149,14 +170,22 @@ storage:
echo "Retry applying manifests" echo "Retry applying manifests"
sleep 5 sleep 5
done done
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
contents: contents:
inline: | inline: |
fs.inotify.max_user_watches=16184 fs.inotify.max_user_watches=16184
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.*.rp_filter=0
- path: /etc/systemd/network/50-flannel.link
contents:
inline: |
[Match]
OriginalName=flannel*
[Link]
MACAddressPolicy=none
- path: /etc/systemd/system.conf.d/accounting.conf - path: /etc/systemd/system.conf.d/accounting.conf
contents: contents:
inline: | inline: |
@ -168,8 +197,6 @@ storage:
mode: 0644 mode: 0644
contents: contents:
inline: | inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name} ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379 ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
@ -187,6 +214,7 @@ storage:
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true ETCD_PEER_CLIENT_CERT_AUTH=true
ETCD_UNSUPPORTED_ARCH=arm64
passwd: passwd:
users: users:
- name: core - name: core

View File

@ -25,21 +25,23 @@ resource "aws_internet_gateway" "gateway" {
resource "aws_route_table" "default" { resource "aws_route_table" "default" {
vpc_id = aws_vpc.network.id vpc_id = aws_vpc.network.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
route {
ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
tags = { tags = {
"Name" = var.cluster_name "Name" = var.cluster_name
} }
} }
resource "aws_route" "egress-ipv4" {
route_table_id = aws_route_table.default.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
resource "aws_route" "egress-ipv6" {
route_table_id = aws_route_table.default.id
destination_ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
# Subnets (one per availability zone) # Subnets (one per availability zone)
resource "aws_subnet" "public" { resource "aws_subnet" "public" {

View File

@ -17,6 +17,7 @@ resource "aws_route53_record" "apiserver" {
resource "aws_lb" "nlb" { resource "aws_lb" "nlb" {
name = "${var.cluster_name}-nlb" name = "${var.cluster_name}-nlb"
load_balancer_type = "network" load_balancer_type = "network"
ip_address_type = "dualstack"
internal = false internal = false
subnets = aws_subnet.public.*.id subnets = aws_subnet.public.*.id

View File

@ -1,5 +1,6 @@
output "kubeconfig-admin" { output "kubeconfig-admin" {
value = module.bootstrap.kubeconfig-admin value = module.bootstrap.kubeconfig-admin
sensitive = true
} }
# Outputs for Kubernetes Ingress # Outputs for Kubernetes Ingress
@ -32,7 +33,8 @@ output "worker_security_groups" {
} }
output "kubeconfig" { output "kubeconfig" {
value = module.bootstrap.kubeconfig-kubelet value = module.bootstrap.kubeconfig-kubelet
sensitive = true
} }
# Outputs for custom load balancing # Outputs for custom load balancing
@ -52,3 +54,10 @@ output "worker_target_group_https" {
value = module.workers.target_group_https value = module.workers.target_group_https
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
sensitive = true
}

View File

@ -13,6 +13,30 @@ resource "aws_security_group" "controller" {
} }
} }
resource "aws_security_group_rule" "controller-icmp" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-icmp-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
self = true
}
resource "aws_security_group_rule" "controller-ssh" { resource "aws_security_group_rule" "controller-ssh" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -44,28 +68,31 @@ resource "aws_security_group_rule" "controller-etcd-metrics" {
source_security_group_id = aws_security_group.worker.id source_security_group_id = aws_security_group.worker.id
} }
# Allow Prometheus to scrape kube-scheduler resource "aws_security_group_rule" "controller-cilium-health" {
resource "aws_security_group_rule" "controller-scheduler-metrics" { count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
type = "ingress" type = "ingress"
protocol = "tcp" protocol = "tcp"
from_port = 10251 from_port = 4240
to_port = 10251 to_port = 4240
source_security_group_id = aws_security_group.worker.id source_security_group_id = aws_security_group.worker.id
} }
# Allow Prometheus to scrape kube-controller-manager resource "aws_security_group_rule" "controller-cilium-health-self" {
resource "aws_security_group_rule" "controller-manager-metrics" { count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
type = "ingress" type = "ingress"
protocol = "tcp" protocol = "tcp"
from_port = 10252 from_port = 4240
to_port = 10252 to_port = 4240
source_security_group_id = aws_security_group.worker.id self = true
} }
# IANA VXLAN default
resource "aws_security_group_rule" "controller-vxlan" { resource "aws_security_group_rule" "controller-vxlan" {
count = var.networking == "flannel" ? 1 : 0 count = var.networking == "flannel" ? 1 : 0
@ -100,6 +127,31 @@ resource "aws_security_group_rule" "controller-apiserver" {
cidr_blocks = ["0.0.0.0/0"] cidr_blocks = ["0.0.0.0/0"]
} }
# Linux VXLAN default
resource "aws_security_group_rule" "controller-linux-vxlan" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-linux-vxlan-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
self = true
}
# Allow Prometheus to scrape node-exporter daemonset # Allow Prometheus to scrape node-exporter daemonset
resource "aws_security_group_rule" "controller-node-exporter" { resource "aws_security_group_rule" "controller-node-exporter" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -111,6 +163,17 @@ resource "aws_security_group_rule" "controller-node-exporter" {
source_security_group_id = aws_security_group.worker.id source_security_group_id = aws_security_group.worker.id
} }
# Allow Prometheus to scrape kube-proxy
resource "aws_security_group_rule" "kube-proxy-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10249
to_port = 10249
source_security_group_id = aws_security_group.worker.id
}
# Allow apiserver to access kubelets for exec, log, port-forward # Allow apiserver to access kubelets for exec, log, port-forward
resource "aws_security_group_rule" "controller-kubelet" { resource "aws_security_group_rule" "controller-kubelet" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -132,6 +195,28 @@ resource "aws_security_group_rule" "controller-kubelet-self" {
self = true self = true
} }
# Allow Prometheus to scrape kube-scheduler
resource "aws_security_group_rule" "controller-scheduler-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10251
to_port = 10251
source_security_group_id = aws_security_group.worker.id
}
# Allow Prometheus to scrape kube-controller-manager
resource "aws_security_group_rule" "controller-manager-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10252
to_port = 10252
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-bgp" { resource "aws_security_group_rule" "controller-bgp" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -216,6 +301,30 @@ resource "aws_security_group" "worker" {
} }
} }
resource "aws_security_group_rule" "worker-icmp" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
source_security_group_id = aws_security_group.controller.id
}
resource "aws_security_group_rule" "worker-icmp-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
self = true
}
resource "aws_security_group_rule" "worker-ssh" { resource "aws_security_group_rule" "worker-ssh" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id
@ -246,6 +355,31 @@ resource "aws_security_group_rule" "worker-https" {
cidr_blocks = ["0.0.0.0/0"] cidr_blocks = ["0.0.0.0/0"]
} }
resource "aws_security_group_rule" "worker-cilium-health" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "tcp"
from_port = 4240
to_port = 4240
source_security_group_id = aws_security_group.controller.id
}
resource "aws_security_group_rule" "worker-cilium-health-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "tcp"
from_port = 4240
to_port = 4240
self = true
}
# IANA VXLAN default
resource "aws_security_group_rule" "worker-vxlan" { resource "aws_security_group_rule" "worker-vxlan" {
count = var.networking == "flannel" ? 1 : 0 count = var.networking == "flannel" ? 1 : 0
@ -270,6 +404,31 @@ resource "aws_security_group_rule" "worker-vxlan-self" {
self = true self = true
} }
# Linux VXLAN default
resource "aws_security_group_rule" "worker-linux-vxlan" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
source_security_group_id = aws_security_group.controller.id
}
resource "aws_security_group_rule" "worker-linux-vxlan-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
self = true
}
# Allow Prometheus to scrape node-exporter daemonset # Allow Prometheus to scrape node-exporter daemonset
resource "aws_security_group_rule" "worker-node-exporter" { resource "aws_security_group_rule" "worker-node-exporter" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id
@ -281,14 +440,15 @@ resource "aws_security_group_rule" "worker-node-exporter" {
self = true self = true
} }
resource "aws_security_group_rule" "ingress-health" { # Allow Prometheus to scrape kube-proxy
resource "aws_security_group_rule" "worker-kube-proxy" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id
type = "ingress" type = "ingress"
protocol = "tcp" protocol = "tcp"
from_port = 10254 from_port = 10249
to_port = 10254 to_port = 10249
cidr_blocks = ["0.0.0.0/0"] self = true
} }
# Allow apiserver to access kubelets for exec, log, port-forward # Allow apiserver to access kubelets for exec, log, port-forward
@ -313,6 +473,16 @@ resource "aws_security_group_rule" "worker-kubelet-self" {
self = true self = true
} }
resource "aws_security_group_rule" "ingress-health" {
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "tcp"
from_port = 10254
to_port = 10254
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "worker-bgp" { resource "aws_security_group_rule" "worker-bgp" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id

View File

@ -1,7 +1,16 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist :
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers. # Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" { resource "null_resource" "copy-controller-secrets" {
count = var.controller_count count = var.controller_count
depends_on = [ depends_on = [
module.bootstrap, module.bootstrap,
] ]
@ -14,63 +23,13 @@ resource "null_resource" "copy-controller-secrets" {
} }
provisioner "file" { provisioner "file" {
content = module.bootstrap.etcd_ca_cert content = join("\n", local.assets_bundle)
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
destination = "$HOME/assets" destination = "$HOME/assets"
} }
provisioner "remote-exec" { provisioner "remote-exec" {
inline = [ inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd", "sudo /opt/bootstrap/layout",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests",
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/"
] ]
} }
} }

View File

@ -41,10 +41,15 @@ variable "worker_type" {
default = "t3.small" default = "t3.small"
} }
variable "os_image" { variable "os_stream" {
type = string type = string
description = "AMI channel for Fedora CoreOS (not yet used)" description = "Fedora CoreOS image stream for instances (e.g. stable, testing, next)"
default = "coreos-stable" default = "stable"
validation {
condition = contains(["stable", "testing", "next"], var.os_stream)
error_message = "The os_stream must be stable, testing, or next."
}
} }
variable "disk_size" { variable "disk_size" {
@ -96,11 +101,6 @@ variable "ssh_authorized_key" {
description = "SSH public key for user 'core'" description = "SSH public key for user 'core'"
} }
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
}
variable "networking" { variable "networking" {
type = string type = string
description = "Choice of networking provider (calico or flannel)" description = "Choice of networking provider (calico or flannel)"
@ -126,37 +126,54 @@ variable "pod_cidr" {
} }
variable "service_cidr" { variable "service_cidr" {
type = string type = string
description = <<EOD description = <<EOD
CIDR IPv4 range to assign Kubernetes services. CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns. The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD EOD
default = "10.3.0.0/16" default = "10.3.0.0/16"
} }
variable "enable_reporting" { variable "enable_reporting" {
type = bool type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)" description = "Enable usage or analytics reporting to upstreams (Calico)"
default = false default = false
} }
variable "enable_aggregation" { variable "enable_aggregation" {
type = bool type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)" description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
default = false default = false
} }
variable "worker_node_labels" { variable "worker_node_labels" {
type = list(string) type = list(string)
description = "List of initial worker node labels" description = "List of initial worker node labels"
default = [] default = []
} }
# unofficial, undocumented, unsupported # unofficial, undocumented, unsupported
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)" description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
default = "cluster.local" default = "cluster.local"
}
variable "arch" {
type = string
description = "Container architecture (amd64 or arm64)"
default = "amd64"
validation {
condition = var.arch == "amd64" || var.arch == "arm64"
error_message = "The arch must be amd64 or arm64."
}
}
variable "daemonset_tolerations" {
type = list(string)
description = "List of additional taint keys kube-system DaemonSets should tolerate (e.g. ['custom-role', 'gpu-role'])"
default = []
} }

View File

@ -1,11 +1,15 @@
# Terraform version and plugin versions # Terraform version and plugin versions
terraform { terraform {
required_version = "~> 0.12.0" required_version = "~> 0.13.0"
required_providers { required_providers {
aws = "~> 2.23" aws = ">= 2.23, <= 4.0"
ct = "~> 0.4"
template = "~> 2.1" template = "~> 2.1"
null = "~> 2.1" null = "~> 2.1"
ct = {
source = "poseidon/ct"
version = "~> 0.6"
}
} }
} }

View File

@ -8,7 +8,8 @@ module "workers" {
security_groups = [aws_security_group.worker.id] security_groups = [aws_security_group.worker.id]
worker_count = var.worker_count worker_count = var.worker_count
instance_type = var.worker_type instance_type = var.worker_type
os_image = var.os_image os_stream = var.os_stream
arch = var.arch
disk_size = var.disk_size disk_size = var.disk_size
spot_price = var.worker_price spot_price = var.worker_price
target_groups = var.worker_target_groups target_groups = var.worker_target_groups

View File

@ -14,10 +14,33 @@ data "aws_ami" "fedora-coreos" {
} }
filter { filter {
name = "name" name = "description"
values = ["fedora-coreos-30.*.*-hvm"] values = ["Fedora CoreOS ${var.os_stream} *"]
} }
# try to filter out dev images (AWS filters can't)
name_regex = "^fedora-coreos-30.[0-9]*.[0-9]*-hvm*"
} }
# Experimental Fedora CoreOS arm64 / aarch64 AMIs from Poseidon
# WARNING: These AMIs will be removed when Fedora CoreOS publishes arm64 AMIs
# and may be removed for any reason before then as well. Do not use.
data "aws_ami" "fedora-coreos-arm" {
count = var.arch == "arm64" ? 1 : 0
most_recent = true
owners = ["099663496933"]
filter {
name = "architecture"
values = ["arm64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["fedora-coreos-*"]
}
}

View File

@ -1,6 +1,6 @@
--- ---
variant: fcos variant: fcos
version: 1.0.0 version: 1.1.0
systemd: systemd:
units: units:
- name: docker.service - name: docker.service
@ -9,11 +9,12 @@ systemd:
enabled: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Wait for DNS entries Description=Wait for DNS and hostname
Before=kubelet.service Before=kubelet.service
[Service] [Service]
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
ExecStartPre=/bin/sh -c 'while [ `hostname -s` == "localhost" ]; do sleep 1; done;'
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done' ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install] [Install]
RequiredBy=kubelet.service RequiredBy=kubelet.service
@ -21,9 +22,10 @@ systemd:
enabled: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Kubelet via Hyperkube (System Container) Description=Kubelet (System Container)
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -37,23 +39,21 @@ systemd:
--network host \ --network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \ --volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \ --volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \ --volume /lib/modules:/lib/modules:ro \
--volume /run:/run \ --volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \ --volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \ --volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/docker:/var/lib/docker \ --volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \ --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \ --volume /var/log:/var/log \
--volume /var/run:/var/run \
--volume /var/run/lock:/var/run/lock:z \ --volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \ --volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.16.3 /hyperkube kubelet \ $${KUBELET_IMAGE} \
--anonymous-auth=false \ --anonymous-auth=false \
--authentication-token-webhook \ --authentication-token-webhook \
--authorization-mode=Webhook \ --authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=systemd \ --cgroup-driver=systemd \
--cgroups-per-qos=true \ --cgroups-per-qos=true \
--enforce-node-allocatable=pods \ --enforce-node-allocatable=pods \
@ -61,16 +61,19 @@ systemd:
--cluster_dns=${cluster_dns_service_ip} \ --cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \ --cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \ --cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \ --healthz-port=0 \
--kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/var/lib/kubelet/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \ --network-plugin=cni \
--node-labels=node.kubernetes.io/node \ --node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) } %{~ for label in split(",", node_labels) ~}
--node-labels=${label} \ --node-labels=${label} \
%{ endfor ~} %{~ endfor ~}
%{~ for taint in split(",", node_taints) ~}
--register-with-taints=${taint} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \ --pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \ --read-only-port=0 \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins --volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes Delegate=yes
@ -78,6 +81,19 @@ systemd:
RestartSec=10 RestartSec=10
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
- name: delete-node.service
enabled: true
contents: |
[Unit]
Description=Delete Kubernetes node on shutdown
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/podman run --volume /var/lib/kubelet:/var/lib/kubelet:ro,z --entrypoint /usr/local/bin/kubectl $${KUBELET_IMAGE} --kubeconfig=/var/lib/kubelet/kubeconfig delete node $HOSTNAME'
[Install]
WantedBy=multi-user.target
storage: storage:
directories: directories:
- path: /etc/kubernetes - path: /etc/kubernetes
@ -87,14 +103,22 @@ storage:
contents: contents:
inline: | inline: |
${kubeconfig} ${kubeconfig}
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
contents: contents:
inline: | inline: |
fs.inotify.max_user_watches=16184 fs.inotify.max_user_watches=16184
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.*.rp_filter=0
- path: /etc/systemd/network/50-flannel.link
contents:
inline: |
[Match]
OriginalName=flannel*
[Link]
MACAddressPolicy=none
- path: /etc/systemd/system.conf.d/accounting.conf - path: /etc/systemd/system.conf.d/accounting.conf
contents: contents:
inline: | inline: |

View File

@ -34,10 +34,15 @@ variable "instance_type" {
default = "t3.small" default = "t3.small"
} }
variable "os_image" { variable "os_stream" {
type = string type = string
description = "AMI channel for Fedora CoreOS (not yet used)" description = "Fedora CoreOS image stream for instances (e.g. stable, testing, next)"
default = "coreos-stable" default = "stable"
validation {
condition = contains(["stable", "testing", "next"], var.os_stream)
error_message = "The os_stream must be stable, testing, or next."
}
} }
variable "disk_size" { variable "disk_size" {
@ -89,22 +94,41 @@ variable "ssh_authorized_key" {
} }
variable "service_cidr" { variable "service_cidr" {
type = string type = string
description = <<EOD description = <<EOD
CIDR IPv4 range to assign Kubernetes services. CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns. The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD EOD
default = "10.3.0.0/16" default = "10.3.0.0/16"
} }
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) " description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local" default = "cluster.local"
} }
variable "node_labels" { variable "node_labels" {
type = list(string) type = list(string)
description = "List of initial node labels" description = "List of initial node labels"
default = [] default = []
}
variable "node_taints" {
type = list(string)
description = "List of initial node taints"
default = []
}
# unofficial, undocumented, unsupported
variable "arch" {
type = string
description = "Container architecture (amd64 or arm64)"
default = "amd64"
validation {
condition = var.arch == "amd64" || var.arch == "arm64"
error_message = "The arch must be amd64 or arm64."
}
} }

View File

@ -1,4 +1,14 @@
# Terraform version and plugin versions
terraform { terraform {
required_version = ">= 0.12" required_version = "~> 0.13.0"
required_providers {
aws = ">= 2.23, <= 4.0"
template = "~> 2.1"
ct = {
source = "poseidon/ct"
version = "~> 0.6"
}
}
} }

View File

@ -44,7 +44,7 @@ resource "aws_autoscaling_group" "workers" {
# Worker template # Worker template
resource "aws_launch_configuration" "worker" { resource "aws_launch_configuration" "worker" {
image_id = data.aws_ami.fedora-coreos.image_id image_id = var.arch == "arm64" ? data.aws_ami.fedora-coreos-arm[0].image_id : data.aws_ami.fedora-coreos.image_id
instance_type = var.instance_type instance_type = var.instance_type
spot_price = var.spot_price > 0 ? var.spot_price : null spot_price = var.spot_price > 0 ? var.spot_price : null
enable_monitoring = false enable_monitoring = false
@ -71,9 +71,9 @@ resource "aws_launch_configuration" "worker" {
# Worker Ignition config # Worker Ignition config
data "ct_config" "worker-ignition" { data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered content = data.template_file.worker-config.rendered
strict = true strict = true
snippets = var.snippets snippets = var.snippets
} }
# Worker Fedora CoreOS config # Worker Fedora CoreOS config
@ -86,6 +86,7 @@ data "template_file" "worker-config" {
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
node_labels = join(",", var.node_labels) node_labels = join(",", var.node_labels)
node_taints = join(",", var.node_taints)
} }
} }

View File

@ -11,13 +11,13 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.3 (upstream) * Kubernetes v1.20.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/flatcar-linux/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs
Please see the [official docs](https://typhoon.psdn.io) and the Google Cloud [tutorial](https://typhoon.psdn.io/cl/google-cloud/). Please see the [official docs](https://typhoon.psdn.io) and the AWS [tutorial](https://typhoon.psdn.io/flatcar-linux/aws/).

View File

@ -0,0 +1,27 @@
locals {
# Pick a Flatcar Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = data.aws_ami.flatcar.image_id
channel = split("-", var.os_image)[1]
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.channel}-*"]
}
}

View File

@ -1,11 +1,10 @@
# Kubernetes assets (kubeconfig, manifests) # Kubernetes assets (kubeconfig, manifests)
module "bootstrap" { module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=0daa1276c633fea28e41b2c2c18831e2584deb24" source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=4edd79dd0295e6ffa4c8ed04fd5914d5cb3f1b7c"
cluster_name = var.cluster_name cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)] api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = aws_route53_record.etcds.*.fqdn etcd_servers = aws_route53_record.etcds.*.fqdn
asset_dir = var.asset_dir
networking = var.networking networking = var.networking
network_mtu = var.network_mtu network_mtu = var.network_mtu
pod_cidr = var.pod_cidr pod_cidr = var.pod_cidr

View File

@ -0,0 +1,213 @@
---
systemd:
units:
- name: etcd-member.service
enabled: true
contents: |
[Unit]
Description=etcd (System Container)
Documentation=https://github.com/etcd-io/etcd
Requires=docker.service
After=docker.service
[Service]
Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.4.14
ExecStartPre=/usr/bin/docker run -d \
--name etcd \
--network host \
--env-file /etc/etcd/etcd.env \
--user 232:232 \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro \
--volume /var/lib/etcd:/var/lib/etcd:rw \
$${ETCD_IMAGE}
ExecStart=docker logs -f etcd
ExecStop=docker stop etcd
ExecStopPost=docker rm etcd
Restart=always
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
- name: docker.service
enabled: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet (System Container)
Requires=docker.service
After=docker.service
Wants=rpc-statd.service
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=/usr/bin/docker run -d \
--name kubelet \
--privileged \
--pid host \
--network host \
-v /etc/kubernetes:/etc/kubernetes:ro \
-v /etc/machine-id:/etc/machine-id:ro \
-v /usr/lib/os-release:/etc/os-release:ro \
-v /lib/modules:/lib/modules:ro \
-v /run:/run \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
-v /var/lib/calico:/var/lib/calico:ro \
-v /var/lib/docker:/var/lib/docker \
-v /var/lib/kubelet:/var/lib/kubelet:rshared \
-v /var/log:/var/log \
-v /opt/cni/bin:/opt/cni/bin \
$${KUBELET_IMAGE} \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--healthz-port=0 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--network-plugin=cni \
--node-labels=node.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/controller=:NoSchedule \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootstrap.service
contents: |
[Unit]
Description=Kubernetes control plane
Wants=docker.service
After=docker.service
ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
ExecStart=/usr/bin/docker run \
-v /etc/kubernetes/pki:/etc/kubernetes/pki:ro \
-v /opt/bootstrap/assets:/assets:ro \
-v /opt/bootstrap/apply:/apply:ro \
--entrypoint=/apply \
$${KUBELET_IMAGE}
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
[Install]
WantedBy=multi-user.target
storage:
directories:
- path: /var/lib/etcd
filesystem: root
mode: 0700
overwrite: true
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /opt/bootstrap/layout
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/pki
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/pki/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
chmod -R 700 /var/lib/etcd
mv auth/* /etc/kubernetes/pki/
mv tls/k8s/* /etc/kubernetes/pki/
mkdir -p /etc/kubernetes/manifests
mv static-manifests/* /etc/kubernetes/manifests/
mkdir -p /opt/bootstrap/assets
mv manifests /opt/bootstrap/assets/manifests
mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/etc/kubernetes/pki/admin.conf
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
mode: 0644
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/etcd/etcd.env
filesystem: root
mode: 0644
contents:
inline: |
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -10,7 +10,7 @@ resource "aws_route53_record" "etcds" {
ttl = 300 ttl = 300
# private IPv4 address for etcd # private IPv4 address for etcd
records = [element(aws_instance.controllers.*.private_ip, count.index)] records = [aws_instance.controllers.*.private_ip[count.index]]
} }
# Controller instances # Controller instances
@ -24,7 +24,7 @@ resource "aws_instance" "controllers" {
instance_type = var.controller_type instance_type = var.controller_type
ami = local.ami_id ami = local.ami_id
user_data = element(data.ct_config.controller-ignitions.*.rendered, count.index) user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
# storage # storage
root_block_device { root_block_device {
@ -49,20 +49,17 @@ resource "aws_instance" "controllers" {
# Controller Ignition configs # Controller Ignition configs
data "ct_config" "controller-ignitions" { data "ct_config" "controller-ignitions" {
count = var.controller_count count = var.controller_count
content = element( content = data.template_file.controller-configs.*.rendered[count.index]
data.template_file.controller-configs.*.rendered, strict = true
count.index, snippets = var.controller_snippets
)
pretty_print = false
snippets = var.controller_clc_snippets
} }
# Controller Container Linux configs # Controller Container Linux configs
data "template_file" "controller-configs" { data "template_file" "controller-configs" {
count = var.controller_count count = var.controller_count
template = file("${path.module}/cl/controller.yaml.tmpl") template = file("${path.module}/cl/controller.yaml")
vars = { vars = {
# Cannot use cyclic dependencies on controllers or their DNS records # Cannot use cyclic dependencies on controllers or their DNS records
@ -70,7 +67,7 @@ data "template_file" "controller-configs" {
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}" etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,... # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered) etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs" cgroup_driver = local.channel == "edge" ? "systemd" : "cgroupfs"
kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet) kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet)
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)

View File

@ -25,21 +25,23 @@ resource "aws_internet_gateway" "gateway" {
resource "aws_route_table" "default" { resource "aws_route_table" "default" {
vpc_id = aws_vpc.network.id vpc_id = aws_vpc.network.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
route {
ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
tags = { tags = {
"Name" = var.cluster_name "Name" = var.cluster_name
} }
} }
resource "aws_route" "egress-ipv4" {
route_table_id = aws_route_table.default.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
resource "aws_route" "egress-ipv6" {
route_table_id = aws_route_table.default.id
destination_ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
# Subnets (one per availability zone) # Subnets (one per availability zone)
resource "aws_subnet" "public" { resource "aws_subnet" "public" {
@ -62,6 +64,6 @@ resource "aws_route_table_association" "public" {
count = length(data.aws_availability_zones.all.names) count = length(data.aws_availability_zones.all.names)
route_table_id = aws_route_table.default.id route_table_id = aws_route_table.default.id
subnet_id = element(aws_subnet.public.*.id, count.index) subnet_id = aws_subnet.public.*.id[count.index]
} }

View File

@ -17,6 +17,7 @@ resource "aws_route53_record" "apiserver" {
resource "aws_lb" "nlb" { resource "aws_lb" "nlb" {
name = "${var.cluster_name}-nlb" name = "${var.cluster_name}-nlb"
load_balancer_type = "network" load_balancer_type = "network"
ip_address_type = "dualstack"
internal = false internal = false
subnets = aws_subnet.public.*.id subnets = aws_subnet.public.*.id
@ -88,7 +89,7 @@ resource "aws_lb_target_group_attachment" "controllers" {
count = var.controller_count count = var.controller_count
target_group_arn = aws_lb_target_group.controllers.arn target_group_arn = aws_lb_target_group.controllers.arn
target_id = element(aws_instance.controllers.*.id, count.index) target_id = aws_instance.controllers.*.id[count.index]
port = 6443 port = 6443
} }

View File

@ -1,5 +1,6 @@
output "kubeconfig-admin" { output "kubeconfig-admin" {
value = module.bootstrap.kubeconfig-admin value = module.bootstrap.kubeconfig-admin
sensitive = true
} }
# Outputs for Kubernetes Ingress # Outputs for Kubernetes Ingress
@ -32,7 +33,8 @@ output "worker_security_groups" {
} }
output "kubeconfig" { output "kubeconfig" {
value = module.bootstrap.kubeconfig-kubelet value = module.bootstrap.kubeconfig-kubelet
sensitive = true
} }
# Outputs for custom load balancing # Outputs for custom load balancing
@ -52,3 +54,10 @@ output "worker_target_group_https" {
value = module.workers.target_group_https value = module.workers.target_group_https
} }
# Outputs for debug
output "assets_dist" {
value = module.bootstrap.assets_dist
sensitive = true
}

View File

@ -13,6 +13,30 @@ resource "aws_security_group" "controller" {
} }
} }
resource "aws_security_group_rule" "controller-icmp" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-icmp-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
self = true
}
resource "aws_security_group_rule" "controller-ssh" { resource "aws_security_group_rule" "controller-ssh" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -33,28 +57,6 @@ resource "aws_security_group_rule" "controller-etcd" {
self = true self = true
} }
# Allow Prometheus to scrape kube-scheduler
resource "aws_security_group_rule" "controller-scheduler-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10251
to_port = 10251
source_security_group_id = aws_security_group.worker.id
}
# Allow Prometheus to scrape kube-controller-manager
resource "aws_security_group_rule" "controller-manager-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10252
to_port = 10252
source_security_group_id = aws_security_group.worker.id
}
# Allow Prometheus to scrape etcd metrics # Allow Prometheus to scrape etcd metrics
resource "aws_security_group_rule" "controller-etcd-metrics" { resource "aws_security_group_rule" "controller-etcd-metrics" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -66,6 +68,31 @@ resource "aws_security_group_rule" "controller-etcd-metrics" {
source_security_group_id = aws_security_group.worker.id source_security_group_id = aws_security_group.worker.id
} }
resource "aws_security_group_rule" "controller-cilium-health" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 4240
to_port = 4240
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-cilium-health-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 4240
to_port = 4240
self = true
}
# IANA VXLAN default
resource "aws_security_group_rule" "controller-vxlan" { resource "aws_security_group_rule" "controller-vxlan" {
count = var.networking == "flannel" ? 1 : 0 count = var.networking == "flannel" ? 1 : 0
@ -100,6 +127,31 @@ resource "aws_security_group_rule" "controller-apiserver" {
cidr_blocks = ["0.0.0.0/0"] cidr_blocks = ["0.0.0.0/0"]
} }
# Linux VXLAN default
resource "aws_security_group_rule" "controller-linux-vxlan" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-linux-vxlan-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
self = true
}
# Allow Prometheus to scrape node-exporter daemonset # Allow Prometheus to scrape node-exporter daemonset
resource "aws_security_group_rule" "controller-node-exporter" { resource "aws_security_group_rule" "controller-node-exporter" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -111,6 +163,17 @@ resource "aws_security_group_rule" "controller-node-exporter" {
source_security_group_id = aws_security_group.worker.id source_security_group_id = aws_security_group.worker.id
} }
# Allow Prometheus to scrape kube-proxy
resource "aws_security_group_rule" "kube-proxy-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10249
to_port = 10249
source_security_group_id = aws_security_group.worker.id
}
# Allow apiserver to access kubelets for exec, log, port-forward # Allow apiserver to access kubelets for exec, log, port-forward
resource "aws_security_group_rule" "controller-kubelet" { resource "aws_security_group_rule" "controller-kubelet" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -132,6 +195,28 @@ resource "aws_security_group_rule" "controller-kubelet-self" {
self = true self = true
} }
# Allow Prometheus to scrape kube-scheduler
resource "aws_security_group_rule" "controller-scheduler-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10251
to_port = 10251
source_security_group_id = aws_security_group.worker.id
}
# Allow Prometheus to scrape kube-controller-manager
resource "aws_security_group_rule" "controller-manager-metrics" {
security_group_id = aws_security_group.controller.id
type = "ingress"
protocol = "tcp"
from_port = 10252
to_port = 10252
source_security_group_id = aws_security_group.worker.id
}
resource "aws_security_group_rule" "controller-bgp" { resource "aws_security_group_rule" "controller-bgp" {
security_group_id = aws_security_group.controller.id security_group_id = aws_security_group.controller.id
@ -216,6 +301,30 @@ resource "aws_security_group" "worker" {
} }
} }
resource "aws_security_group_rule" "worker-icmp" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
source_security_group_id = aws_security_group.controller.id
}
resource "aws_security_group_rule" "worker-icmp-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "icmp"
from_port = 8
to_port = 0
self = true
}
resource "aws_security_group_rule" "worker-ssh" { resource "aws_security_group_rule" "worker-ssh" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id
@ -246,6 +355,31 @@ resource "aws_security_group_rule" "worker-https" {
cidr_blocks = ["0.0.0.0/0"] cidr_blocks = ["0.0.0.0/0"]
} }
resource "aws_security_group_rule" "worker-cilium-health" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "tcp"
from_port = 4240
to_port = 4240
source_security_group_id = aws_security_group.controller.id
}
resource "aws_security_group_rule" "worker-cilium-health-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "tcp"
from_port = 4240
to_port = 4240
self = true
}
# IANA VXLAN default
resource "aws_security_group_rule" "worker-vxlan" { resource "aws_security_group_rule" "worker-vxlan" {
count = var.networking == "flannel" ? 1 : 0 count = var.networking == "flannel" ? 1 : 0
@ -270,6 +404,31 @@ resource "aws_security_group_rule" "worker-vxlan-self" {
self = true self = true
} }
# Linux VXLAN default
resource "aws_security_group_rule" "worker-linux-vxlan" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
source_security_group_id = aws_security_group.controller.id
}
resource "aws_security_group_rule" "worker-linux-vxlan-self" {
count = var.networking == "cilium" ? 1 : 0
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
self = true
}
# Allow Prometheus to scrape node-exporter daemonset # Allow Prometheus to scrape node-exporter daemonset
resource "aws_security_group_rule" "worker-node-exporter" { resource "aws_security_group_rule" "worker-node-exporter" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id
@ -281,14 +440,15 @@ resource "aws_security_group_rule" "worker-node-exporter" {
self = true self = true
} }
resource "aws_security_group_rule" "ingress-health" { # Allow Prometheus to scrape kube-proxy
resource "aws_security_group_rule" "worker-kube-proxy" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id
type = "ingress" type = "ingress"
protocol = "tcp" protocol = "tcp"
from_port = 10254 from_port = 10249
to_port = 10254 to_port = 10249
cidr_blocks = ["0.0.0.0/0"] self = true
} }
# Allow apiserver to access kubelets for exec, log, port-forward # Allow apiserver to access kubelets for exec, log, port-forward
@ -313,6 +473,16 @@ resource "aws_security_group_rule" "worker-kubelet-self" {
self = true self = true
} }
resource "aws_security_group_rule" "ingress-health" {
security_group_id = aws_security_group.worker.id
type = "ingress"
protocol = "tcp"
from_port = 10254
to_port = 10254
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "worker-bgp" { resource "aws_security_group_rule" "worker-bgp" {
security_group_id = aws_security_group.worker.id security_group_id = aws_security_group.worker.id

View File

@ -0,0 +1,58 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist :
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
]
connection {
type = "ssh"
host = aws_instance.controllers.*.public_ip[count.index]
user = "core"
timeout = "15m"
}
provisioner "file" {
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo /opt/bootstrap/layout",
]
}
}
# Connect to a controller to perform one-time cluster bootstrap.
resource "null_resource" "bootstrap" {
depends_on = [
null_resource.copy-controller-secrets,
module.workers,
aws_route53_record.apiserver,
]
connection {
type = "ssh"
host = aws_instance.controllers[0].public_ip
user = "core"
timeout = "15m"
}
provisioner "remote-exec" {
inline = [
"sudo systemctl start bootstrap",
]
}
}

View File

@ -43,8 +43,13 @@ variable "worker_type" {
variable "os_image" { variable "os_image" {
type = string type = string
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)" description = "AMI channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "coreos-stable" default = "flatcar-stable"
validation {
condition = contains(["flatcar-stable", "flatcar-beta", "flatcar-alpha", "flatcar-edge"], var.os_image)
error_message = "The os_image must be flatcar-stable, flatcar-beta, flatcar-alpha, or flatcar-edge."
}
} }
variable "disk_size" { variable "disk_size" {
@ -77,13 +82,13 @@ variable "worker_target_groups" {
default = [] default = []
} }
variable "controller_clc_snippets" { variable "controller_snippets" {
type = list(string) type = list(string)
description = "Controller Container Linux Config snippets" description = "Controller Container Linux Config snippets"
default = [] default = []
} }
variable "worker_clc_snippets" { variable "worker_snippets" {
type = list(string) type = list(string)
description = "Worker Container Linux Config snippets" description = "Worker Container Linux Config snippets"
default = [] default = []
@ -96,11 +101,6 @@ variable "ssh_authorized_key" {
description = "SSH public key for user 'core'" description = "SSH public key for user 'core'"
} }
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
}
variable "networking" { variable "networking" {
type = string type = string
description = "Choice of networking provider (calico or flannel)" description = "Choice of networking provider (calico or flannel)"
@ -126,37 +126,37 @@ variable "pod_cidr" {
} }
variable "service_cidr" { variable "service_cidr" {
type = string type = string
description = <<EOD description = <<EOD
CIDR IPv4 range to assign Kubernetes services. CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns. The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD EOD
default = "10.3.0.0/16" default = "10.3.0.0/16"
} }
variable "enable_reporting" { variable "enable_reporting" {
type = bool type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)" description = "Enable usage or analytics reporting to upstreams (Calico)"
default = false default = false
} }
variable "enable_aggregation" { variable "enable_aggregation" {
type = bool type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)" description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
default = false default = false
} }
variable "worker_node_labels" { variable "worker_node_labels" {
type = list(string) type = list(string)
description = "List of initial worker node labels" description = "List of initial worker node labels"
default = [] default = []
} }
# unofficial, undocumented, unsupported # unofficial, undocumented, unsupported
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)" description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
default = "cluster.local" default = "cluster.local"
} }

View File

@ -0,0 +1,15 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.13.0"
required_providers {
aws = ">= 2.23, <= 4.0"
template = "~> 2.1"
null = "~> 2.1"
ct = {
source = "poseidon/ct"
version = "~> 0.6"
}
}
}

View File

@ -18,7 +18,7 @@ module "workers" {
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
service_cidr = var.service_cidr service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
clc_snippets = var.worker_clc_snippets snippets = var.worker_snippets
node_labels = var.worker_node_labels node_labels = var.worker_node_labels
} }

View File

@ -0,0 +1,27 @@
locals {
# Pick a Flatcar Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = data.aws_ami.flatcar.image_id
channel = split("-", var.os_image)[1]
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.channel}-*"]
}
}

View File

@ -2,11 +2,11 @@
systemd: systemd:
units: units:
- name: docker.service - name: docker.service
enable: true enabled: true
- name: locksmithd.service - name: locksmithd.service
mask: true mask: true
- name: wait-for-dns.service - name: wait-for-dns.service
enable: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Wait for DNS entries Description=Wait for DNS entries
@ -19,69 +19,81 @@ systemd:
[Install] [Install]
RequiredBy=kubelet.service RequiredBy=kubelet.service
- name: kubelet.service - name: kubelet.service
enable: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Kubelet via Hyperkube Description=Kubelet
Requires=docker.service
After=docker.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
EnvironmentFile=/etc/kubernetes/kubelet.env Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver} Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/calico ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt" ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid # Podman, rkt, or runc run container processes, whereas docker run
ExecStart=/usr/lib/coreos/kubelet-wrapper \ # is a client to a daemon and requires workarounds to use within a
# systemd unit. https://github.com/moby/moby/issues/6791
ExecStartPre=/usr/bin/docker run -d \
--name kubelet \
--privileged \
--pid host \
--network host \
-v /etc/kubernetes:/etc/kubernetes:ro \
-v /etc/machine-id:/etc/machine-id:ro \
-v /usr/lib/os-release:/etc/os-release:ro \
-v /lib/modules:/lib/modules:ro \
-v /run:/run \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
-v /var/lib/calico:/var/lib/calico:ro \
-v /var/lib/docker:/var/lib/docker \
-v /var/lib/kubelet:/var/lib/kubelet:rshared \
-v /var/log:/var/log \
-v /opt/cni/bin:/opt/cni/bin \
$${KUBELET_IMAGE} \
--anonymous-auth=false \ --anonymous-auth=false \
--authentication-token-webhook \ --authentication-token-webhook \
--authorization-mode=Webhook \ --authorization-mode=Webhook \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \ --cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \ --client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \ --cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \ --cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \ --cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \ --healthz-port=0 \
--kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/var/lib/kubelet/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \ --network-plugin=cni \
--node-labels=node.kubernetes.io/node \ --node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) } %{~ for label in split(",", node_labels) ~}
--node-labels=${label} \ --node-labels=${label} \
%{ endfor ~} %{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \ --pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \ --read-only-port=0 \
--rotate-certificates \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins --volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid ExecStart=docker logs -f kubelet
ExecStop=docker stop kubelet
ExecStopPost=docker rm kubelet
Restart=always Restart=always
RestartSec=5 RestartSec=5
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
- name: delete-node.service - name: delete-node.service
enable: true enabled: true
contents: | contents: |
[Unit] [Unit]
Description=Waiting to delete Kubernetes node on shutdown Description=Delete Kubernetes node on shutdown
[Service] [Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.20.0
Type=oneshot Type=oneshot
RemainAfterExit=true RemainAfterExit=true
ExecStart=/bin/true ExecStart=/bin/true
ExecStop=/etc/kubernetes/delete-node ExecStop=/bin/bash -c '/usr/bin/docker run -v /var/lib/kubelet:/var/lib/kubelet:ro --entrypoint /usr/local/bin/kubectl $${KUBELET_IMAGE} --kubeconfig=/var/lib/kubelet/kubeconfig delete node $HOSTNAME'
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
storage: storage:
@ -92,34 +104,12 @@ storage:
contents: contents:
inline: | inline: |
${kubeconfig} ${kubeconfig}
- path: /etc/kubernetes/kubelet.env - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
mode: 0644 mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.3
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents: contents:
inline: | inline: |
fs.inotify.max_user_watches=16184 fs.inotify.max_user_watches=16184
- path: /etc/kubernetes/delete-node
filesystem: root
mode: 0744
contents:
inline: |
#!/bin/bash
set -e
exec /usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.3 \
--net=host \
--dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
passwd: passwd:
users: users:
- name: core - name: core

View File

@ -36,8 +36,13 @@ variable "instance_type" {
variable "os_image" { variable "os_image" {
type = string type = string
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)" description = "AMI channel for a Container Linux derivative (flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "coreos-stable" default = "flatcar-stable"
validation {
condition = contains(["flatcar-stable", "flatcar-beta", "flatcar-alpha", "flatcar-edge"], var.os_image)
error_message = "The os_image must be flatcar-stable, flatcar-beta, flatcar-alpha, or flatcar-edge."
}
} }
variable "disk_size" { variable "disk_size" {
@ -70,7 +75,7 @@ variable "target_groups" {
default = [] default = []
} }
variable "clc_snippets" { variable "snippets" {
type = list(string) type = list(string)
description = "Container Linux Config snippets" description = "Container Linux Config snippets"
default = [] default = []
@ -89,22 +94,22 @@ variable "ssh_authorized_key" {
} }
variable "service_cidr" { variable "service_cidr" {
type = string type = string
description = <<EOD description = <<EOD
CIDR IPv4 range to assign Kubernetes services. CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns. The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD EOD
default = "10.3.0.0/16" default = "10.3.0.0/16"
} }
variable "cluster_domain_suffix" { variable "cluster_domain_suffix" {
type = string type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) " description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local" default = "cluster.local"
} }
variable "node_labels" { variable "node_labels" {
type = list(string) type = list(string)
description = "List of initial node labels" description = "List of initial node labels"
default = [] default = []
} }

View File

@ -0,0 +1,14 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.13.0"
required_providers {
aws = ">= 2.23, <= 4.0"
template = "~> 2.1"
ct = {
source = "poseidon/ct"
version = "~> 0.6"
}
}
}

View File

@ -71,21 +71,21 @@ resource "aws_launch_configuration" "worker" {
# Worker Ignition config # Worker Ignition config
data "ct_config" "worker-ignition" { data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered content = data.template_file.worker-config.rendered
pretty_print = false strict = true
snippets = var.clc_snippets snippets = var.snippets
} }
# Worker Container Linux config # Worker Container Linux config
data "template_file" "worker-config" { data "template_file" "worker-config" {
template = file("${path.module}/cl/worker.yaml.tmpl") template = file("${path.module}/cl/worker.yaml")
vars = { vars = {
kubeconfig = indent(10, var.kubeconfig) kubeconfig = indent(10, var.kubeconfig)
ssh_authorized_key = var.ssh_authorized_key ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10) cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix cluster_domain_suffix = var.cluster_domain_suffix
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs" cgroup_driver = local.channel == "edge" ? "systemd" : "cgroupfs"
node_labels = join(",", var.node_labels) node_labels = join(",", var.node_labels)
} }
} }

View File

View File

@ -1,160 +0,0 @@
---
systemd:
units:
- name: etcd-member.service
enable: true
dropins:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381"
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt"
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt"
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key"
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enable: true
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/master \
--node-labels=node.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootstrap.service
contents: |
[Unit]
Description=Kubernetes control plane
ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume assets,kind=host,source=/opt/bootstrap/assets \
--mount volume=assets,target=/assets \
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.3 \
--net=host \
--dns=host \
--exec=/apply
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.3
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

View File

@ -1,100 +0,0 @@
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
azurerm_virtual_machine.controllers
]
connection {
type = "ssh"
host = azurerm_public_ip.controllers.*.ip_address[count.index]
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests",
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/",
]
}
}
# Connect to a controller to perform one-time cluster bootstrap.
resource "null_resource" "bootstrap" {
depends_on = [
null_resource.copy-controller-secrets,
module.workers,
azurerm_dns_a_record.apiserver,
]
connection {
type = "ssh"
host = element(azurerm_public_ip.controllers.*.ip_address, 0)
user = "core"
timeout = "15m"
}
provisioner "remote-exec" {
inline = [
"sudo systemctl start bootstrap",
]
}
}

View File

@ -1,4 +0,0 @@
terraform {
required_version = ">= 0.12"
}

View File

@ -1,117 +0,0 @@
locals {
# Channel for a Container Linux derivative
# coreos-stable -> Container Linux Stable
channel = element(split("-", var.os_image), 1)
}
# Workers scale set
resource "azurerm_virtual_machine_scale_set" "workers" {
resource_group_name = var.resource_group_name
name = "${var.name}-workers"
location = var.region
single_placement_group = false
sku {
name = var.vm_type
tier = "standard"
capacity = var.worker_count
}
# boot
storage_profile_image_reference {
publisher = "CoreOS"
offer = "CoreOS"
sku = local.channel
version = "latest"
}
# storage
storage_profile_os_disk {
create_option = "FromImage"
caching = "ReadWrite"
os_type = "linux"
managed_disk_type = "Standard_LRS"
}
os_profile {
computer_name_prefix = "${var.name}-worker-"
admin_username = "core"
custom_data = data.ct_config.worker-ignition.rendered
}
# Azure mandates setting an ssh_key, even though Ignition custom_data handles it too
os_profile_linux_config {
disable_password_authentication = true
ssh_keys {
path = "/home/core/.ssh/authorized_keys"
key_data = var.ssh_authorized_key
}
}
# network
network_profile {
name = "nic0"
primary = true
network_security_group_id = var.security_group_id
ip_configuration {
name = "ip0"
primary = true
subnet_id = var.subnet_id
# backend address pool to which the NIC should be added
load_balancer_backend_address_pool_ids = [var.backend_address_pool_id]
}
}
# lifecycle
upgrade_policy_mode = "Manual"
# eviction policy may only be set when priority is Low
priority = var.priority
eviction_policy = var.priority == "Low" ? "Delete" : null
}
# Scale up or down to maintain desired number, tolerating deallocations.
resource "azurerm_monitor_autoscale_setting" "workers" {
resource_group_name = var.resource_group_name
name = "${var.name}-maintain-desired"
location = var.region
# autoscale
enabled = true
target_resource_id = azurerm_virtual_machine_scale_set.workers.id
profile {
name = "default"
capacity {
minimum = var.worker_count
default = var.worker_count
maximum = var.worker_count
}
}
}
# Worker Ignition configs
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
pretty_print = false
snippets = var.clc_snippets
}
# Worker Container Linux configs
data "template_file" "worker-config" {
template = file("${path.module}/cl/worker.yaml.tmpl")
vars = {
kubeconfig = indent(10, var.kubeconfig)
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
node_labels = join(",", var.node_labels)
}
}

View File

@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2020 Typhoon Authors
Copyright (c) 2020 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -11,13 +11,13 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.3 (upstream) * Kubernetes v1.20.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot priority](https://typhoon.psdn.io/fedora-coreos/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs ## Docs
Please see the [official docs](https://typhoon.psdn.io) and the AWS [tutorial](https://typhoon.psdn.io/cl/aws/). Please see the [official docs](https://typhoon.psdn.io) and the Azure [tutorial](https://typhoon.psdn.io/fedora-coreos/azure/).

View File

@ -0,0 +1,25 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=4edd79dd0295e6ffa4c8ed04fd5914d5cb3f1b7c"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = formatlist("%s.%s", azurerm_dns_a_record.etcds.*.name, var.dns_zone)
networking = var.networking
# only effective with Calico networking
# we should be able to use 1450 MTU, but in practice, 1410 was needed
network_encapsulation = "vxlan"
network_mtu = "1410"
pod_cidr = var.pod_cidr
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
enable_reporting = var.enable_reporting
enable_aggregation = var.enable_aggregation
# Fedora CoreOS
trusted_certs_dir = "/etc/pki/tls/certs"
}

View File

@ -0,0 +1,151 @@
# Discrete DNS records for each controller's private IPv4 for etcd usage
resource "azurerm_dns_a_record" "etcds" {
count = var.controller_count
resource_group_name = var.dns_zone_group
# DNS Zone name where record should be created
zone_name = var.dns_zone
# DNS record
name = format("%s-etcd%d", var.cluster_name, count.index)
ttl = 300
# private IPv4 address for etcd
records = [azurerm_network_interface.controllers.*.private_ip_address[count.index]]
}
# Controller availability set to spread controllers
resource "azurerm_availability_set" "controllers" {
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controllers"
location = var.region
platform_fault_domain_count = 2
platform_update_domain_count = 4
managed = true
}
# Controller instances
resource "azurerm_linux_virtual_machine" "controllers" {
count = var.controller_count
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controller-${count.index}"
location = var.region
availability_set_id = azurerm_availability_set.controllers.id
size = var.controller_type
custom_data = base64encode(data.ct_config.controller-ignitions.*.rendered[count.index])
# storage
source_image_id = var.os_image
os_disk {
name = "${var.cluster_name}-controller-${count.index}"
caching = "None"
disk_size_gb = var.disk_size
storage_account_type = "Premium_LRS"
}
# network
network_interface_ids = [
azurerm_network_interface.controllers.*.id[count.index]
]
# Azure requires setting admin_ssh_key, though Ignition custom_data handles it too
admin_username = "core"
admin_ssh_key {
username = "core"
public_key = var.ssh_authorized_key
}
lifecycle {
ignore_changes = [
os_disk,
custom_data,
]
}
}
# Controller public IPv4 addresses
resource "azurerm_public_ip" "controllers" {
count = var.controller_count
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controller-${count.index}"
location = azurerm_resource_group.cluster.location
sku = "Standard"
allocation_method = "Static"
}
# Controller NICs with public and private IPv4
resource "azurerm_network_interface" "controllers" {
count = var.controller_count
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controller-${count.index}"
location = azurerm_resource_group.cluster.location
ip_configuration {
name = "ip0"
subnet_id = azurerm_subnet.controller.id
private_ip_address_allocation = "Dynamic"
# instance public IPv4
public_ip_address_id = azurerm_public_ip.controllers.*.id[count.index]
}
}
# Associate controller network interface with controller security group
resource "azurerm_network_interface_security_group_association" "controllers" {
count = var.controller_count
network_interface_id = azurerm_network_interface.controllers[count.index].id
network_security_group_id = azurerm_network_security_group.controller.id
}
# Associate controller network interface with controller backend address pool
resource "azurerm_network_interface_backend_address_pool_association" "controllers" {
count = var.controller_count
network_interface_id = azurerm_network_interface.controllers[count.index].id
ip_configuration_name = "ip0"
backend_address_pool_id = azurerm_lb_backend_address_pool.controller.id
}
# Controller Ignition configs
data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
snippets = var.controller_snippets
}
# Controller Fedora CoreOS configs
data "template_file" "controller-configs" {
count = var.controller_count
template = file("${path.module}/fcc/controller.yaml")
vars = {
# Cannot use cyclic dependencies on controllers or their DNS records
etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet)
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
}
}
data "template_file" "etcds" {
count = var.controller_count
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars = {
index = count.index
cluster_name = var.cluster_name
dns_zone = var.dns_zone
}
}

Some files were not shown because too many files have changed in this diff Show More