Compare commits

..

44 Commits

Author SHA1 Message Date
5a1bc423a1 Announce Fedora Atomic modules won't be updated beyond v1.13.x
* Thank you Project Atomic team and users
* See the deprecation announcement https://typhoon.psdn.io/announce/#march-27-2019
2019-03-26 23:56:33 -07:00
32fe72fb2d Update mkdocs and plugin versions used in tutorials
* Recommend provider plugin versions that are currently used
by the author
* Recommend updating terraform-provider-ct plugin from v0.3.0
to v0.3.1
* https://github.com/coreos/terraform-provider-ct/releases
2019-03-26 01:00:44 -07:00
4fea526ebf Update Kubernetes from v1.13.4 to v1.13.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1135
2019-03-25 21:43:47 -07:00
41a9d86bc3 Add NetworkPolicy to limit traffic into Prometheus
* Allow traffic from Grafana to Prometheus in monitoring
* Allow traffic from Prometheus to Prometheus in monitoring
* NetworkPolicy denies non-whitelisted traffic. Define policy
to allow other access
2019-03-23 21:38:34 -07:00
36e31fc9fa Add liveness and readiness probes to Grafana
* https://github.com/grafana/grafana/issues/3302
2019-03-23 17:55:37 -07:00
619a0370dc Update Grafana from v6.0.1 to v6.0.2
* https://github.com/grafana/grafana/releases/tag/v6.0.2
2019-03-21 23:41:25 -07:00
6dd2731046 Set cpu/memory resources requests/limits for some addons
* Set resource requests and limits for Grafana and CLUO
* Set resource requests for Prometheus, but allow usage
to grow since needs vary widely
* Leave nginx without resource requests/limits for now,
its typically well behaved
2019-03-20 00:15:08 -07:00
1feefbe9c6 Update Calico from v3.5.2 to v3.6.0
* Add calico-ipam CRDs and RBAC permissions
* Switch IPAM from host-local to calico-ipam
  * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into
`ipamblocks` (defaults to /26, but set to /24 in Typhoon)
  * `host-local` subnets the pod CIDR based on the node PodCIDR
field (set via kube-controller-manager as /24's)
* Create a custom default IPv4 IPPool to ensure the block size
is kept at /24 to allow 110 pods per node (Kubernetes default)
* Retaining host-local was slightly preferred, but Calico v3.6
is migrating all usage to calico-ipam. The codepath that skipped
calico-ipam for KDD was removed
*  https://docs.projectcalico.org/v3.6/release-notes/
2019-03-19 22:49:56 -07:00
aa630003a4 Refresh Prometheus rules and Grafana dashboards
* Refresh rules and dashboards from upstreams
* Organize dashboards and stay below the ConfigMap size
limit
2019-03-17 13:23:04 -07:00
bf97a45b9d Remove heapster manifests from addons
* Heapster addon powers `kubectl top`
* In early Kubernetes, people legitimately used and expected
`kubectl top` to work, so the optional addon was provided
* Today the standards are different. Many better monitoring
tools exist, that are also less coupled to Kubernetes "kubectl
top" reliance on a non-core extensions means its not in-scope
for minimal Kubernetes clusters. No more exceptionalism
* Finally, Heapster isn't that useful anymore. Its manifests
have no need for Typhoon-specific modification
* Look to prior releases if you still wish to apply heapster
2019-03-17 12:41:59 -07:00
3d6a6d4adb Re-add Kubelet metadata service dependency on DigitalOcean
* Restore the original special-casing of DigitalOcean Kubelets
* Fix node metadata InternalIP being set to the IP of the default
gateway on DigitalOcean nodes (regressed in v1.12.3)
* Reverts the "pretty" node names on DigitalOcean (worker-2 vs IP)
* Closes #424 (full details)
2019-03-17 12:39:25 -07:00
e0bee2e417 Update Prometheus from v2.7.2 to v2.8.0
* https://github.com/prometheus/prometheus/releases/tag/v2.8.0
2019-03-13 22:11:38 -07:00
2019177b6b Fix implicit map assignments to be explicit
* Terraform v0.12 will require map assignments be explicit,
part of v0.12 readiness
2019-03-12 01:19:54 -07:00
9493ed3b1d Change default iPXE kernel/initrd download from HTTP to HTTPS
* Require an iPXE-enabled network boot environment with support for
TLS downloads. PXE clients must chainload to iPXE firmware compiled
with `DOWNLOAD_PROTO_HTTPS` enabled ([crypto](https://ipxe.org/crypto))
* iPXE's pre-compiled firmware binaries do _not_ enable HTTPS. Admins
should build iPXE from source with support enabled
* Affects the Container Linux and Flatcar Linux install profiles that
pull from public downloads. No effect when cached_install=true
or using Fedora Atomic, as those download from Matchbox
* Add `download_protocol` variable. Recognizing boot firmware TLS
support is difficult in some environments, set the protocol to "http"
for the old behavior (discouraged)
2019-03-09 23:23:40 -08:00
4201eb1efa Update Grafana from v6.0.0 to v6.0.1
* https://github.com/grafana/grafana/releases/tag/v6.0.1
2019-03-09 12:44:18 -08:00
fe96da27d7 Add support for terraform-provider-aws v2.0+
* Allow terraform-provider-aws >= v1.13, but < 3.0. No change
to the minimum version, but allow using v2.x.y releases
* Verify compatability with terraform-provider-aws v2.1.0
2019-03-09 12:06:44 -08:00
3afd114f8c Update mkdocs-material from v4.0.1 to v4.0.2 2019-03-04 23:11:02 -08:00
4d9a692424 Update Prometheus from v2.7.1 to v2.7.2
* https://github.com/prometheus/prometheus/releases/tag/v2.7.2
2019-03-04 23:08:12 -08:00
deec512c14 Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin
* Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes
service IPs and pod IPs
* Previously, CoreDNS was configured to resolve in-addr.arpa PTR
records for service IPs (but not pod IPs)
2019-03-04 23:03:00 -08:00
5066a25d89 Add links and clarifications in CHANGES for release 2019-03-02 11:26:12 -08:00
de251bd94f Update tutorials to prefer newer provider plugins over min version
* Minimum versions of Terraform provider plugins are enforced in
each module already. Its better to provide examples with newer
versions. Some folks don't update them
* Previously, tutorials showed the minimum viable version of each
terraform provider that might be used
2019-03-02 11:07:40 -08:00
fc277eaab6 Document the GCP DNS admin requirement for cluster provisioning
* Configure the google terraform provider to use GCP service
account credentials with compute and dns admin privileges
2019-03-02 10:54:35 -08:00
a08adc92b5 Update nginx-ingress from v0.22.0 to v0.23.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.23.0
2019-03-01 01:18:54 -08:00
d42f42df4e Re-measure cluster provision times and document 2019-03-01 01:15:08 -08:00
4ff7fe2c29 Update Grafana dashboards from upstreams 2019-02-28 23:22:07 -08:00
f598307998 Update Kubernetes from v1.13.3 to v1.13.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134
2019-02-28 22:47:43 -08:00
8ae552ebda Update documentation for use with Ubiquiti EdgeOS
* Show creation of a PXE-enabled network boot environment when
using dnsmasq as the DHCP server
* Recommend TFTP be served from /config/tftpboot since /config
is preserved between firmware upgrades
* Recommend compiling undionly.kpxe from source to enable
TLS features
* Add a note that equal-cost multi-path service IP routing
(e.g. for ingress) requires EdgeOS v2.0. Previously, it was known
that TLS handshakes couldn't be completed with packet balacing.
I've verified this is no longer the case when using the v2.0
EdgeOS firmware, ECMP works as expected.
2019-02-27 23:36:27 -08:00
daee5a9d60 Update Grafana from v6.0.0-beta3 to v6.0.0
* https://github.com/grafana/grafana/releases/tag/v6.0.0
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-25 21:43:43 -08:00
73ae5d5649 Update Calico from v3.5.1 to v3.5.2
* https://docs.projectcalico.org/v3.5/releases/
2019-02-25 21:23:13 -08:00
42d7222f3d Add a readiness probe to CoreDNS
* https://github.com/poseidon/terraform-render-bootkube/pull/115
2019-02-23 13:25:23 -08:00
d10c2b4cb9 Update Grafana from v6.0.0-beta2 to v6.0.0-beta3
* Update Grafana dashboards
2019-02-23 13:03:25 -08:00
7f8572030d Upgrade to support terraform-provider-google v2.0+
* Support terraform-provider-google v1.19.0, v1.19.1, v1.20.0
and v2.0+ (and allow for future 2.x.y releases)
* Require terraform-provider-google v1.19.0 or newer. v1.19.0
introduced `network_interface` fields `network_ip` and `nat_ip`
to deprecate `address` and `assigned_nat_ip`. Those deprecated
fields are removed in terraform-provider-google v2.0
* https://github.com/terraform-providers/terraform-provider-google/releases/tag/v2.0.0
2019-02-20 02:33:32 -08:00
4294bd0292 Assign Pod Priority classes to critical cluster and node components
* Assign pod priorityClassNames to critical cluster and node
components (higher is higher priority) to inform node out-of-resource
eviction order and scheduler preemption and scheduling order
* Priority Admission Controller has been enabled since Typhoon
v1.11.1
2019-02-19 22:21:39 -08:00
ba4c5de052 Set the Google Cloud minimum CPU platform to Intel Haswell
* Intel Haswell or better is available in every zone around the world
* Neither Kubernetes nor Typhoon have a particular minimum processor
family. However, a few Google Cloud zones still default to Sandy/Ivy
bridge (scheduled to shift April 2019). Price is only based on machine
type so it is beneficial to opt for the next processor family
* Intel Haswell is a suitable minimum since it still allows plenty of
liberty in choosing any region or machine type
* Likely a slight increase to preemption probability in a few zones,
but any lower probability on Sandy/Ivy bridge is due to lower
desirability as they're phased out
* https://cloud.google.com/compute/docs/regions-zones/
2019-02-18 12:55:04 -08:00
e483c81ce9 Improve Prometheus rules and alerts and Grafana dashboards
* Collate upstream rules, alerts, and dashboards and tune for use
in Typhoon
* Previously, a well-chosen (but older) set of rules, alerts, and
dashboards were maintained to reflect metric name changes
2019-02-18 12:19:23 -08:00
6fa3b8a13f Upgrade Grafana to v6.0.0-beta2 and enable Explore UI
* Upgrade Grafana from v5.4.3 to v6.0.0-beta2
* Enable Grafana Explore UI while still using only the Viewer
role (inspect/edit without saving)
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-17 13:26:42 -08:00
ac95e83249 Update mkdocs-material from v3.3.0 to v4.0.1 2019-02-16 15:55:38 -08:00
d988822741 Document and recommend terraform-provider-matchbox v0.2.3
* https://github.com/coreos/terraform-provider-matchbox/releases/tag/v0.2.3
2019-02-16 15:07:49 -08:00
170ef74eea Remove Nginx Ingress default backend
* nginx-ingress no longer requires a configured default-backend,
it will respond with its own 404 page starting in v0.21.0
* https://github.com/kubernetes/ingress-nginx/pull/3196
2019-02-16 14:18:15 -08:00
b13a651cfe Drop metrics that are unset, high cardinality, or extraneous
* https://github.com/coreos/prometheus-operator/pull/2387
* https://github.com/coreos/prometheus-operator/pull/1959
2019-02-10 23:56:11 -08:00
9c59f393a5 Add Kubernetes pod name to metrics discovered from service endpoints
* Prometheus queries from some upstreams use joins of node-exporter
and kube-state-metrics metrics by (namespace,pod). Add the Kubernetes
pod name to service endpoint metrics
* Rename the kubernetes_namespace field to namespace
* Honor labels since kube-state-metrics already include a `pod` field
that should not be overridden
2019-02-10 23:54:30 -08:00
3e4b3bfb04 Raise nginx-ingress liveness/readiness timeout
* Under heavy load, avoid timeouts causing nginx-ingress
restarts https://github.com/kubernetes/ingress-nginx/pull/3737
2019-02-09 12:53:09 -08:00
584088397c Update etcd from v3.3.11 to v3.3.12
* https://github.com/etcd-io/etcd/releases/tag/v3.3.12
2019-02-09 11:54:54 -08:00
0200058e0e Update Calico from v3.5.0 to v3.5.1
* Fix in confd https://github.com/projectcalico/confd/pull/205
2019-02-09 11:49:31 -08:00
118 changed files with 11285 additions and 8754 deletions

View File

@ -4,6 +4,86 @@ Notable changes between versions.
## Latest ## Latest
## v1.13.5
* Kubernetes [v1.13.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1135)
* Resolve in-addr.arpa reverse DNS lookups (PTR) for pod IPv4 addresses ([#415](https://github.com/poseidon/typhoon/pull/415))
* Reverse DNS lookups for service IPv4 addresses unchanged
* Upgrade Calico from v3.5.2 to [v3.6.0](https://docs.projectcalico.org/v3.6/release-notes/) ([#430](https://github.com/poseidon/typhoon/pull/430))
* Change pod IPAM from `host-local` to `calico-ipam`. `pod_cidr` is still divided into `/24` subnets per node, but managed as `ippools` and `ipamblocks`
* Suggest updating [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) from v0.3.0 to [v0.3.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.3.1) ([#434](https://github.com/poseidon/typhoon/pull/434))
* Announce: Fedora Atomic modules will be not be updated beyond Kubernetes v1.13.x ([#437](https://github.com/poseidon/typhoon/pull/437))
* Thank you Project Atomic team and users, please see the deprecation [notice](https://typhoon.psdn.io/announce/#march-27-2019)
#### AWS
* Support `terraform-provider-aws` v2.0+ ([#419](https://github.com/poseidon/typhoon/pull/419))
#### Bare-Metal
* Change the default iPXE kernel and initrd download protocol from HTTP to HTTPS ([#420](https://github.com/poseidon/typhoon/pull/420))
* Require an iPXE-enabled network boot environment with support for TLS downloads. PXE clients must chainload to iPXE firmware compiled with `DOWNLOAD_PROTO_HTTPS` [enabled](https://ipxe.org/crypto). (**action required**)
* Only affects Container Linux and Flatcar Linux install profiles that pull public images (default)
* Add `download_protocol` variable. Recognizing boot firmware TLS support is difficult in some environments, set the protocol to "http" for the old behavior (discouraged)
#### DigitalOcean
* Fix kubelet hostname-override to set node metadata InternalIP correctly ([#424](https://github.com/poseidon/typhoon/issues/424))
* Uniquely, DigitalOcean does not resolve hostnames to instance private IPs. Kubelet auto-detect mechanisms require the internal IP be set directly.
* Regressed in v1.12.3 ([#337](https://github.com/poseidon/typhoon/pull/337)) which aimed to provide friendly hostname-based node names on DigitalOcean
#### Addons
* Update Prometheus from v2.7.1 to [v2.8.0](https://github.com/prometheus/prometheus/releases/tag/v2.8.0)
* Refresh rules based on upstreams ([#426](https://github.com/poseidon/typhoon/pull/426))
* Define NetworkPolicy to allow only traffic from the Grafana addon
* Update Grafana from v6.0.0 to v6.0.2
* Add liveness and readiness probes
* Refresh dashboards and organize to stay below ConfigMap size limit ([#426](https://github.com/poseidon/typhoon/pull/426))
* Remove heapster manifests from addons ([#427](https://github.com/poseidon/typhoon/pull/427))
* Heapster addon powers `kubectl top` (in early Kubernetes, running the addon was expected). Today, there are better monitoring options.
* `kubectl top` reliance on a non-core extension means its not in-scope for minimal Kubernetes
* Look to prior releases if you still wish to apply heapster
## v1.13.4
* Kubernetes [v1.13.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134)
* Update etcd from v3.3.11 to [v3.3.12](https://github.com/etcd-io/etcd/releases/tag/v3.3.12)
* Update Calico from v3.5.0 to [v3.5.2](https://docs.projectcalico.org/v3.5/releases/)
* Assign priorityClassNames to critical cluster and node components ([#406](https://github.com/poseidon/typhoon/pull/406))
* Inform node out-of-resource eviction and scheduler preemption and ordering
* Add CoreDNS readiness probe ([#410](https://github.com/poseidon/typhoon/pull/410))
#### Bare-Metal
* Recommend updating [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin from v0.2.2 to [v0.2.3](https://github.com/coreos/terraform-provider-matchbox/releases/tag/v0.2.3) ([#402](https://github.com/poseidon/typhoon/pull/402))
* Improve docs on using Ubiquiti EdgeOS with bare-metal clusters ([#413](https://github.com/poseidon/typhoon/pull/413))
#### Google Cloud
* Support `terraform-provider-google` v2.0+ ([#407](https://github.com/poseidon/typhoon/pull/407))
* Require `terraform-provider-google` v1.19+ (**action required**)
* Set the minimum CPU platform to Intel Haswell ([#405](https://github.com/poseidon/typhoon/pull/405))
* Haswell or better is available in every zone (no price change)
* A few zones still default to Sandy/Ivy Bridge (shifts in April 2019)
#### Addons
* Modernize Prometheus rules and alerts ([#404](https://github.com/poseidon/typhoon/pull/404))
* Drop extraneous metrics ([#397](https://github.com/poseidon/typhoon/pull/397))
* Add `pod` name label to metrics discovered via service endpoints
* Rename `kubernetes_namespace` label to `namespace`
* Modernize Grafana and dashboards, see [docs](https://typhoon.psdn.io/addons/grafana/) ([#403](https://github.com/poseidon/typhoon/pull/403), [#404](https://github.com/poseidon/typhoon/pull/404))
* Upgrade Grafana from v5.4.3 to [v6.0.0](https://github.com/grafana/grafana/releases/tag/v6.0.0)!
* Enable Grafana [Explore](http://docs.grafana.org/guides/whats-new-in-v6-0/#explore) UI as a Viewer (inspect/edit without saving)
* Update nginx-ingress from v0.22.0 to v0.23.0
* Raise nginx-ingress liveness/readiness timeout to 5 seconds
* Remove nginx-ingess default-backend ([#401](https://github.com/poseidon/typhoon/pull/401))
#### Fedora Atomic
* Build Kubelet [system container](https://github.com/poseidon/system-containers) with buildah. The image is an OCI format and slightly larger.
## v1.13.3 ## v1.13.3
* Kubernetes [v1.13.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133) * Kubernetes [v1.13.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133)

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
@ -33,10 +33,10 @@ Fedora Atomic support is alpha and will evolve as Fedora Atomic is replaced by F
| Platform | Operating System | Terraform Module | Status | | Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------| |---------------|------------------|------------------|--------|
| AWS | Fedora Atomic | [aws/fedora-atomic/kubernetes](aws/fedora-atomic/kubernetes) | alpha | | AWS | Fedora Atomic | [aws/fedora-atomic/kubernetes](aws/fedora-atomic/kubernetes) | deprecated |
| Bare-Metal | Fedora Atomic | [bare-metal/fedora-atomic/kubernetes](bare-metal/fedora-atomic/kubernetes) | alpha | | Bare-Metal | Fedora Atomic | [bare-metal/fedora-atomic/kubernetes](bare-metal/fedora-atomic/kubernetes) | deprecated |
| Digital Ocean | Fedora Atomic | [digital-ocean/fedora-atomic/kubernetes](digital-ocean/fedora-atomic/kubernetes) | alpha | | Digital Ocean | Fedora Atomic | [digital-ocean/fedora-atomic/kubernetes](digital-ocean/fedora-atomic/kubernetes) | deprecated |
| Google Cloud | Fedora Atomic | [google-cloud/fedora-atomic/kubernetes](google-cloud/fedora-atomic/kubernetes) | alpha | | Google Cloud | Fedora Atomic | [google-cloud/fedora-atomic/kubernetes](google-cloud/fedora-atomic/kubernetes) | deprecated |
## Documentation ## Documentation
@ -50,7 +50,7 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
```tf ```tf
module "google-cloud-yavin" { module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
google = "google.default" google = "google.default"
@ -91,9 +91,9 @@ In 4-8 minutes (varies by platform), the cluster will be ready. This Google Clou
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME ROLES STATUS AGE VERSION NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.3 yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.5
yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.5
yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -18,20 +18,15 @@ spec:
annotations: annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default' seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers: containers:
- name: update-agent - name: update-agent
image: quay.io/coreos/container-linux-update-operator:v0.7.0 image: quay.io/coreos/container-linux-update-operator:v0.7.0
command: command:
- "/bin/update-agent" - "/bin/update-agent"
volumeMounts:
- mountPath: /var/run/dbus
name: var-run-dbus
- mountPath: /etc/coreos
name: etc-coreos
- mountPath: /usr/share/coreos
name: usr-share-coreos
- mountPath: /etc/os-release
name: etc-os-release
env: env:
# read by update-agent as the node name to manage reboots for # read by update-agent as the node name to manage reboots for
- name: UPDATE_AGENT_NODE - name: UPDATE_AGENT_NODE
@ -42,10 +37,22 @@ spec:
valueFrom: valueFrom:
fieldRef: fieldRef:
fieldPath: metadata.namespace fieldPath: metadata.namespace
tolerations: resources:
- key: node-role.kubernetes.io/master requests:
operator: Exists cpu: 10m
effect: NoSchedule memory: 20Mi
limits:
cpu: 20m
memory: 40Mi
volumeMounts:
- mountPath: /var/run/dbus
name: var-run-dbus
- mountPath: /etc/coreos
name: etc-coreos
- mountPath: /usr/share/coreos
name: usr-share-coreos
- mountPath: /etc/os-release
name: etc-os-release
volumes: volumes:
- name: var-run-dbus - name: var-run-dbus
hostPath: hostPath:

View File

@ -15,6 +15,10 @@ spec:
annotations: annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default' seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec: spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers: containers:
- name: update-operator - name: update-operator
image: quay.io/coreos/container-linux-update-operator:v0.7.0 image: quay.io/coreos/container-linux-update-operator:v0.7.0
@ -25,7 +29,11 @@ spec:
valueFrom: valueFrom:
fieldRef: fieldRef:
fieldPath: metadata.namespace fieldPath: metadata.namespace
tolerations: resources:
- key: node-role.kubernetes.io/master requests:
operator: Exists cpu: 10m
effect: NoSchedule memory: 20Mi
limits:
cpu: 20m
memory: 40Mi

View File

@ -0,0 +1,36 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-config
namespace: monitoring
data:
custom.ini: |+
[server]
http_port = 8080
[paths]
data = /var/lib/grafana
plugins = /var/lib/grafana/plugins
provisioning = /etc/grafana/provisioning
[users]
allow_sign_up = false
allow_org_create = false
# viewers can edit/inspect, but not save
viewers_can_edit = true
# Disable login form, since Grafana always creates an admin user
[auth]
disable_login_form = true
# Disable the user/pass login system
[auth.basic]
enabled = false
# Allow anonymous authentication with view-only authorization
[auth.anonymous]
enabled = true
org_role = Viewer
[analytics]
reporting_enabled = false

View File

@ -0,0 +1,1230 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-etcd
namespace: monitoring
data:
etcd.json: |-
{
"annotations": {
"list": [
]
},
"description": "etcd sample Grafana dashboard with Prometheus",
"editable": true,
"gnetId": null,
"hideControls": false,
"id": 6,
"links": [
],
"refresh": false,
"rows": [
{
"collapse": false,
"editable": true,
"height": "250px",
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "$datasource",
"editable": true,
"error": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 28,
"interval": null,
"isNew": true,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(etcd_server_has_leader{job=\"$cluster\"})",
"intervalFactor": 2,
"legendFormat": "",
"metric": "etcd_server_has_leader",
"refId": "A",
"step": 20
}
],
"thresholds": "",
"title": "Up",
"type": "singlestat",
"valueFontSize": "200%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 0,
"id": 23,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 5,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(grpc_server_started_total{job=\"$cluster\",grpc_type=\"unary\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RPC Rate",
"metric": "grpc_server_started_total",
"refId": "A",
"step": 2
},
{
"expr": "sum(rate(grpc_server_handled_total{job=\"$cluster\",grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RPC Failed Rate",
"metric": "grpc_server_handled_total",
"refId": "B",
"step": 2
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "RPC Rate",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 0,
"id": 41,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(grpc_server_started_total{job=\"$cluster\",grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{job=\"$cluster\",grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})",
"intervalFactor": 2,
"legendFormat": "Watch Streams",
"metric": "grpc_server_handled_total",
"refId": "A",
"step": 4
},
{
"expr": "sum(grpc_server_started_total{job=\"$cluster\",grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{job=\"$cluster\",grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})",
"intervalFactor": 2,
"legendFormat": "Lease Streams",
"metric": "grpc_server_handled_total",
"refId": "B",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Active Streams",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"showTitle": false,
"title": "Row"
},
{
"collapse": false,
"editable": true,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"decimals": null,
"editable": true,
"error": false,
"fill": 0,
"grid": {
},
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes{job=\"$cluster\"}",
"hide": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{instance}} DB Size",
"metric": "",
"refId": "A",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "DB Size",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 0,
"grid": {
},
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 1,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 4,
"stack": false,
"steppedLine": true,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=\"$cluster\"}[5m])) by (instance, le))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{instance}} WAL fsync",
"metric": "etcd_disk_wal_fsync_duration_seconds_bucket",
"refId": "A",
"step": 4
},
{
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket{job=\"$cluster\"}[5m])) by (instance, le))",
"intervalFactor": 2,
"legendFormat": "{{instance}} DB fsync",
"metric": "etcd_disk_backend_commit_duration_seconds_bucket",
"refId": "B",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Sync Duration",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 0,
"id": 29,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"$cluster\"}",
"intervalFactor": 2,
"legendFormat": "{{instance}} Resident Memory",
"metric": "process_resident_memory_bytes",
"refId": "A",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"title": "New row"
},
{
"collapse": false,
"editable": true,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 5,
"id": 22,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(etcd_network_client_grpc_received_bytes_total{job=\"$cluster\"}[5m])",
"intervalFactor": 2,
"legendFormat": "{{instance}} Client Traffic In",
"metric": "etcd_network_client_grpc_received_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Client Traffic In",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 5,
"id": 21,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(etcd_network_client_grpc_sent_bytes_total{job=\"$cluster\"}[5m])",
"intervalFactor": 2,
"legendFormat": "{{instance}} Client Traffic Out",
"metric": "etcd_network_client_grpc_sent_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Client Traffic Out",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 0,
"id": 20,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_network_peer_received_bytes_total{job=\"$cluster\"}[5m])) by (instance)",
"intervalFactor": 2,
"legendFormat": "{{instance}} Peer Traffic In",
"metric": "etcd_network_peer_received_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Peer Traffic In",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"decimals": null,
"editable": true,
"error": false,
"fill": 0,
"grid": {
},
"id": 16,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_network_peer_sent_bytes_total{job=\"$cluster\"}[5m])) by (instance)",
"hide": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{instance}} Peer Traffic Out",
"metric": "etcd_network_peer_sent_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Peer Traffic Out",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"title": "New row"
},
{
"collapse": false,
"editable": true,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"editable": true,
"error": false,
"fill": 0,
"id": 40,
"isNew": true,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_server_proposals_failed_total{job=\"$cluster\"}[5m]))",
"intervalFactor": 2,
"legendFormat": "Proposal Failure Rate",
"metric": "etcd_server_proposals_failed_total",
"refId": "A",
"step": 2
},
{
"expr": "sum(etcd_server_proposals_pending{job=\"$cluster\"})",
"intervalFactor": 2,
"legendFormat": "Proposal Pending Total",
"metric": "etcd_server_proposals_pending",
"refId": "B",
"step": 2
},
{
"expr": "sum(rate(etcd_server_proposals_committed_total{job=\"$cluster\"}[5m]))",
"intervalFactor": 2,
"legendFormat": "Proposal Commit Rate",
"metric": "etcd_server_proposals_committed_total",
"refId": "C",
"step": 2
},
{
"expr": "sum(rate(etcd_server_proposals_applied_total{job=\"$cluster\"}[5m]))",
"intervalFactor": 2,
"legendFormat": "Proposal Apply Rate",
"refId": "D",
"step": 2
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Raft Proposals",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"datasource": "$datasource",
"decimals": 0,
"editable": true,
"error": false,
"fill": 0,
"id": 19,
"isNew": true,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "changes(etcd_server_leader_changes_seen_total{job=\"$cluster\"}[1d])",
"intervalFactor": 2,
"legendFormat": "{{instance}} Total Leader Elections Per Day",
"metric": "etcd_server_leader_changes_seen_total",
"refId": "A",
"step": 2
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Total Leader Elections Per Day",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"title": "New row"
}
],
"schemaVersion": 13,
"sharedCrosshair": false,
"style": "dark",
"tags": [
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(etcd_server_has_leader, job)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-15m",
"to": "now"
},
"timepicker": {
"now": true,
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "etcd",
"version": 215
}

View File

@ -0,0 +1,8314 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s
namespace: monitoring
data:
k8s-cluster-rsrc-use.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:cluster_cpu_utilisation:ratio{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_cpu_saturation_load1:{cluster=\"$cluster\"} / scalar(sum(min(kube_pod_info{cluster=\"$cluster\"}) by (node)))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Saturation (Load1)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:cluster_memory_utilisation:ratio{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_memory_swap_io_bytes:sum_rate{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Saturation (Swap I/O)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_utilisation:avg_irate{cluster=\"$cluster\"} / scalar(:kube_pod_info_node_count:{cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_saturation:avg_irate{cluster=\"$cluster\"} / scalar(:kube_pod_info_node_count:{cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Saturation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_utilisation:sum_irate{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Utilisation (Transmitted)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_saturation:sum_irate{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Saturation (Dropped)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(max(node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\", cluster=\"$cluster\"} - node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\", cluster=\"$cluster\"}) by (device,pod,namespace)) by (pod,namespace)\n/ scalar(sum(max(node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\", cluster=\"$cluster\"}) by (device,pod,namespace)))\n* on (namespace, pod) group_left (node) node_namespace_pod:kube_pod_info:{cluster=\"$cluster\"}\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Capacity",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Storage",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_node_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / USE Method / Cluster",
"uid": "a6e7d1362e1ddbb79db21d5bb40d7137",
"version": 0
}
k8s-node-rsrc-use.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_cpu_utilisation:avg1m{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Utilisation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_cpu_saturation_load1:{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Saturation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Saturation (Load1)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_memory_utilisation:{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Memory",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_memory_swap_io_bytes:sum_rate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Swap IO",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Saturation (Swap I/O)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_utilisation:avg_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Utilisation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_saturation:avg_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Saturation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Saturation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_utilisation:sum_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Utilisation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Utilisation (Transmitted)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_saturation:sum_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Saturation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Saturation (Dropped)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Net",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\"}\n* on (namespace, pod) group_left (node) node_namespace_pod:kube_pod_info:{cluster=\"$cluster\", node=\"$node\"}\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_node_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "node",
"multi": false,
"name": "node",
"options": [
],
"query": "label_values(kube_node_info{cluster=\"$cluster\"}, node)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / USE Method / Node",
"uid": "4ac4f123aae0ff6dbaf4f4f66120033b",
"version": 0
}
k8s-resources-cluster.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "100px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[1m]))",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) / sum(node:node_num_cpu:sum{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) / sum(node:node_num_cpu:sum{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - sum(:node_memory_MemFreeCachedBuffers_bytes:sum{cluster=\"$cluster\"}) / sum(:node_memory_MemTotal_bytes:sum{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) / sum(:node_memory_MemTotal_bytes:sum{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) / sum(:node_memory_MemTotal_bytes:sum{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Headlines",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container_name!=\"\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container_name!=\"\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container_name!=\"\"}) by (namespace) / sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container_name!=\"\"}) by (namespace) / sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Requests by Namespace",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Requests",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8",
"version": 0
}
k8s-resources-namespace.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod_name}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(label_replace(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(label_replace(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(label_replace(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container_name!=\"\"}) by (pod_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod_name}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(label_replace(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container_name!=\"\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container_name!=\"\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container_name!=\"\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\",container_name!=\"\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\",container_name!=\"\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\",container_name!=\"\"}, \"pod\", \"$1\", \"pod_name\", \"(.*)\")) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Namespace",
"uid": "85a562078cdf77779eaa1add43ccec1e",
"version": 0
}
k8s-resources-pod.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"POD\", cluster=\"$cluster\"}) by (container_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container_name}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(label_replace(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"POD\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(label_replace(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(label_replace(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"POD\", container_name!=\"\"}) by (container_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container_name}} (RSS)",
"legendLink": null,
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"POD\", container_name!=\"\"}) by (container_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container_name}} (Cache)",
"legendLink": null,
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"POD\", container_name!=\"\"}) by (container_name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container_name}} (Swap)",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(label_replace(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"POD\", container_name!=\"\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name!=\"\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name != \"\", container_name != \"POD\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name != \"\", container_name != \"POD\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(label_replace(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name != \"\", container_name != \"POD\"}, \"container\", \"$1\", \"container_name\", \"(.*)\")) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Pod",
"uid": "6581e46e4e5c7ba40a07646395ef7b23",
"version": 0
}
nodes.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(node_load1{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 1m",
"refId": "A"
},
{
"expr": "max(node_load5{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 5m",
"refId": "B"
},
{
"expr": "max(node_load15{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 15m",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "System load",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cpu}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Usage Per Core",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": "true",
"avg": "true",
"current": "true",
"max": "false",
"min": "false",
"rightSide": "true",
"show": "true",
"total": "false",
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max (sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[2m])) ) * 100\n",
"format": "time_series",
"intervalFactor": 10,
"legendFormat": "{{ cpu }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilizaion",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
},
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[2m]))) * 100\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "CPU Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(\n node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory used",
"refId": "A"
},
{
"expr": "max(node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"refId": "B"
},
{
"expr": "max(node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory cached",
"refId": "C"
},
{
"expr": "max(node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory free",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(\n (\n (\n node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n )\n / node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n ) * 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "read",
"yaxis": 1
},
{
"alias": "io time",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(rate(node_disk_read_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "read",
"refId": "A"
},
{
"expr": "max(rate(node_disk_written_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "written",
"refId": "B"
},
{
"expr": "max(rate(node_disk_io_time_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "io time",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Space Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(rate(node_network_receive_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", device!~\"lo\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Received",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(rate(node_network_transmit_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", device!~\"lo\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(\n node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "inodes used",
"refId": "A"
},
{
"expr": "max(node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "inodes free",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Inodes Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 13,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(\n (\n (\n node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n )\n / node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n ) * 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Inodes Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(node_boot_time_seconds{cluster=\"$cluster\", job=\"node-exporter\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Nodes",
"uid": "fa49a4706d07a042595b664c87fb33ea",
"version": 0
}
persistentvolumesusage.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", persistentvolumeclaim=\"$volume\"} - kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", persistentvolumeclaim=\"$volume\"}) / kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", persistentvolumeclaim=\"$volume\"} * 100\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ Usage }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Volume Space Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
},
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", persistentvolumeclaim=\"$volume\"} / kubelet_volume_stats_inodes{cluster=\"$cluster\", job=\"kubelet\", persistentvolumeclaim=\"$volume\"} * 100\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ Usage }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Volume inodes Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
},
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kubelet_volume_stats_capacity_bytes, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\"}, exported_namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "PersistentVolumeClaim",
"multi": false,
"name": "volume",
"options": [
],
"query": "label_values(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", exported_namespace=\"$namespace\"}, persistentvolumeclaim)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-7d",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Persistent Volumes",
"uid": "919b92a8e8041bd567af9edab12c840c",
"version": 0
}
pods.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(container_name) (container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container_name }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits_memory_bytes{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (container_name) (rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ container_name }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod_name=\"$pod\"}[1m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ pod_name }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=~\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "Container",
"multi": false,
"name": "container",
"options": [
],
"query": "label_values(kube_pod_container_info{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}, container)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Pods",
"uid": "ab4f13a9892a76a4d21ce8c2445bf4ea",
"version": 0
}
statefulset.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 2,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "cores",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod_name=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "CPU",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 3,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "GB",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod_name=~\"$statefulset.*\"}) / 1024^3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Memory",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "Bps",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod_name=~\"$statefulset.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\",pod_name=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Network",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"height": "100px",
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Desired Replicas",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 6,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Replicas of current version",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_status_observed_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Observed Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 8,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Metadata Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas specified",
"refId": "A"
},
{
"expr": "max(kube_statefulset_status_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas created",
"refId": "B"
},
{
"expr": "min(kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "ready",
"refId": "C"
},
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas of current version",
"refId": "D"
},
{
"expr": "min(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "updated",
"refId": "E"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Replicas",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Name",
"multi": false,
"name": "statefulset",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", namespace=\"$namespace\"}, statefulset)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / StatefulSets",
"uid": "a31c1f46e6f727cb37c0d731a7245005",
"version": 0
}

View File

@ -1,7361 +0,0 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards
namespace: monitoring
data:
deployment-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 1,
"hideControls": false,
"links": [],
"rows": [
{
"collapse": false,
"editable": false,
"height": "200px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 8,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "cores",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "CPU",
"type": "singlestat",
"valueFontSize": "110%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 9,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "GB",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "80%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Memory",
"type": "singlestat",
"valueFontSize": "110%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "Bps",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": false
},
"id": 7,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Network",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "100px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": false
},
"id": 5,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"metric": "kube_deployment_spec_replicas",
"refId": "A",
"step": 600
}
],
"title": "Desired Replicas",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 6,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Available Replicas",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 3,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Observed Generation",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 2,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Metadata Generation",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "350px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 1,
"isNew": true,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "current replicas",
"refId": "A",
"step": 30
},
{
"expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "available",
"refId": "B",
"step": 30
},
{
"expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "unavailable",
"refId": "C",
"step": 30
},
{
"expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "updated",
"refId": "D",
"step": 30
},
{
"expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "desired",
"refId": "E",
"step": 30
}
],
"title": "Replicas",
"tooltip": {
"msResolution": true,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "none",
"label": "",
"logBase": 1,
"show": true
},
{
"format": "short",
"label": "",
"logBase": 1,
"show": false
}
]
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": ".*",
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "deployment_namespace",
"options": [],
"query": "label_values(kube_deployment_metadata_generation, namespace)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": null,
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "Deployment",
"multi": false,
"name": "deployment_name",
"options": [],
"query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "deployment",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Deployment",
"version": 1
}
etcd-dashboard.json: |+
{
"__inputs": [
{
"name": "prometheus",
"label": "prometheus",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "4.5.2"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": ""
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
},
{
"type": "panel",
"id": "singlestat",
"name": "Singlestat",
"version": ""
}
],
"annotations": {
"list": []
},
"description": "etcd sample Grafana dashboard with Prometheus",
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [],
"refresh": false,
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"error": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 28,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(etcd_server_has_leader)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"metric": "etcd_server_has_leader",
"refId": "A",
"step": 20
}
],
"thresholds": "",
"title": "Up",
"type": "singlestat",
"valueFontSize": "200%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 0,
"id": 23,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 5,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(grpc_server_started_total{grpc_type=\"unary\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RPC Rate",
"metric": "grpc_server_started_total",
"refId": "A",
"step": 4
},
{
"expr": "sum(rate(grpc_server_handled_total{grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RPC Failed Rate",
"metric": "grpc_server_handled_total",
"refId": "B",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "RPC Rate",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 0,
"id": 41,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Watch Streams",
"metric": "grpc_server_handled_total",
"refId": "A",
"step": 4
},
{
"expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Lease Streams",
"metric": "grpc_server_handled_total",
"refId": "B",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Active Streams",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Row",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"decimals": null,
"editable": false,
"error": false,
"fill": 0,
"grid": {},
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{instance}} DB Size",
"metric": "",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "DB Size",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 0,
"grid": {},
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 1,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": true,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))",
"format": "time_series",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{instance}} WAL fsync",
"metric": "etcd_disk_wal_fsync_duration_seconds_bucket",
"refId": "A",
"step": 4
},
{
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} DB fsync",
"metric": "etcd_disk_backend_commit_duration_seconds_bucket",
"refId": "B",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Disk Sync Duration",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 0,
"id": 29,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} Resident Memory",
"metric": "process_resident_memory_bytes",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Memory",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "New row",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 5,
"id": 22,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(etcd_network_client_grpc_received_bytes_total[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} Client Traffic In",
"metric": "etcd_network_client_grpc_received_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Client Traffic In",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 5,
"id": 21,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(etcd_network_client_grpc_sent_bytes_total[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} Client Traffic Out",
"metric": "etcd_network_client_grpc_sent_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Client Traffic Out",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 0,
"id": 20,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} Peer Traffic In",
"metric": "etcd_network_peer_received_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Peer Traffic In",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"decimals": null,
"editable": false,
"error": false,
"fill": 0,
"grid": {},
"id": 16,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{instance}} Peer Traffic Out",
"metric": "etcd_network_peer_sent_bytes_total",
"refId": "A",
"step": 4
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Peer Traffic Out",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "New row",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 0,
"id": 40,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_server_proposals_failed_total[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Proposal Failure Rate",
"metric": "etcd_server_proposals_failed_total",
"refId": "A",
"step": 2
},
{
"expr": "sum(etcd_server_proposals_pending)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Proposal Pending Total",
"metric": "etcd_server_proposals_pending",
"refId": "B",
"step": 2
},
{
"expr": "sum(rate(etcd_server_proposals_committed_total[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Proposal Commit Rate",
"metric": "etcd_server_proposals_committed_total",
"refId": "C",
"step": 2
},
{
"expr": "sum(rate(etcd_server_proposals_applied_total[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Proposal Apply Rate",
"refId": "D",
"step": 2
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Raft Proposals",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"decimals": 0,
"editable": false,
"error": false,
"fill": 0,
"id": 19,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "changes(etcd_server_leader_changes_seen_total[1d])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} Total Leader Elections Per Day",
"metric": "etcd_server_leader_changes_seen_total",
"refId": "A",
"step": 2
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Total Leader Elections Per Day",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "New row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-15m",
"to": "now"
},
"timepicker": {
"now": true,
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "etcd",
"version": 4
}
kubernetes-capacity-planning-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"gnetId": 22,
"graphTooltip": 0,
"hideControls": false,
"links": [],
"refresh": false,
"rows": [
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 3,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(node_cpu_seconds_total{mode=\"idle\"}[2m])) * 100",
"hide": false,
"intervalFactor": 10,
"legendFormat": "",
"refId": "A",
"step": 50
}
],
"title": "Idle CPU",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "percent",
"label": "cpu usage",
"logBase": 1,
"min": 0,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 9,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_load1)",
"intervalFactor": 4,
"legendFormat": "load 1m",
"refId": "A",
"step": 20,
"target": ""
},
{
"expr": "sum(node_load5)",
"intervalFactor": 4,
"legendFormat": "load 5m",
"refId": "B",
"step": 20,
"target": ""
},
{
"expr": "sum(node_load15)",
"intervalFactor": 4,
"legendFormat": "load 15m",
"refId": "C",
"step": 20,
"target": ""
}
],
"title": "System Load",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "percentunit",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 4,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "node_memory_SwapFree_bytes{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 9,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_memory_MemTotal_bytes) - sum(node_memory_MemFree_bytes) - sum(node_memory_Buffers_bytes) - sum(node_memory_Cached_bytes)",
"intervalFactor": 2,
"legendFormat": "memory usage",
"metric": "memo",
"refId": "A",
"step": 10,
"target": ""
},
{
"expr": "sum(node_memory_Buffers_bytes)",
"interval": "",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"metric": "memo",
"refId": "B",
"step": 10,
"target": ""
},
{
"expr": "sum(node_memory_Cached_bytes)",
"interval": "",
"intervalFactor": 2,
"legendFormat": "memory cached",
"metric": "memo",
"refId": "C",
"step": 10,
"target": ""
},
{
"expr": "sum(node_memory_MemFree_bytes)",
"interval": "",
"intervalFactor": 2,
"legendFormat": "memory free",
"metric": "memo",
"refId": "D",
"step": 10,
"target": ""
}
],
"title": "Memory Usage",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"min": "0",
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 5,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "((sum(node_memory_MemTotal_bytes) - sum(node_memory_MemFree_bytes) - sum(node_memory_Buffers_bytes) - sum(node_memory_Cached_bytes)) / sum(node_memory_MemTotal_bytes)) * 100",
"intervalFactor": 2,
"metric": "",
"refId": "A",
"step": 60,
"target": ""
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "246px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 6,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "read",
"yaxis": 1
},
{
"alias": "{instance=\"172.17.0.1:9100\"}",
"yaxis": 2
},
{
"alias": "io time",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(rate(node_disk_read_bytes_total[5m]))",
"hide": false,
"intervalFactor": 4,
"legendFormat": "read",
"refId": "A",
"step": 20,
"target": ""
},
{
"expr": "max(rate(node_disk_written_bytes_total[5m]))",
"intervalFactor": 4,
"legendFormat": "written",
"refId": "B",
"step": 20
},
{
"expr": "max(rate(node_disk_io_time_seconds_total[5m]))",
"intervalFactor": 4,
"legendFormat": "io time",
"refId": "C",
"step": 20
}
],
"title": "Disk I/O",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "ms",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percentunit",
"gauge": {
"maxValue": 1,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 12,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(node_filesystem_size_bytes{device!=\"rootfs\"}) - sum(node_filesystem_free_bytes{device!=\"rootfs\"})) / sum(node_filesystem_size_bytes{device!=\"rootfs\"})",
"intervalFactor": 2,
"refId": "A",
"step": 60,
"target": ""
}
],
"thresholds": "0.75, 0.9",
"title": "Disk Space Usage",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 8,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "transmitted",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo\"}[5m]))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10,
"target": ""
}
],
"title": "Network Received",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "bytes",
"logBase": 1,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 10,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "transmitted",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo\"}[5m]))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10,
"target": ""
}
],
"title": "Network Transmitted",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "bytes",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "276px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 11,
"isNew": true,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 11,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_info)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current number of Pods",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_node_status_capacity_pods)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Maximum capacity of pods",
"refId": "B",
"step": 10
}
],
"title": "Cluster Pod Utilization",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 7,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 60,
"target": ""
}
],
"thresholds": "80, 90",
"title": "Pod Utilization",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Kubernetes Capacity Planning",
"version": 4
}
kubernetes-cluster-health-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 0,
"hideControls": false,
"links": [],
"refresh": "10s",
"rows": [
{
"collapse": false,
"editable": false,
"height": "254px",
"panels": [
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 1,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Control Plane Components Down",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "Everything UP and healthy",
"value": "null"
},
{
"op": "=",
"text": "",
"value": ""
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 2,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Alerts Firing",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 3,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "3, 5",
"title": "Alerts Pending",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 4,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Crashlooping Pods",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": false,
"title": "Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 5,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Node Not Ready",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 6,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Node Disk Pressure",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 7,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Node Memory Pressure",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 8,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(kube_node_spec_unschedulable)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Nodes Unschedulable",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": false,
"title": "Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Kubernetes Cluster Health",
"version": 9
}
kubernetes-cluster-status-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 0,
"hideControls": false,
"links": [],
"rows": [
{
"collapse": false,
"editable": false,
"height": "129px",
"panels": [
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 5,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 6,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Control Plane UP",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "UP",
"value": "null"
}
],
"valueName": "total"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 6,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 6,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "3, 5",
"title": "Alerts Firing",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": true,
"title": "Cluster Health",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "168px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 1,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "50, 80",
"title": "API Servers UP",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 2,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "50, 80",
"title": "Controller Managers UP",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 3,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "50, 80",
"title": "Schedulers UP",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"colorBackground": false,
"colorValue": true,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 4,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "1, 3",
"title": "Crashlooping Control Plane Pods",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": true,
"title": "Control Plane Status",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "158px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 8,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "sum(100 - (avg by (instance) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"})",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "80, 90",
"title": "CPU Utilization",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 7,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "((sum(node_memory_MemTotal_bytes) - sum(node_memory_MemFree_bytes) - sum(node_memory_Buffers_bytes) - sum(node_memory_Cached_bytes)) / sum(node_memory_MemTotal_bytes)) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "80, 90",
"title": "Memory Utilization",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 9,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(node_filesystem_size_bytes{device!=\"rootfs\"}) - sum(node_filesystem_free_bytes{device!=\"rootfs\"})) / sum(node_filesystem_size_bytes{device!=\"rootfs\"})",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "80, 90",
"title": "Filesystem Utilization",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 10,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "80, 90",
"title": "Pod Utilization",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": true,
"title": "Capacity Planning",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Kubernetes Cluster Status",
"version": 3
}
kubernetes-control-plane-status-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 0,
"hideControls": false,
"links": [],
"rows": [
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 1,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "50, 80",
"title": "API Servers UP",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 2,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "50, 80",
"title": "Controller Managers UP",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 3,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"thresholds": "50, 80",
"title": "Schedulers UP",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 4,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 600
}
],
"thresholds": "5, 10",
"title": "API Server Request Error Rate",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 7,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 30
}
],
"title": "API Server Request Latency",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 5,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile",
"format": "time_series",
"intervalFactor": 2,
"refId": "A",
"step": 60
}
],
"title": "End to End Scheduling Latency",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"show": true
},
{
"format": "dtdurations",
"logBase": 1,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 6,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Error Rate",
"refId": "A",
"step": 60
},
{
"expr": "sum by(instance) (rate(apiserver_request_count[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Request Rate",
"refId": "B",
"step": 60
}
],
"title": "API Server Request Rates",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Kubernetes Control Plane Status",
"version": 3
}
kubernetes-resource-requests-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 0,
"hideControls": false,
"links": [],
"refresh": false,
"rows": [
{
"collapse": false,
"editable": false,
"height": "300px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 1,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "Allocatable CPU Cores",
"refId": "A",
"step": 20
},
{
"expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "Requested CPU Cores",
"refId": "B",
"step": 20
}
],
"title": "CPU Cores",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "CPU Cores",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 2,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 240
}
],
"thresholds": "80, 90",
"title": "CPU Cores",
"transparent": false,
"type": "singlestat",
"valueFontSize": "110%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "CPU Cores",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "300px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 3,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "Allocatable Memory",
"refId": "A",
"step": 20
},
{
"expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "Requested Memory",
"refId": "B",
"step": 20
}
],
"title": "Memory",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"label": "Memory",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 4,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 240
}
],
"thresholds": "80, 90",
"title": "Memory",
"transparent": false,
"type": "singlestat",
"valueFontSize": "110%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "Memory",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-3h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Kubernetes Resource Requests",
"version": 2
}
nodes-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"description": "Dashboard to get an overview of one server",
"editable": false,
"gnetId": 22,
"graphTooltip": 0,
"hideControls": false,
"links": [],
"refresh": false,
"rows": [
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 3,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "100 - (avg by (cpu) (irate(node_cpu_seconds_total{mode=\"idle\", instance=\"$server\"}[5m])) * 100)",
"hide": false,
"intervalFactor": 10,
"legendFormat": "{{cpu}}",
"refId": "A",
"step": 50
}
],
"title": "Idle CPU",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "percent",
"label": "cpu usage",
"logBase": 1,
"max": 100,
"min": 0,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 9,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node_load1{instance=\"$server\"}",
"intervalFactor": 4,
"legendFormat": "load 1m",
"refId": "A",
"step": 20,
"target": ""
},
{
"expr": "node_load5{instance=\"$server\"}",
"intervalFactor": 4,
"legendFormat": "load 5m",
"refId": "B",
"step": 20,
"target": ""
},
{
"expr": "node_load15{instance=\"$server\"}",
"intervalFactor": 4,
"legendFormat": "load 15m",
"refId": "C",
"step": 20,
"target": ""
}
],
"title": "System Load",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "percentunit",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 4,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "node_memory_SwapFree_bytes{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 9,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node_memory_MemTotal_bytes{instance=\"$server\"} - node_memory_MemFree_bytes{instance=\"$server\"} - node_memory_Buffers_bytes{instance=\"$server\"} - node_memory_Cached_bytes{instance=\"$server\"}",
"hide": false,
"interval": "",
"intervalFactor": 2,
"legendFormat": "memory used",
"metric": "",
"refId": "C",
"step": 10
},
{
"expr": "node_memory_Buffers_bytes{instance=\"$server\"}",
"interval": "",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"metric": "",
"refId": "E",
"step": 10
},
{
"expr": "node_memory_Cached{instance=\"$server\"}",
"intervalFactor": 2,
"legendFormat": "memory cached",
"metric": "",
"refId": "F",
"step": 10
},
{
"expr": "node_memory_MemFree_bytes{instance=\"$server\"}",
"intervalFactor": 2,
"legendFormat": "memory free",
"metric": "",
"refId": "D",
"step": 10
}
],
"title": "Memory Usage",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"min": "0",
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 5,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "((node_memory_MemTotal_bytes{instance=\"$server\"} - node_memory_MemFree_bytes{instance=\"$server\"} - node_memory_Buffers_bytes{instance=\"$server\"} - node_memory_Cached_bytes{instance=\"$server\"}) / node_memory_MemTotal_bytes{instance=\"$server\"}) * 100",
"intervalFactor": 2,
"refId": "A",
"step": 60,
"target": ""
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 6,
"isNew": true,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "read",
"yaxis": 1
},
{
"alias": "{instance=\"172.17.0.1:9100\"}",
"yaxis": 2
},
{
"alias": "io time",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (instance) (rate(node_disk_read_bytes_total{instance=\"$server\"}[2m]))",
"hide": false,
"intervalFactor": 4,
"legendFormat": "read",
"refId": "A",
"step": 20,
"target": ""
},
{
"expr": "sum by (instance) (rate(node_disk_written_bytes_total{instance=\"$server\"}[2m]))",
"intervalFactor": 4,
"legendFormat": "written",
"refId": "B",
"step": 20
},
{
"expr": "sum by (instance) (rate(node_disk_io_time_seconds_total{instance=\"$server\"}[2m]))",
"intervalFactor": 4,
"legendFormat": "io time",
"refId": "C",
"step": 20
}
],
"title": "Disk I/O",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "ms",
"logBase": 1,
"show": true
}
]
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "prometheus",
"editable": false,
"format": "percentunit",
"gauge": {
"maxValue": 1,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"hideTimeOverride": false,
"id": 7,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "(sum(node_filesystem_size_bytes{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free_bytes{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size_bytes{device!=\"rootfs\",instance=\"$server\"})",
"intervalFactor": 2,
"refId": "A",
"step": 60,
"target": ""
}
],
"thresholds": "0.75, 0.9",
"title": "Disk Space Usage",
"transparent": false,
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 8,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "transmitted",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{instance=\"$server\",device!~\"lo\"}[5m])",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A",
"step": 10,
"target": ""
}
],
"title": "Network Received",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "bytes",
"logBase": 1,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 10,
"isNew": false,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "transmitted",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_transmit_bytes_total{instance=\"$server\",device!~\"lo\"}[5m])",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "B",
"step": 10,
"target": ""
}
],
"title": "Network Transmitted",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "bytes",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": null,
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "server",
"options": [],
"query": "label_values(node_boot_time_seconds, instance)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Nodes",
"version": 2
}
pods-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 1,
"hideControls": false,
"links": [],
"refresh": false,
"rows": [
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 1,
"isNew": false,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})",
"interval": "10s",
"intervalFactor": 1,
"legendFormat": "Current: {{ container_name }}",
"metric": "container_memory_usage_bytes",
"refId": "A",
"step": 15
},
{
"expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
"interval": "10s",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"metric": "kube_pod_container_resource_requests_memory_bytes",
"refId": "B",
"step": 20
},
{
"expr": "kube_pod_container_resource_limits_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
"interval": "10s",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"metric": "kube_pod_container_resource_limits_memory_bytes",
"refId": "C",
"step": 20
}
],
"title": "Memory Usage",
"tooltip": {
"msResolution": true,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 2,
"isNew": false,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))",
"intervalFactor": 2,
"legendFormat": "{{ container_name }}",
"refId": "A",
"step": 30
},
{
"expr": "kube_pod_container_resource_requests_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
"interval": "10s",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"metric": "kube_pod_container_resource_requests_cpu_cores",
"refId": "B",
"step": 20
},
{
"expr": "kube_pod_container_resource_limits_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
"interval": "10s",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"metric": "kube_pod_container_resource_limits_memory_bytes",
"refId": "C",
"step": 20
}
],
"title": "CPU Usage",
"tooltip": {
"msResolution": true,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "250px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 3,
"isNew": false,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))",
"intervalFactor": 2,
"legendFormat": "{{ pod_name }}",
"refId": "A",
"step": 30
}
],
"title": "Network I/O",
"tooltip": {
"msResolution": true,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"logBase": 1,
"show": true
},
{
"format": "short",
"logBase": 1,
"show": true
}
]
}
],
"showTitle": false,
"title": "New Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": ".*",
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": true,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [],
"query": "label_values(kube_pod_info, namespace)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "Pod",
"multi": false,
"name": "pod",
"options": [],
"query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".*",
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": true,
"label": "Container",
"multi": false,
"name": "container",
"options": [],
"query": "label_values(kube_pod_container_info{namespace=\"$namespace\", pod=\"$pod\"}, container)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Pods",
"version": 1
}
statefulset-dashboard.json: |+
{
"__inputs": [
{
"description": "",
"label": "prometheus",
"name": "prometheus",
"pluginId": "prometheus",
"pluginName": "Prometheus",
"type": "datasource"
}
],
"annotations": {
"list": []
},
"editable": false,
"graphTooltip": 1,
"hideControls": false,
"links": [],
"rows": [
{
"collapse": false,
"editable": false,
"height": "200px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 8,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "cores",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "CPU",
"type": "singlestat",
"valueFontSize": "110%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 9,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "GB",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "80%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "sum(container_memory_usage_bytes{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}) / 1024^3",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Memory",
"type": "singlestat",
"valueFontSize": "110%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "Bps",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": false
},
"id": 7,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Network",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "100px",
"panels": [
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": false
},
"id": 5,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"metric": "kube_statefulset_replicas",
"refId": "A",
"step": 600
}
],
"title": "Desired Replicas",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 6,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Available Replicas",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 3,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(kube_statefulset_status_observed_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Observed Generation",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "prometheus",
"editable": false,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 2,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"targets": [
{
"expr": "max(kube_statefulset_metadata_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"refId": "A",
"step": 600
}
],
"title": "Metadata Generation",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
},
{
"collapse": false,
"editable": false,
"height": "350px",
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "prometheus",
"editable": false,
"error": false,
"fill": 1,
"grid": {
"threshold1Color": "rgba(216, 200, 27, 0.27)",
"threshold2Color": "rgba(234, 112, 112, 0.22)"
},
"id": 1,
"isNew": true,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "available",
"refId": "B",
"step": 30
},
{
"expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
"intervalFactor": 2,
"legendFormat": "desired",
"refId": "E",
"step": 30
}
],
"title": "Replicas",
"tooltip": {
"msResolution": true,
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"mode": "time",
"show": true,
"values": []
},
"yaxes": [
{
"format": "none",
"label": "",
"logBase": 1,
"show": true
},
{
"format": "short",
"label": "",
"logBase": 1,
"show": false
}
]
}
],
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"sharedCrosshair": false,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": ".*",
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "statefulset_namespace",
"options": [],
"query": "label_values(kube_statefulset_metadata_generation, namespace)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": null,
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {},
"datasource": "prometheus",
"hide": 0,
"includeAll": false,
"label": "StatefulSet",
"multi": false,
"name": "statefulset_name",
"options": [],
"query": "label_values(kube_statefulset_metadata_generation{namespace=\"$statefulset_namespace\"}, statefulset)",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "statefulset",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "StatefulSet",
"version": 1
}
---

View File

@ -10,7 +10,15 @@ data:
- name: prometheus - name: prometheus
type: prometheus type: prometheus
access: proxy access: proxy
orgId: 1
url: http://prometheus.monitoring.svc.cluster.local url: http://prometheus.monitoring.svc.cluster.local
version: 1 version: 1
editable: false editable: false
loki.yaml: |+
apiVersion: 1
datasources:
- name: loki
type: loki
access: proxy
url: http://loki.monitoring.svc.cluster.local
version: 1
editable: false

View File

@ -23,44 +23,55 @@ spec:
spec: spec:
containers: containers:
- name: grafana - name: grafana
image: grafana/grafana:5.4.3 image: grafana/grafana:6.0.2
env: env:
- name: GF_SERVER_HTTP_PORT - name: GF_PATHS_CONFIG
value: "8080" value: "/etc/grafana/custom.ini"
- name: GF_AUTH_BASIC_ENABLED
value: "false"
- name: GF_AUTH_DISABLE_LOGIN_FORM
value: "true"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ORG_ROLE
value: Viewer
- name: GF_ANALYTICS_REPORTING_ENABLED
value: "false"
ports: ports:
- name: http - name: http
containerPort: 8080 containerPort: 8080
livenessProbe:
httpGet:
path: /metrics
port: 8080
initialDelaySeconds: 10
readinessProbe:
httpGet:
path: /api/health
port: 8080
initialDelaySeconds: 10
resources: resources:
requests: requests:
memory: 100Mi
cpu: 100m cpu: 100m
memory: 100Mi
limits: limits:
memory: 200Mi
cpu: 200m cpu: 200m
memory: 200Mi
volumeMounts: volumeMounts:
- name: config
mountPath: /etc/grafana
- name: datasources - name: datasources
mountPath: /etc/grafana/provisioning/datasources mountPath: /etc/grafana/provisioning/datasources
- name: dashboard-providers - name: providers
mountPath: /etc/grafana/provisioning/dashboards mountPath: /etc/grafana/provisioning/dashboards
- name: dashboards - name: dashboards-etcd
mountPath: /var/lib/grafana/dashboards mountPath: /etc/grafana/dashboards/etcd
- name: dashboards-k8s
mountPath: /etc/grafana/dashboards/k8s
volumes: volumes:
- name: config
configMap:
name: grafana-config
- name: datasources - name: datasources
configMap: configMap:
name: grafana-datasources name: grafana-datasources
- name: dashboard-providers - name: providers
configMap: configMap:
name: grafana-dashboard-providers name: grafana-providers
- name: dashboards - name: dashboards-etcd
configMap: configMap:
name: grafana-dashboards name: grafana-dashboards-etcd
- name: dashboards-k8s
configMap:
name: grafana-dashboards-k8s

View File

@ -1,10 +1,10 @@
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap
metadata: metadata:
name: grafana-dashboard-providers name: grafana-providers
namespace: monitoring namespace: monitoring
data: data:
dashboard-providers.yaml: |+ providers.yaml: |+
apiVersion: 1 apiVersion: 1
providers: providers:
- name: 'default' - name: 'default'
@ -12,4 +12,4 @@ data:
folder: '' folder: ''
type: file type: file
options: options:
path: /var/lib/grafana/dashboards path: /etc/grafana/dashboards

View File

@ -1,12 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: heapster
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: heapster
subjects:
- kind: ServiceAccount
name: heapster
namespace: kube-system

View File

@ -1,30 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: heapster
rules:
- apiGroups:
- ""
resources:
- events
- namespaces
- nodes
- pods
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- deployments
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- nodes/stats
verbs:
- get

View File

@ -1,62 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: heapster
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
name: heapster
phase: prod
template:
metadata:
labels:
name: heapster
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
serviceAccountName: heapster
containers:
- name: heapster
image: k8s.gcr.io/heapster-amd64:v1.5.4
command:
- /heapster
- --source=kubernetes.summary_api:''?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true
livenessProbe:
httpGet:
path: /healthz
port: 8082
scheme: HTTP
initialDelaySeconds: 180
timeoutSeconds: 5
- name: heapster-nanny
image: k8s.gcr.io/addon-resizer:1.7
command:
- /pod_nanny
- --cpu=80m
- --extra-cpu=0.5m
- --memory=140Mi
- --extra-memory=4Mi
- --threshold=5
- --deployment=heapster
- --container=heapster
- --poll-period=300000
- --estimator=exponential
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
limits:
cpu: 50m
memory: 90Mi
requests:
cpu: 50m
memory: 90Mi

View File

@ -1,13 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: heapster
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: system:pod-nanny
subjects:
- kind: ServiceAccount
name: heapster
namespace: kube-system

View File

@ -1,19 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: system:pod-nanny
namespace: kube-system
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- "extensions"
resources:
- deployments
verbs:
- get
- update

View File

@ -1,5 +0,0 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: heapster
namespace: kube-system

View File

@ -1,12 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: heapster
namespace: kube-system
spec:
type: ClusterIP
selector:
name: heapster
ports:
- port: 80
targetPort: 8082

View File

@ -1,42 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
selector:
matchLabels:
name: default-backend
phase: prod
template:
metadata:
labels:
name: default-backend
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
containers:
- name: default-backend
# Any image is permissable as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: k8s.gcr.io/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -1,15 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -24,10 +24,9 @@ spec:
node-role.kubernetes.io/node: "" node-role.kubernetes.io/node: ""
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0 image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public - --ingress-class=public
# use downward API # use downward API
env: env:
@ -58,7 +57,7 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3 failureThreshold: 3
httpGet: httpGet:
@ -67,7 +66,7 @@ spec:
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
securityContext: securityContext:
capabilities: capabilities:
add: add:

View File

@ -1,42 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
selector:
matchLabels:
name: default-backend
phase: prod
template:
metadata:
labels:
name: default-backend
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
containers:
- name: default-backend
# Any image is permissable as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: k8s.gcr.io/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -1,15 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -24,10 +24,9 @@ spec:
node-role.kubernetes.io/node: "" node-role.kubernetes.io/node: ""
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0 image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public - --ingress-class=public
# use downward API # use downward API
env: env:
@ -58,7 +57,7 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3 failureThreshold: 3
httpGet: httpGet:
@ -67,7 +66,7 @@ spec:
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
securityContext: securityContext:
capabilities: capabilities:
add: add:

View File

@ -1,42 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
selector:
matchLabels:
name: default-backend
phase: prod
template:
metadata:
labels:
name: default-backend
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
containers:
- name: default-backend
# Any image is permissable as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: k8s.gcr.io/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -1,15 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -22,10 +22,9 @@ spec:
spec: spec:
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0 image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public - --ingress-class=public
# use downward API # use downward API
env: env:
@ -53,7 +52,7 @@ spec:
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3 failureThreshold: 3
timeoutSeconds: 1 timeoutSeconds: 5
readinessProbe: readinessProbe:
httpGet: httpGet:
path: /healthz path: /healthz
@ -62,7 +61,7 @@ spec:
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
failureThreshold: 3 failureThreshold: 3
timeoutSeconds: 1 timeoutSeconds: 5
securityContext: securityContext:
capabilities: capabilities:
add: add:

View File

@ -24,10 +24,9 @@ spec:
node-role.kubernetes.io/node: "" node-role.kubernetes.io/node: ""
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0 image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public - --ingress-class=public
# use downward API # use downward API
env: env:
@ -58,7 +57,7 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3 failureThreshold: 3
httpGet: httpGet:
@ -67,7 +66,7 @@ spec:
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
securityContext: securityContext:
capabilities: capabilities:
add: add:

View File

@ -1,42 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
selector:
matchLabels:
name: default-backend
phase: prod
template:
metadata:
labels:
name: default-backend
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
containers:
- name: default-backend
# Any image is permissable as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: k8s.gcr.io/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -1,15 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -1,42 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
selector:
matchLabels:
name: default-backend
phase: prod
template:
metadata:
labels:
name: default-backend
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
containers:
- name: default-backend
# Any image is permissable as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: k8s.gcr.io/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -1,15 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -24,10 +24,9 @@ spec:
node-role.kubernetes.io/node: "" node-role.kubernetes.io/node: ""
containers: containers:
- name: nginx-ingress-controller - name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0 image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
args: args:
- /nginx-ingress-controller - /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public - --ingress-class=public
# use downward API # use downward API
env: env:
@ -58,7 +57,7 @@ spec:
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
readinessProbe: readinessProbe:
failureThreshold: 3 failureThreshold: 3
httpGet: httpGet:
@ -67,7 +66,7 @@ spec:
scheme: HTTP scheme: HTTP
periodSeconds: 10 periodSeconds: 10
successThreshold: 1 successThreshold: 1
timeoutSeconds: 1 timeoutSeconds: 5
securityContext: securityContext:
capabilities: capabilities:
add: add:

View File

@ -55,6 +55,17 @@ data:
action: replace action: replace
target_label: job target_label: job
metric_relabel_configs:
- source_labels: [__name__]
action: drop
regex: etcd_(debugging|disk|request|server).*
- source_labels: [__name__]
action: drop
regex: apiserver_admission_controller_admission_latencies_seconds_.*
- source_labels: [__name__]
action: drop
regex: apiserver_admission_step_admission_latencies_seconds_.*
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10250/metrics). # metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
- job_name: 'kubelet' - job_name: 'kubelet'
@ -89,6 +100,13 @@ data:
relabel_configs: relabel_configs:
- action: labelmap - action: labelmap
regex: __meta_kubernetes_node_label_(.+) regex: __meta_kubernetes_node_label_(.+)
metric_relabel_configs:
- source_labels: [__name__, image]
action: drop
regex: container_([a-z_]+);
- source_labels: [__name__]
action: drop
regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
# Scrap etcd metrics from controllers via listen-metrics-urls # Scrap etcd metrics from controllers via listen-metrics-urls
@ -119,10 +137,10 @@ data:
# * `prometheus.io/port`: If the metrics are exposed on a different port to the # * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately. # service then set this appropriately.
- job_name: 'kubernetes-service-endpoints' - job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs: kubernetes_sd_configs:
- role: endpoints - role: endpoints
honor_labels: true
relabel_configs: relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep action: keep
@ -144,11 +162,19 @@ data:
regex: __meta_kubernetes_service_label_(.+) regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace] - source_labels: [__meta_kubernetes_namespace]
action: replace action: replace
target_label: kubernetes_namespace target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod
- source_labels: [__meta_kubernetes_service_name] - source_labels: [__meta_kubernetes_service_name]
action: replace action: replace
target_label: job target_label: job
metric_relabel_configs:
- source_labels: [__name__]
action: drop
regex: etcd_(debugging|disk|request|server).*
# Example scrape config for probing services via the Blackbox Exporter. # Example scrape config for probing services via the Blackbox Exporter.
# #
# The relabeling allows the actual service scrape endpoint to be configured # The relabeling allows the actual service scrape endpoint to be configured
@ -177,7 +203,7 @@ data:
- action: labelmap - action: labelmap
regex: __meta_kubernetes_service_label_(.+) regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace] - source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace target_label: namespace
- source_labels: [__meta_kubernetes_service_name] - source_labels: [__meta_kubernetes_service_name]
target_label: job target_label: job

View File

@ -20,7 +20,7 @@ spec:
serviceAccountName: prometheus serviceAccountName: prometheus
containers: containers:
- name: prometheus - name: prometheus
image: quay.io/prometheus/prometheus:v2.7.1 image: quay.io/prometheus/prometheus:v2.8.0
args: args:
- --web.listen-address=0.0.0.0:9090 - --web.listen-address=0.0.0.0:9090
- --config.file=/etc/prometheus/prometheus.yaml - --config.file=/etc/prometheus/prometheus.yaml
@ -28,6 +28,10 @@ spec:
ports: ports:
- name: web - name: web
containerPort: 9090 containerPort: 9090
resources:
requests:
cpu: 100m
memory: 200Mi
volumeMounts: volumeMounts:
- name: config - name: config
mountPath: /etc/prometheus mountPath: /etc/prometheus
@ -35,18 +39,18 @@ spec:
mountPath: /etc/prometheus/rules mountPath: /etc/prometheus/rules
- name: data - name: data
mountPath: /var/lib/prometheus mountPath: /var/lib/prometheus
readinessProbe:
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 10
timeoutSeconds: 10
livenessProbe: livenessProbe:
httpGet: httpGet:
path: /-/healthy path: /-/healthy
port: 9090 port: 9090
initialDelaySeconds: 10 initialDelaySeconds: 10
timeoutSeconds: 10 timeoutSeconds: 10
readinessProbe:
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 10
timeoutSeconds: 10
terminationGracePeriodSeconds: 30 terminationGracePeriodSeconds: 30
volumes: volumes:
- name: config - name: config

View File

@ -0,0 +1,28 @@
# Allow Grafana access and in-cluster Prometheus scraping
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: prometheus
namespace: monitoring
spec:
podSelector:
matchLabels:
name: prometheus
ingress:
- ports:
- protocol: TCP
port: 9090
from:
- namespaceSelector:
matchLabels:
name: monitoring
podSelector:
matchLabels:
name: grafana
- namespaceSelector:
matchLabels:
name: monitoring
podSelector:
matchLabels:
name: prometheus

View File

@ -4,582 +4,1113 @@ metadata:
name: prometheus-rules name: prometheus-rules
namespace: monitoring namespace: monitoring
data: data:
alertmanager.rules.yaml: | etcd.yaml: |-
groups: {
- name: alertmanager.rules "groups": [
rules: {
- alert: AlertmanagerConfigInconsistent "name": "etcd",
expr: count_values("config_hash", alertmanager_config_hash) BY (service) / ON(service) "rules": [
GROUP_LEFT() label_replace(prometheus_operator_alertmanager_spec_replicas, "service", {
"alertmanager-$1", "alertmanager", "(.*)") != 1 "alert": "etcdInsufficientMembers",
for: 5m "annotations": {
labels: "message": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }})."
severity: critical },
annotations: "expr": "sum(up{job=~\".*etcd.*\"} == bool 1) by (job) < ((count(up{job=~\".*etcd.*\"}) by (job) + 1) / 2)\n",
description: The configuration of the instances of the Alertmanager cluster "for": "3m",
`{{$labels.service}}` are out of sync. "labels": {
- alert: AlertmanagerDownOrMissing "severity": "critical"
expr: label_replace(prometheus_operator_alertmanager_spec_replicas, "job", "alertmanager-$1", }
"alertmanager", "(.*)") / ON(job) GROUP_RIGHT() sum(up) BY (job) != 1 },
for: 5m {
labels: "alert": "etcdNoLeader",
severity: warning "annotations": {
annotations: "message": "etcd cluster \"{{ $labels.job }}\": member {{ $labels.instance }} has no leader."
description: An unexpected number of Alertmanagers are scraped or Alertmanagers },
disappeared from discovery. "expr": "etcd_server_has_leader{job=~\".*etcd.*\"} == 0\n",
- alert: AlertmanagerFailedReload "for": "1m",
expr: alertmanager_config_last_reload_successful == 0 "labels": {
for: 10m "severity": "critical"
labels: }
severity: warning },
annotations: {
description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace "alert": "etcdHighNumberOfLeaderChanges",
}}/{{ $labels.pod}}. "annotations": {
etcd3.rules.yaml: | "message": "etcd cluster \"{{ $labels.job }}\": instance {{ $labels.instance }} has seen {{ $value }} leader changes within the last 30 minutes."
groups: },
- name: ./etcd3.rules "expr": "rate(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}[15m]) > 3\n",
rules: "for": "15m",
- alert: InsufficientMembers "labels": {
expr: count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1) "severity": "warning"
for: 3m }
labels: },
severity: critical {
annotations: "alert": "etcdGRPCRequestsSlow",
description: If one more etcd member goes down the cluster will be unavailable "annotations": {
summary: etcd cluster insufficient members "message": "etcd cluster \"{{ $labels.job }}\": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}."
- alert: NoLeader },
expr: etcd_server_has_leader{job="etcd"} == 0 "expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_type=\"unary\"}[5m])) by (job, instance, grpc_service, grpc_method, le))\n> 0.15\n",
for: 1m "for": "10m",
labels: "labels": {
severity: critical "severity": "critical"
annotations: }
description: etcd member {{ $labels.instance }} has no leader },
summary: etcd member has no leader {
- alert: HighNumberOfLeaderChanges "alert": "etcdMemberCommunicationSlow",
expr: increase(etcd_server_leader_changes_seen_total{job="etcd"}[1h]) > 3 "annotations": {
labels: "message": "etcd cluster \"{{ $labels.job }}\": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}."
severity: warning },
annotations: "expr": "histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 0.15\n",
description: etcd instance {{ $labels.instance }} has seen {{ $value }} leader "for": "10m",
changes within the last hour "labels": {
summary: a high number of leader changes within the etcd cluster are happening "severity": "warning"
- alert: GRPCRequestsSlow }
expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job="etcd",grpc_type="unary"}[5m])) by (grpc_service, grpc_method, le)) },
> 0.15 {
for: 10m "alert": "etcdHighNumberOfFailedProposals",
labels: "annotations": {
severity: critical "message": "etcd cluster \"{{ $labels.job }}\": {{ $value }} proposal failures within the last 30 minutes on etcd instance {{ $labels.instance }}."
annotations: },
description: on etcd instance {{ $labels.instance }} gRPC requests to {{ $labels.grpc_method "expr": "rate(etcd_server_proposals_failed_total{job=~\".*etcd.*\"}[15m]) > 5\n",
}} are slow "for": "15m",
summary: slow gRPC requests "labels": {
- alert: HighNumberOfFailedHTTPRequests "severity": "warning"
expr: sum(rate(etcd_http_failed_total{job="etcd"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job="etcd"}[5m])) }
BY (method) > 0.01 },
for: 10m {
labels: "alert": "etcdHighFsyncDurations",
severity: warning "annotations": {
annotations: "message": "etcd cluster \"{{ $labels.job }}\": 99th percentile fync durations are {{ $value }}s on etcd instance {{ $labels.instance }}."
description: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd },
instance {{ $labels.instance }}' "expr": "histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 0.5\n",
summary: a high number of HTTP requests are failing "for": "10m",
- alert: HighNumberOfFailedHTTPRequests "labels": {
expr: sum(rate(etcd_http_failed_total{job="etcd"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job="etcd"}[5m])) "severity": "warning"
BY (method) > 0.05 }
for: 5m },
labels: {
severity: critical "alert": "etcdHighCommitDurations",
annotations: "annotations": {
description: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd "message": "etcd cluster \"{{ $labels.job }}\": 99th percentile commit durations {{ $value }}s on etcd instance {{ $labels.instance }}."
instance {{ $labels.instance }}' },
summary: a high number of HTTP requests are failing "expr": "histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 0.25\n",
- alert: HTTPRequestsSlow "for": "10m",
expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m])) "labels": {
> 0.15 "severity": "warning"
for: 10m }
labels: },
severity: warning {
annotations: "alert": "etcdHighNumberOfFailedHTTPRequests",
description: on etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method "annotations": {
}} are slow "message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}"
summary: slow HTTP requests },
- alert: EtcdMemberCommunicationSlow "expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.01\n",
expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m])) "for": "10m",
> 0.15 "labels": {
for: 10m "severity": "warning"
labels: }
severity: warning },
annotations: {
description: etcd instance {{ $labels.instance }} member communication with "alert": "etcdHighNumberOfFailedHTTPRequests",
{{ $labels.To }} is slow "annotations": {
summary: etcd member communication is slow "message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}."
- alert: HighNumberOfFailedProposals },
expr: increase(etcd_server_proposals_failed_total{job="etcd"}[1h]) > 5 "expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.05\n",
labels: "for": "10m",
severity: warning "labels": {
annotations: "severity": "critical"
description: etcd instance {{ $labels.instance }} has seen {{ $value }} proposal }
failures within the last hour },
summary: a high number of proposals within the etcd cluster are failing {
- alert: HighFsyncDurations "alert": "etcdHTTPRequestsSlow",
expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) "annotations": {
> 0.5 "message": "etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method }} are slow."
for: 10m },
labels: "expr": "histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m]))\n> 0.15\n",
severity: warning "for": "10m",
annotations: "labels": {
description: etcd instance {{ $labels.instance }} fync durations are high "severity": "warning"
summary: high fsync durations }
- alert: HighCommitDurations }
expr: histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) ]
> 0.25 }
for: 10m ]
labels: }
severity: warning extra.yaml: |-
annotations: {
description: etcd instance {{ $labels.instance }} commit durations are high "groups": [
summary: high commit durations {
general.rules.yaml: | "name": "extra.rules",
groups: "rules": [
- name: general.rules {
rules: "alert": "InactiveRAIDDisk",
- alert: TargetDown "annotations": {
expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10 "message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
for: 10m },
labels: "expr": "node_md_disks - node_md_disks_active > 0",
severity: warning "for": "10m",
annotations: "labels": {
description: '{{ $value }}% of {{ $labels.job }} targets are down.' "severity": "warning"
summary: Targets are down }
- record: fd_utilization }
expr: process_open_fds / process_max_fds ]
- alert: FdExhaustionClose }
expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1 ]
for: 10m }
labels: kube.yaml: |-
severity: warning {
annotations: "groups": [
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance {
will exhaust in file/socket descriptors within the next 4 hours' "name": "k8s.rules",
summary: file descriptors soon exhausted "rules": [
- alert: FdExhaustionClose {
expr: predict_linear(fd_utilization[10m], 3600) > 1 "expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}[5m])) by (namespace)\n",
for: 10m "record": "namespace:container_cpu_usage_seconds_total:sum_rate"
labels: },
severity: critical {
annotations: "expr": "sum by (namespace, pod_name, container_name) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}[5m])\n)\n",
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance "record": "namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate"
will exhaust in file/socket descriptors within the next hour' },
summary: file descriptors soon exhausted {
kube-controller-manager.rules.yaml: | "expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}) by (namespace)\n",
groups: "record": "namespace:container_memory_usage_bytes:sum"
- name: kube-controller-manager.rules },
rules: {
- alert: K8SControllerManagerDown "expr": "sum by (namespace, label_name) (\n sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}[5m])) by (namespace, pod_name)\n * on (namespace, pod_name) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
expr: absent(up{job="kube-controller-manager"} == 1) "record": "namespace_name:container_cpu_usage_seconds_total:sum_rate"
for: 5m },
labels: {
severity: critical "expr": "sum by (namespace, label_name) (\n sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\",image!=\"\", container_name!=\"\"}) by (pod_name, namespace)\n* on (namespace, pod_name) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
annotations: "record": "namespace_name:container_memory_usage_bytes:sum"
description: There is no running K8S controller manager. Deployments and replication },
controllers are not making progress. {
summary: Controller manager is down "expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"}) by (namespace, pod)\n* on (namespace, pod) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
kube-scheduler.rules.yaml: | "record": "namespace_name:kube_pod_container_resource_requests_memory_bytes:sum"
groups: },
- name: kube-scheduler.rules {
rules: "expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} and on(pod) kube_pod_status_scheduled{condition=\"true\"}) by (namespace, pod)\n* on (namespace, pod) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile "record": "namespace_name:kube_pod_container_resource_requests_cpu_cores:sum"
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) }
BY (le, cluster)) / 1e+06 ]
labels: },
quantile: "0.99" {
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile "name": "kube-scheduler.rules",
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) "rules": [
BY (le, cluster)) / 1e+06 {
labels: "expr": "histogram_quantile(0.99, sum(rate(scheduler_e2e_scheduling_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
quantile: "0.9" "labels": {
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile "quantile": "0.99"
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket) },
BY (le, cluster)) / 1e+06 "record": "cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile"
labels: },
quantile: "0.5" {
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile "expr": "histogram_quantile(0.99, sum(rate(scheduler_scheduling_algorithm_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket) "labels": {
BY (le, cluster)) / 1e+06 "quantile": "0.99"
labels: },
quantile: "0.99" "record": "cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile"
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile },
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket) {
BY (le, cluster)) / 1e+06 "expr": "histogram_quantile(0.99, sum(rate(scheduler_binding_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
labels: "labels": {
quantile: "0.9" "quantile": "0.99"
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile },
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket) "record": "cluster_quantile:scheduler_binding_latency:histogram_quantile"
BY (le, cluster)) / 1e+06 },
labels: {
quantile: "0.5" "expr": "histogram_quantile(0.9, sum(rate(scheduler_e2e_scheduling_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
- record: cluster:scheduler_binding_latency_seconds:quantile "labels": {
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket) "quantile": "0.9"
BY (le, cluster)) / 1e+06 },
labels: "record": "cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile"
quantile: "0.99" },
- record: cluster:scheduler_binding_latency_seconds:quantile {
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket) "expr": "histogram_quantile(0.9, sum(rate(scheduler_scheduling_algorithm_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
BY (le, cluster)) / 1e+06 "labels": {
labels: "quantile": "0.9"
quantile: "0.9" },
- record: cluster:scheduler_binding_latency_seconds:quantile "record": "cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile"
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket) },
BY (le, cluster)) / 1e+06 {
labels: "expr": "histogram_quantile(0.9, sum(rate(scheduler_binding_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
quantile: "0.5" "labels": {
- alert: K8SSchedulerDown "quantile": "0.9"
expr: absent(up{job="kube-scheduler"} == 1) },
for: 5m "record": "cluster_quantile:scheduler_binding_latency:histogram_quantile"
labels: },
severity: critical {
annotations: "expr": "histogram_quantile(0.5, sum(rate(scheduler_e2e_scheduling_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
description: There is no running K8S scheduler. New pods are not being assigned "labels": {
to nodes. "quantile": "0.5"
summary: Scheduler is down },
kube-state-metrics.rules.yaml: | "record": "cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile"
groups: },
- name: kube-state-metrics.rules {
rules: "expr": "histogram_quantile(0.5, sum(rate(scheduler_scheduling_algorithm_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
- alert: DeploymentGenerationMismatch "labels": {
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation "quantile": "0.5"
for: 15m },
labels: "record": "cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile"
severity: warning },
annotations: {
description: Observed deployment generation does not match expected one for "expr": "histogram_quantile(0.5, sum(rate(scheduler_binding_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
deployment {{$labels.namespaces}}/{{$labels.deployment}} "labels": {
summary: Deployment is outdated "quantile": "0.5"
- alert: DeploymentReplicasNotUpdated },
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas) "record": "cluster_quantile:scheduler_binding_latency:histogram_quantile"
or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas)) }
unless (kube_deployment_spec_paused == 1) ]
for: 15m },
labels: {
severity: warning "name": "kube-apiserver.rules",
annotations: "rules": [
description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}} {
summary: Deployment replicas are outdated "expr": "histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{job=\"apiserver\"}[5m])) without(instance, pod)) / 1e+06\n",
- alert: DaemonSetRolloutStuck "labels": {
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled "quantile": "0.99"
* 100 < 100 },
for: 15m "record": "cluster_quantile:apiserver_request_latencies:histogram_quantile"
labels: },
severity: warning {
annotations: "expr": "histogram_quantile(0.9, sum(rate(apiserver_request_latencies_bucket{job=\"apiserver\"}[5m])) without(instance, pod)) / 1e+06\n",
description: Only {{$value}}% of desired pods scheduled and ready for daemon "labels": {
set {{$labels.namespaces}}/{{$labels.daemonset}} "quantile": "0.9"
summary: DaemonSet is missing pods },
- alert: K8SDaemonSetsNotScheduled "record": "cluster_quantile:apiserver_request_latencies:histogram_quantile"
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled },
> 0 {
for: 10m "expr": "histogram_quantile(0.5, sum(rate(apiserver_request_latencies_bucket{job=\"apiserver\"}[5m])) without(instance, pod)) / 1e+06\n",
labels: "labels": {
severity: warning "quantile": "0.5"
annotations: },
description: A number of daemonsets are not scheduled. "record": "cluster_quantile:apiserver_request_latencies:histogram_quantile"
summary: Daemonsets are not scheduled correctly }
- alert: DaemonSetsMissScheduled ]
expr: kube_daemonset_status_number_misscheduled > 0 },
for: 10m {
labels: "name": "node.rules",
severity: warning "rules": [
annotations: {
description: A number of daemonsets are running where they are not supposed "expr": "sum(min(kube_pod_info) by (node))",
to run. "record": ":kube_pod_info_node_count:"
summary: Daemonsets are not scheduled correctly },
- alert: PodFrequentlyRestarting {
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5 "expr": "max(label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")) by (node, namespace, pod)\n",
for: 10m "record": "node_namespace_pod:kube_pod_info:"
labels: },
severity: warning {
annotations: "expr": "count by (node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n",
description: Pod {{$labels.namespaces}}/{{$labels.pod}} restarted {{$value}} "record": "node:node_num_cpu:sum"
times within the last hour },
summary: Pod is restarting frequently {
kubelet.rules.yaml: | "expr": "1 - avg(rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m]))\n",
groups: "record": ":node_cpu_utilisation:avg1m"
- name: kubelet.rules },
rules: {
- alert: K8SNodeNotReady "expr": "1 - avg by (node) (\n rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:)\n",
expr: kube_node_status_condition{condition="Ready",status="true"} == 0 "record": "node:node_cpu_utilisation:avg1m"
for: 1h },
labels: {
severity: warning "expr": "node:node_cpu_utilisation:avg1m\n *\nnode:node_num_cpu:sum\n /\nscalar(sum(node:node_num_cpu:sum))\n",
annotations: "record": "node:cluster_cpu_utilisation:ratio"
description: The Kubelet on {{ $labels.node }} has not checked in with the API, },
or has set itself to NotReady, for more than an hour {
summary: Node status is NotReady "expr": "sum(node_load1{job=\"node-exporter\"})\n/\nsum(node:node_num_cpu:sum)\n",
- alert: K8SManyNodesNotReady "record": ":node_cpu_saturation_load1:"
expr: count(kube_node_status_condition{condition="Ready",status="true"} == 0) },
> 1 and (count(kube_node_status_condition{condition="Ready",status="true"} == {
0) / count(kube_node_status_condition{condition="Ready",status="true"})) > 0.2 "expr": "sum by (node) (\n node_load1{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nnode:node_num_cpu:sum\n",
for: 1m "record": "node:node_cpu_saturation_load1:"
labels: },
severity: critical {
annotations: "expr": "1 -\nsum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n/\nsum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
description: '{{ $value }}% of Kubernetes nodes are not ready' "record": ":node_memory_utilisation:"
- alert: K8SKubeletDown },
expr: count(up{job="kubelet"} == 0) / count(up{job="kubelet"}) * 100 > 3 {
for: 1h "expr": "sum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n",
labels: "record": ":node_memory_MemFreeCachedBuffers_bytes:sum"
severity: warning },
annotations: {
description: Prometheus failed to scrape {{ $value }}% of kubelets. "expr": "sum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
- alert: K8SKubeletDown "record": ":node_memory_MemTotal_bytes:sum"
expr: (absent(up{job="kubelet"} == 1) or count(up{job="kubelet"} == 0) / count(up{job="kubelet"})) },
* 100 > 10 {
for: 1h "expr": "sum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
labels: "record": "node:node_memory_bytes_available:sum"
severity: critical },
annotations: {
description: Prometheus failed to scrape {{ $value }}% of kubelets, or all Kubelets "expr": "sum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
have disappeared from service discovery. "record": "node:node_memory_bytes_total:sum"
summary: Many Kubelets cannot be scraped },
- alert: K8SKubeletTooManyPods {
expr: kubelet_running_pod_count > 100 "expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nnode:node_memory_bytes_total:sum\n",
for: 10m "record": "node:node_memory_utilisation:ratio"
labels: },
severity: warning {
annotations: "expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nscalar(sum(node:node_memory_bytes_total:sum))\n",
description: Kubelet {{$labels.instance}} is running {{$value}} pods, close "record": "node:cluster_memory_utilisation:ratio"
to the limit of 110 },
summary: Kubelet is close to pod limit {
kubernetes.rules.yaml: | "expr": "1e3 * sum(\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n)\n",
groups: "record": ":node_memory_swap_io_bytes:sum_rate"
- name: kubernetes.rules },
rules: {
- record: pod_name:container_memory_usage_bytes:sum "expr": "1 -\nsum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nsum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY "record": "node:node_memory_utilisation:"
(pod_name) },
- record: pod_name:container_spec_cpu_shares:sum {
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name) "expr": "1 - (node:node_memory_bytes_available:sum / node:node_memory_bytes_total:sum)\n",
- record: pod_name:container_cpu_usage:sum "record": "node:node_memory_utilisation_2:"
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m])) },
BY (pod_name) {
- record: pod_name:container_fs_usage_bytes:sum "expr": "1e3 * sum by (node) (\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name) "record": "node:node_memory_swap_io_bytes:sum_rate"
- record: namespace:container_memory_usage_bytes:sum },
expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace) {
- record: namespace:container_spec_cpu_shares:sum "expr": "avg(irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]))\n",
expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace) "record": ":node_disk_utilisation:avg_irate"
- record: namespace:container_cpu_usage:sum },
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m])) {
BY (namespace) "expr": "avg by (node) (\n irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
- record: cluster:memory_usage:ratio "record": "node:node_disk_utilisation:avg_irate"
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY },
(cluster) / sum(machine_memory_bytes) BY (cluster) {
- record: cluster:container_spec_cpu_shares:ratio "expr": "avg(irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]) / 1e3)\n",
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000 "record": ":node_disk_saturation:avg_irate"
/ sum(machine_cpu_cores) },
- record: cluster:container_cpu_usage:ratio {
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m])) "expr": "avg by (node) (\n irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]) / 1e3\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
/ sum(machine_cpu_cores) "record": "node:node_disk_saturation:avg_irate"
- record: apiserver_latency_seconds:quantile },
expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) / {
1e+06 "expr": "max by (namespace, pod, device) ((node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"}\n- node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n/ node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
labels: "record": "node:node_filesystem_usage:"
quantile: "0.99" },
- record: apiserver_latency:quantile_seconds {
expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) / "expr": "max by (namespace, pod, device) (node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"} / node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
1e+06 "record": "node:node_filesystem_avail:"
labels: },
quantile: "0.9" {
- record: apiserver_latency_seconds:quantile "expr": "sum(irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) / "record": ":node_net_utilisation:sum_irate"
1e+06 },
labels: {
quantile: "0.5" "expr": "sum by (node) (\n (irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
- alert: APIServerLatencyHigh "record": "node:node_net_utilisation:sum_irate"
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"} },
> 1 {
for: 10m "expr": "sum(irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
labels: "record": ":node_net_saturation:sum_irate"
severity: warning },
annotations: {
description: the API server has a 99th percentile latency of {{ $value }} seconds "expr": "sum by (node) (\n (irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
for {{$labels.verb}} {{$labels.resource}} "record": "node:node_net_saturation:sum_irate"
- alert: APIServerLatencyHigh },
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"} {
> 4 "expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
for: 10m "record": "node:node_inodes_total:"
labels: },
severity: critical {
annotations: "expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files_free{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
description: the API server has a 99th percentile latency of {{ $value }} seconds "record": "node:node_inodes_free:"
for {{$labels.verb}} {{$labels.resource}} }
- alert: APIServerErrorsHigh ]
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m]) },
* 100 > 2 {
for: 10m "name": "kubernetes-absent",
labels: "rules": [
severity: warning {
annotations: "alert": "KubeAPIDown",
description: API server returns errors for {{ $value }}% of requests "annotations": {
- alert: APIServerErrorsHigh "message": "KubeAPI has disappeared from Prometheus target discovery.",
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m]) "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown"
* 100 > 5 },
for: 10m "expr": "absent(up{job=\"apiserver\"} == 1)\n",
labels: "for": "15m",
severity: critical "labels": {
annotations: "severity": "critical"
description: API server returns errors for {{ $value }}% of requests }
- alert: K8SApiserverDown },
expr: absent(up{job="apiserver"} == 1) {
for: 20m "alert": "KubeControllerManagerDown",
labels: "annotations": {
severity: critical "message": "KubeControllerManager has disappeared from Prometheus target discovery.",
annotations: "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown"
description: No API servers are reachable or all have disappeared from service },
discovery "expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m",
- alert: K8sCertificateExpirationNotice "labels": {
labels: "severity": "critical"
severity: warning }
annotations: },
description: Kubernetes API Certificate is expiring soon (less than 7 days) {
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0 "alert": "KubeSchedulerDown",
"annotations": {
- alert: K8sCertificateExpirationNotice "message": "KubeScheduler has disappeared from Prometheus target discovery.",
labels: "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown"
severity: critical },
annotations: "expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
description: Kubernetes API Certificate is expiring in less than 1 day "for": "15m",
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0 "labels": {
node.rules.yaml: | "severity": "critical"
groups: }
- name: node.rules },
rules: {
- record: instance:node_cpu:rate:sum "alert": "KubeletDown",
expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m])) "annotations": {
BY (instance) "message": "Kubelet has disappeared from Prometheus target discovery.",
- record: instance:node_filesystem_usage:sum "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown"
expr: sum((node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"})) },
BY (instance) "expr": "absent(up{job=\"kubelet\"} == 1)\n",
- record: instance:node_network_receive_bytes:rate:sum "for": "15m",
expr: sum(rate(node_network_receive_bytes_total[3m])) BY (instance) "labels": {
- record: instance:node_network_transmit_bytes:rate:sum "severity": "critical"
expr: sum(rate(node_network_transmit_bytes_total[3m])) BY (instance) }
- record: instance:node_cpu:ratio }
expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance) ]
GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance) },
- record: cluster:node_cpu:sum_rate5m {
expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) "name": "kubernetes-apps",
- record: cluster:node_cpu:ratio "rules": [
expr: cluster:node_cpu:sum_rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu)) {
- alert: NodeExporterDown "alert": "KubePodCrashLooping",
expr: absent(up{job="node-exporter"} == 1) "annotations": {
for: 10m "message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.",
labels: "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping"
severity: warning },
annotations: "expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[15m]) * 60 * 5 > 0\n",
description: Prometheus could not scrape a node-exporter for more than 10m, "for": "1h",
or node-exporters have disappeared from discovery "labels": {
- alert: NodeDiskRunningFull "severity": "critical"
expr: predict_linear(node_filesystem_free_bytes[6h], 3600 * 24) < 0 }
for: 30m },
labels: {
severity: warning "alert": "KubePodNotReady",
annotations: "annotations": {
description: device {{$labels.device}} on node {{$labels.instance}} is running "message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.",
full within the next 24 hours (mounted at {{$labels.mountpoint}}) "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"
- alert: NodeDiskRunningFull },
expr: predict_linear(node_filesystem_free_bytes[30m], 3600 * 2) < 0 "expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}) > 0\n",
for: 10m "for": "1h",
labels: "labels": {
severity: critical "severity": "critical"
annotations: }
description: device {{$labels.device}} on node {{$labels.instance}} is running },
full within the next 2 hours (mounted at {{$labels.mountpoint}}) {
- alert: InactiveRAIDDisk "alert": "KubeDeploymentGenerationMismatch",
expr: node_md_disks - node_md_disks_active > 0 "annotations": {
for: 10m "message": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.",
labels: "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch"
severity: warning },
annotations: "expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n",
description: '{{$value}} RAID disk(s) on node {{$labels.instance}} are inactive' "for": "15m",
prometheus.rules.yaml: | "labels": {
groups: "severity": "critical"
- name: prometheus.rules }
rules: },
- alert: PrometheusConfigReloadFailed {
expr: prometheus_config_last_reload_successful == 0 "alert": "KubeDeploymentReplicasMismatch",
for: 10m "annotations": {
labels: "message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than an hour.",
severity: warning "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"
annotations: },
description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}} "expr": "kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\nkube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n",
- alert: PrometheusNotificationQueueRunningFull "for": "1h",
expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity "labels": {
for: 10m "severity": "critical"
labels: }
severity: warning },
annotations: {
description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{ "alert": "KubeStatefulSetReplicasMismatch",
$labels.pod}} "annotations": {
- alert: PrometheusErrorSendingAlerts "message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m]) "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch"
> 0.01 },
for: 10m "expr": "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n",
labels: "for": "15m",
severity: warning "labels": {
annotations: "severity": "critical"
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ }
$labels.pod}} to Alertmanager {{$labels.Alertmanager}} },
- alert: PrometheusErrorSendingAlerts {
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m]) "alert": "KubeStatefulSetGenerationMismatch",
> 0.03 "annotations": {
for: 10m "message": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.",
labels: "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch"
severity: critical },
annotations: "expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n",
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ "for": "15m",
$labels.pod}} to Alertmanager {{$labels.Alertmanager}} "labels": {
- alert: PrometheusNotConnectedToAlertmanagers "severity": "critical"
expr: prometheus_notifications_alertmanagers_discovered < 1 }
for: 10m },
labels: {
severity: warning "alert": "KubeStatefulSetUpdateNotRolledOut",
annotations: "annotations": {
description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected "message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.",
to any Alertmanagers "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout"
- alert: PrometheusTSDBReloadsFailing },
expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0 "expr": "max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n)\n *\n(\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n)\n",
for: 12h "for": "15m",
labels: "labels": {
severity: warning "severity": "critical"
annotations: }
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} },
reload failures over the last four hours.' {
summary: Prometheus has issues reloading data blocks from disk "alert": "KubeDaemonSetRolloutStuck",
- alert: PrometheusTSDBCompactionsFailing "annotations": {
expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0 "message": "Only {{ $value }}% of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.",
for: 12h "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck"
labels: },
severity: warning "expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} * 100 < 100\n",
annotations: "for": "15m",
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} "labels": {
compaction failures over the last four hours.' "severity": "critical"
summary: Prometheus has issues compacting sample blocks }
- alert: PrometheusTSDBWALCorruptions },
expr: tsdb_wal_corruptions_total > 0 {
for: 4h "alert": "KubeDaemonSetNotScheduled",
labels: "annotations": {
severity: warning "message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.",
annotations: "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled"
description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead },
log (WAL).' "expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n",
summary: Prometheus write-ahead log is corrupted "for": "10m",
- alert: PrometheusNotIngestingSamples "labels": {
expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) <= 0 "severity": "warning"
for: 10m }
labels: },
severity: warning {
annotations: "alert": "KubeDaemonSetMisScheduled",
description: "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples." "annotations": {
summary: "Prometheus isn't ingesting samples" "message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled"
},
"expr": "kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCronJobRunning",
"annotations": {
"message": "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecronjobrunning"
},
"expr": "time() - kube_cronjob_next_schedule_time{job=\"kube-state-metrics\"} > 3600\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeJobCompletion",
"annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than one hour to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion"
},
"expr": "kube_job_spec_completions{job=\"kube-state-metrics\"} - kube_job_status_succeeded{job=\"kube-state-metrics\"} > 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeJobFailed",
"annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed"
},
"expr": "kube_job_status_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "kubernetes-resources",
"rules": [
{
"alert": "KubeCPUOvercommit",
"annotations": {
"message": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit"
},
"expr": "sum(namespace_name:kube_pod_container_resource_requests_cpu_cores:sum)\n /\nsum(node:node_num_cpu:sum)\n >\n(count(node:node_num_cpu:sum)-1) / count(node:node_num_cpu:sum)\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeMemOvercommit",
"annotations": {
"message": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit"
},
"expr": "sum(namespace_name:kube_pod_container_resource_requests_memory_bytes:sum)\n /\nsum(node_memory_MemTotal_bytes)\n >\n(count(node:node_num_cpu:sum)-1)\n /\ncount(node:node_num_cpu:sum)\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCPUOvercommit",
"annotations": {
"message": "Cluster has overcommitted CPU resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit"
},
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"requests.cpu\"})\n /\nsum(node:node_num_cpu:sum)\n > 1.5\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeMemOvercommit",
"annotations": {
"message": "Cluster has overcommitted memory resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit"
},
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"requests.memory\"})\n /\nsum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n > 1.5\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeQuotaExceeded",
"annotations": {
"message": "Namespace {{ $labels.namespace }} is using {{ printf \"%0.0f\" $value }}% of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded"
},
"expr": "100 * kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 90\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "CPUThrottlingHigh",
"annotations": {
"message": "{{ printf \"%0.0f\" $value }}% throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container_name }} in pod {{ $labels.pod_name }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh"
},
"expr": "100 * sum(increase(container_cpu_cfs_throttled_periods_total{container_name!=\"\", }[5m])) by (container_name, pod_name, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container_name, pod_name, namespace)\n > 100 \n",
"for": "15m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "kubernetes-storage",
"rules": [
{
"alert": "KubePersistentVolumeUsageCritical",
"annotations": {
"message": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ printf \"%0.2f\" $value }}% free.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical"
},
"expr": "100 * kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 3\n",
"for": "1m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubePersistentVolumeFullInFourDays",
"annotations": {
"message": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ printf \"%0.2f\" $value }}% is available.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays"
},
"expr": "100 * (\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"for": "5m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubePersistentVolumeErrors",
"annotations": {
"message": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors"
},
"expr": "kube_persistentvolume_status_phase{phase=~\"Failed|Pending\",job=\"kube-state-metrics\"} > 0\n",
"for": "5m",
"labels": {
"severity": "critical"
}
}
]
},
{
"name": "kubernetes-system",
"rules": [
{
"alert": "KubeNodeNotReady",
"annotations": {
"message": "{{ $labels.node }} has been unready for more than an hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready"
},
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeVersionMismatch",
"annotations": {
"message": "There are {{ $value }} different semantic versions of Kubernetes components running.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch"
},
"expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!=\"coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientErrors",
"annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ printf \"%0.0f\" $value }}% errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors"
},
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n* 100 > 1\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientErrors",
"annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ printf \"%0.0f\" $value }} errors / second.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors"
},
"expr": "sum(rate(ksm_scrape_error_total{job=\"kube-state-metrics\"}[5m])) by (instance, job) > 0.1\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletTooManyPods",
"annotations": {
"message": "Kubelet {{ $labels.instance }} is running {{ $value }} Pods, close to the limit of 110.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods"
},
"expr": "kubelet_running_pod_count{job=\"kubelet\"} > 110 * 0.9\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_latencies:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$\"} > 1\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_latencies:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$\"} > 4\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_count{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m]))\n /\nsum(rate(apiserver_request_count{job=\"apiserver\"}[5m])) * 100 > 3\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_count{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m]))\n /\nsum(rate(apiserver_request_count{job=\"apiserver\"}[5m])) * 100 > 1\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_count{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_count{job=\"apiserver\"}[5m])) by (resource,subresource,verb) * 100 > 10\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_count{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_count{job=\"apiserver\"}[5m])) by (resource,subresource,verb) * 100 > 5\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientCertificateExpiration",
"annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 7 days.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
},
"expr": "histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientCertificateExpiration",
"annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 24 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
},
"expr": "histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n",
"labels": {
"severity": "critical"
}
}
]
}
]
}
kubeprom.yaml: |-
{
"groups": [
{
"name": "kube-prometheus-node-recording.rules",
"rules": [
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[3m])) BY (instance)",
"record": "instance:node_cpu:rate:sum"
},
{
"expr": "sum((node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"})) BY (instance)",
"record": "instance:node_filesystem_usage:sum"
},
{
"expr": "sum(rate(node_network_receive_bytes_total[3m])) BY (instance)",
"record": "instance:node_network_receive_bytes:rate:sum"
},
{
"expr": "sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)",
"record": "instance:node_network_transmit_bytes:rate:sum"
},
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)",
"record": "instance:node_cpu:ratio"
},
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m]))",
"record": "cluster:node_cpu:sum_rate5m"
},
{
"expr": "cluster:node_cpu_seconds_total:rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu))",
"record": "cluster:node_cpu:ratio"
}
]
},
{
"name": "kube-prometheus-node-alerting.rules",
"rules": [
{
"alert": "NodeDiskRunningFull",
"annotations": {
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 24 hours."
},
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[6h], 3600 * 24) < 0)\n",
"for": "30m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeDiskRunningFull",
"annotations": {
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 2 hours."
},
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[30m], 3600 * 2) < 0)\n",
"for": "10m",
"labels": {
"severity": "critical"
}
}
]
},
{
"name": "prometheus.rules",
"rules": [
{
"alert": "PrometheusConfigReloadFailed",
"annotations": {
"description": "Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}",
"summary": "Reloading Prometheus' configuration failed"
},
"expr": "prometheus_config_last_reload_successful{job=\"prometheus\"} == 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusNotificationQueueRunningFull",
"annotations": {
"description": "Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{ $labels.pod}}",
"summary": "Prometheus' alert notification queue is running full"
},
"expr": "predict_linear(prometheus_notifications_queue_length{job=\"prometheus\"}[5m], 60 * 30) > prometheus_notifications_queue_capacity{job=\"prometheus\"}\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusErrorSendingAlerts",
"annotations": {
"description": "Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}",
"summary": "Errors while sending alert from Prometheus"
},
"expr": "rate(prometheus_notifications_errors_total{job=\"prometheus\"}[5m]) / rate(prometheus_notifications_sent_total{job=\"prometheus\"}[5m]) > 0.01\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusErrorSendingAlerts",
"annotations": {
"description": "Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}",
"summary": "Errors while sending alerts from Prometheus"
},
"expr": "rate(prometheus_notifications_errors_total{job=\"prometheus\"}[5m]) / rate(prometheus_notifications_sent_total{job=\"prometheus\"}[5m]) > 0.03\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "PrometheusNotConnectedToAlertmanagers",
"annotations": {
"description": "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected to any Alertmanagers",
"summary": "Prometheus is not connected to any Alertmanagers"
},
"expr": "prometheus_notifications_alertmanagers_discovered{job=\"prometheus\"} < 1\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusTSDBReloadsFailing",
"annotations": {
"description": "{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} reload failures over the last four hours.",
"summary": "Prometheus has issues reloading data blocks from disk"
},
"expr": "increase(prometheus_tsdb_reloads_failures_total{job=\"prometheus\"}[2h]) > 0\n",
"for": "12h",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusTSDBCompactionsFailing",
"annotations": {
"description": "{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} compaction failures over the last four hours.",
"summary": "Prometheus has issues compacting sample blocks"
},
"expr": "increase(prometheus_tsdb_compactions_failed_total{job=\"prometheus\"}[2h]) > 0\n",
"for": "12h",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusTSDBWALCorruptions",
"annotations": {
"description": "{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead log (WAL).",
"summary": "Prometheus write-ahead log is corrupted"
},
"expr": "prometheus_tsdb_wal_corruptions_total{job=\"prometheus\"} > 0\n",
"for": "4h",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusNotIngestingSamples",
"annotations": {
"description": "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples.",
"summary": "Prometheus isn't ingesting samples"
},
"expr": "rate(prometheus_tsdb_head_samples_appended_total{job=\"prometheus\"}[5m]) <= 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusTargetScrapesDuplicate",
"annotations": {
"description": "{{$labels.namespace}}/{{$labels.pod}} has many samples rejected due to duplicate timestamps but different values",
"summary": "Prometheus has many samples rejected"
},
"expr": "increase(prometheus_target_scrapes_sample_duplicate_timestamp_total{job=\"prometheus\"}[5m]) > 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "general.rules",
"rules": [
{
"alert": "TargetDown",
"annotations": {
"message": "{{ $value }}% of the {{ $labels.job }} targets are down."
},
"expr": "100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
}
]
}

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"] api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf - name: 40-etcd-cluster.conf
contents: | contents: |
[Service] [Service]
Environment="ETCD_IMAGE_TAG=v3.3.11" Environment="ETCD_IMAGE_TAG=v3.3.12"
Environment="ETCD_NAME=${etcd_name}" Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379" Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380" Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -123,7 +123,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
contents: contents:

View File

@ -79,7 +79,7 @@ data "template_file" "etcds" {
count = "${var.controller_count}" count = "${var.controller_count}"
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380" template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars { vars = {
index = "${count.index}" index = "${count.index}"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
dns_zone = "${var.dns_zone}" dns_zone = "${var.dns_zone}"

View File

@ -5,7 +5,7 @@ terraform {
} }
provider "aws" { provider "aws" {
version = "~> 1.13" version = ">= 1.13, < 3.0"
} }
provider "local" { provider "local" {

View File

@ -93,7 +93,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
contents: contents:
@ -111,7 +111,7 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \ --volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \ --mount volume=config,target=/etc/kubernetes \
--insecure-options=image \ --insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.13.3 \ docker://k8s.gcr.io/hyperkube:v1.13.5 \
--net=host \ --net=host \
--dns=host \ --dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname) --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [spot](https://typhoon.psdn.io/cl/aws/#spot) workers * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [spot](https://typhoon.psdn.io/cl/aws/#spot) workers

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"] api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]

View File

@ -78,8 +78,8 @@ bootcmd:
runcmd: runcmd:
- [systemctl, daemon-reload] - [systemctl, daemon-reload]
- [systemctl, restart, NetworkManager] - [systemctl, restart, NetworkManager]
- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.11" - "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.12"
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.3" - "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.5"
- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0" - "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0"
- [systemctl, start, --no-block, etcd.service] - [systemctl, start, --no-block, etcd.service]
- [systemctl, start, --no-block, kubelet.service] - [systemctl, start, --no-block, kubelet.service]

View File

@ -71,7 +71,7 @@ data "template_file" "etcds" {
count = "${var.controller_count}" count = "${var.controller_count}"
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380" template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars { vars = {
index = "${count.index}" index = "${count.index}"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
dns_zone = "${var.dns_zone}" dns_zone = "${var.dns_zone}"

View File

@ -5,7 +5,7 @@ terraform {
} }
provider "aws" { provider "aws" {
version = "~> 1.13" version = ">= 1.13, < 3.0"
} }
provider "local" { provider "local" {

View File

@ -54,7 +54,7 @@ bootcmd:
runcmd: runcmd:
- [systemctl, daemon-reload] - [systemctl, daemon-reload]
- [systemctl, restart, NetworkManager] - [systemctl, restart, NetworkManager]
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.3" - "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.5"
- [systemctl, start, --no-block, kubelet.service] - [systemctl, start, --no-block, kubelet.service]
users: users:
- default - default

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/cl/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/cl/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"] api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf - name: 40-etcd-cluster.conf
contents: | contents: |
[Service] [Service]
Environment="ETCD_IMAGE_TAG=v3.3.11" Environment="ETCD_IMAGE_TAG=v3.3.12"
Environment="ETCD_NAME=${etcd_name}" Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379" Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380" Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -123,7 +123,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
contents: contents:

View File

@ -160,7 +160,7 @@ data "template_file" "etcds" {
count = "${var.controller_count}" count = "${var.controller_count}"
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380" template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars { vars = {
index = "${count.index}" index = "${count.index}"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
dns_zone = "${var.dns_zone}" dns_zone = "${var.dns_zone}"

View File

@ -93,7 +93,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
contents: contents:
@ -111,7 +111,7 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \ --volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \ --mount volume=config,target=/etc/kubernetes \
--insecure-options=image \ --insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.13.3 \ docker://k8s.gcr.io/hyperkube:v1.13.5 \
--net=host \ --net=host \
--dns=host \ --dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]') --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${var.k8s_domain_name}"] api_servers = ["${var.k8s_domain_name}"]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf - name: 40-etcd-cluster.conf
contents: | contents: |
[Service] [Service]
Environment="ETCD_IMAGE_TAG=v3.3.11" Environment="ETCD_IMAGE_TAG=v3.3.12"
Environment="ETCD_NAME=${etcd_name}" Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379" Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380" Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
@ -128,7 +128,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/hostname - path: /etc/hostname
filesystem: root filesystem: root
mode: 0644 mode: 0644

View File

@ -89,7 +89,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/hostname - path: /etc/hostname
filesystem: root filesystem: root
mode: 0644 mode: 0644

View File

@ -5,7 +5,7 @@ resource "matchbox_group" "install" {
profile = "${local.flavor == "flatcar" ? var.cached_install == "true" ? element(matchbox_profile.cached-flatcar-linux-install.*.name, count.index) : element(matchbox_profile.flatcar-install.*.name, count.index) : var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)}" profile = "${local.flavor == "flatcar" ? var.cached_install == "true" ? element(matchbox_profile.cached-flatcar-linux-install.*.name, count.index) : element(matchbox_profile.flatcar-install.*.name, count.index) : var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)}"
selector { selector = {
mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}" mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
} }
} }
@ -15,7 +15,7 @@ resource "matchbox_group" "controller" {
name = "${format("%s-%s", var.cluster_name, element(var.controller_names, count.index))}" name = "${format("%s-%s", var.cluster_name, element(var.controller_names, count.index))}"
profile = "${element(matchbox_profile.controllers.*.name, count.index)}" profile = "${element(matchbox_profile.controllers.*.name, count.index)}"
selector { selector = {
mac = "${element(var.controller_macs, count.index)}" mac = "${element(var.controller_macs, count.index)}"
os = "installed" os = "installed"
} }
@ -26,7 +26,7 @@ resource "matchbox_group" "worker" {
name = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}" name = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}"
profile = "${element(matchbox_profile.workers.*.name, count.index)}" profile = "${element(matchbox_profile.workers.*.name, count.index)}"
selector { selector = {
mac = "${element(var.worker_macs, count.index)}" mac = "${element(var.worker_macs, count.index)}"
os = "installed" os = "installed"
} }

View File

@ -11,10 +11,10 @@ resource "matchbox_profile" "container-linux-install" {
count = "${length(var.controller_names) + length(var.worker_names)}" count = "${length(var.controller_names) + length(var.worker_names)}"
name = "${format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}" name = "${format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
kernel = "http://${local.channel}.release.core-os.net/amd64-usr/${var.os_version}/coreos_production_pxe.vmlinuz" kernel = "${var.download_protocol}://${local.channel}.release.core-os.net/amd64-usr/${var.os_version}/coreos_production_pxe.vmlinuz"
initrd = [ initrd = [
"http://${local.channel}.release.core-os.net/amd64-usr/${var.os_version}/coreos_production_pxe_image.cpio.gz", "${var.download_protocol}://${local.channel}.release.core-os.net/amd64-usr/${var.os_version}/coreos_production_pxe_image.cpio.gz",
] ]
args = [ args = [
@ -34,7 +34,7 @@ data "template_file" "container-linux-install-configs" {
template = "${file("${path.module}/cl/install.yaml.tmpl")}" template = "${file("${path.module}/cl/install.yaml.tmpl")}"
vars { vars = {
os_flavor = "${local.flavor}" os_flavor = "${local.flavor}"
os_channel = "${local.channel}" os_channel = "${local.channel}"
os_version = "${var.os_version}" os_version = "${var.os_version}"
@ -77,7 +77,7 @@ data "template_file" "cached-container-linux-install-configs" {
template = "${file("${path.module}/cl/install.yaml.tmpl")}" template = "${file("${path.module}/cl/install.yaml.tmpl")}"
vars { vars = {
os_flavor = "${local.flavor}" os_flavor = "${local.flavor}"
os_channel = "${local.channel}" os_channel = "${local.channel}"
os_version = "${var.os_version}" os_version = "${var.os_version}"
@ -96,10 +96,10 @@ resource "matchbox_profile" "flatcar-install" {
count = "${length(var.controller_names) + length(var.worker_names)}" count = "${length(var.controller_names) + length(var.worker_names)}"
name = "${format("%s-flatcar-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}" name = "${format("%s-flatcar-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
kernel = "http://${local.channel}.release.flatcar-linux.net/amd64-usr/${var.os_version}/flatcar_production_pxe.vmlinuz" kernel = "${var.download_protocol}://${local.channel}.release.flatcar-linux.net/amd64-usr/${var.os_version}/flatcar_production_pxe.vmlinuz"
initrd = [ initrd = [
"http://${local.channel}.release.flatcar-linux.net/amd64-usr/${var.os_version}/flatcar_production_pxe_image.cpio.gz", "${var.download_protocol}://${local.channel}.release.flatcar-linux.net/amd64-usr/${var.os_version}/flatcar_production_pxe_image.cpio.gz",
] ]
args = [ args = [
@ -159,7 +159,7 @@ data "template_file" "controller-configs" {
template = "${file("${path.module}/cl/controller.yaml.tmpl")}" template = "${file("${path.module}/cl/controller.yaml.tmpl")}"
vars { vars = {
domain_name = "${element(var.controller_domains, count.index)}" domain_name = "${element(var.controller_domains, count.index)}"
etcd_name = "${element(var.controller_names, count.index)}" etcd_name = "${element(var.controller_names, count.index)}"
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}" etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
@ -190,7 +190,7 @@ data "template_file" "worker-configs" {
template = "${file("${path.module}/cl/worker.yaml.tmpl")}" template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
vars { vars = {
domain_name = "${element(var.worker_domains, count.index)}" domain_name = "${element(var.worker_domains, count.index)}"
cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}" cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}"
cluster_domain_suffix = "${var.cluster_domain_suffix}" cluster_domain_suffix = "${var.cluster_domain_suffix}"

View File

@ -118,6 +118,12 @@ variable "cluster_domain_suffix" {
default = "cluster.local" default = "cluster.local"
} }
variable "download_protocol" {
type = "string"
default = "https"
description = "Protocol iPXE should use to download the kernel and initrd. Defaults to https, which requires iPXE compiled with crypto support. Unused if cached_install is true."
}
variable "cached_install" { variable "cached_install" {
type = "string" type = "string"
default = "false" default = "false"

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${var.k8s_domain_name}"] api_servers = ["${var.k8s_domain_name}"]

View File

@ -84,8 +84,8 @@ runcmd:
- [systemctl, daemon-reload] - [systemctl, daemon-reload]
- [systemctl, restart, NetworkManager] - [systemctl, restart, NetworkManager]
- [hostnamectl, set-hostname, ${domain_name}] - [hostnamectl, set-hostname, ${domain_name}]
- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.11" - "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.12"
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.3" - "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.5"
- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0" - "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0"
- [systemctl, start, --no-block, etcd.service] - [systemctl, start, --no-block, etcd.service]
- [systemctl, enable, kubelet.path] - [systemctl, enable, kubelet.path]

View File

@ -60,7 +60,7 @@ runcmd:
- [systemctl, daemon-reload] - [systemctl, daemon-reload]
- [systemctl, restart, NetworkManager] - [systemctl, restart, NetworkManager]
- [hostnamectl, set-hostname, ${domain_name}] - [hostnamectl, set-hostname, ${domain_name}]
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.3" - "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.5"
- [systemctl, enable, kubelet.path] - [systemctl, enable, kubelet.path]
- [systemctl, start, --no-block, kubelet.path] - [systemctl, start, --no-block, kubelet.path]
users: users:

View File

@ -5,11 +5,11 @@ resource "matchbox_group" "install" {
name = "${format("fedora-install-%s", element(concat(var.controller_names, var.worker_names), count.index))}" name = "${format("fedora-install-%s", element(concat(var.controller_names, var.worker_names), count.index))}"
profile = "${element(matchbox_profile.cached-fedora-install.*.name, count.index)}" profile = "${element(matchbox_profile.cached-fedora-install.*.name, count.index)}"
selector { selector = {
mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}" mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
} }
metadata { metadata = {
ssh_authorized_key = "${var.ssh_authorized_key}" ssh_authorized_key = "${var.ssh_authorized_key}"
} }
} }
@ -19,7 +19,7 @@ resource "matchbox_group" "controller" {
name = "${format("%s-%s", var.cluster_name, element(var.controller_names, count.index))}" name = "${format("%s-%s", var.cluster_name, element(var.controller_names, count.index))}"
profile = "${element(matchbox_profile.controllers.*.name, count.index)}" profile = "${element(matchbox_profile.controllers.*.name, count.index)}"
selector { selector = {
mac = "${element(var.controller_macs, count.index)}" mac = "${element(var.controller_macs, count.index)}"
os = "installed" os = "installed"
} }
@ -30,7 +30,7 @@ resource "matchbox_group" "worker" {
name = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}" name = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}"
profile = "${element(matchbox_profile.workers.*.name, count.index)}" profile = "${element(matchbox_profile.workers.*.name, count.index)}"
selector { selector = {
mac = "${element(var.worker_macs, count.index)}" mac = "${element(var.worker_macs, count.index)}"
os = "installed" os = "installed"
} }

View File

@ -33,7 +33,7 @@ data "template_file" "install-kickstarts" {
template = "${file("${path.module}/kickstart/fedora-atomic.ks.tmpl")}" template = "${file("${path.module}/kickstart/fedora-atomic.ks.tmpl")}"
vars { vars = {
matchbox_http_endpoint = "${var.matchbox_http_endpoint}" matchbox_http_endpoint = "${var.matchbox_http_endpoint}"
atomic_assets_endpoint = "${local.atomic_assets_endpoint}" atomic_assets_endpoint = "${local.atomic_assets_endpoint}"
mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}" mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
@ -54,7 +54,7 @@ data "template_file" "controller-configs" {
template = "${file("${path.module}/cloudinit/controller.yaml.tmpl")}" template = "${file("${path.module}/cloudinit/controller.yaml.tmpl")}"
vars { vars = {
domain_name = "${element(var.controller_domains, count.index)}" domain_name = "${element(var.controller_domains, count.index)}"
etcd_name = "${element(var.controller_names, count.index)}" etcd_name = "${element(var.controller_names, count.index)}"
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}" etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
@ -78,7 +78,7 @@ data "template_file" "worker-configs" {
template = "${file("${path.module}/cloudinit/worker.yaml.tmpl")}" template = "${file("${path.module}/cloudinit/worker.yaml.tmpl")}"
vars { vars = {
domain_name = "${element(var.worker_domains, count.index)}" domain_name = "${element(var.worker_domains, count.index)}"
cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}" cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}"
cluster_domain_suffix = "${var.cluster_domain_suffix}" cluster_domain_suffix = "${var.cluster_domain_suffix}"

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization * Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"] api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf - name: 40-etcd-cluster.conf
contents: | contents: |
[Service] [Service]
Environment="ETCD_IMAGE_TAG=v3.3.11" Environment="ETCD_IMAGE_TAG=v3.3.12"
Environment="ETCD_NAME=${etcd_name}" Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379" Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380" Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -56,9 +56,12 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=Kubelet via Hyperkube Description=Kubelet via Hyperkube
Requires=coreos-metadata.service
After=coreos-metadata.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
EnvironmentFile=/etc/kubernetes/kubelet.env EnvironmentFile=/etc/kubernetes/kubelet.env
EnvironmentFile=/run/metadata/coreos
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \ Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \ --volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \ --mount volume=resolv,target=/etc/resolv.conf \
@ -90,6 +93,7 @@ systemd:
--cluster_domain=${cluster_domain_suffix} \ --cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \ --cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \ --exit-on-lock-contention \
--hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
--kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \ --lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \ --network-plugin=cni \
@ -125,7 +129,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
contents: contents:

View File

@ -31,9 +31,12 @@ systemd:
contents: | contents: |
[Unit] [Unit]
Description=Kubelet via Hyperkube Description=Kubelet via Hyperkube
Requires=coreos-metadata.service
After=coreos-metadata.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
EnvironmentFile=/etc/kubernetes/kubelet.env EnvironmentFile=/etc/kubernetes/kubelet.env
EnvironmentFile=/run/metadata/coreos
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \ Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \ --volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \ --mount volume=resolv,target=/etc/resolv.conf \
@ -63,6 +66,7 @@ systemd:
--cluster_domain=${cluster_domain_suffix} \ --cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \ --cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \ --exit-on-lock-contention \
--hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
--kubeconfig=/etc/kubernetes/kubeconfig \ --kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \ --lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \ --network-plugin=cni \
@ -95,7 +99,7 @@ storage:
contents: contents:
inline: | inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.13.3 KUBELET_IMAGE_TAG=v1.13.5
- path: /etc/sysctl.d/max-user-watches.conf - path: /etc/sysctl.d/max-user-watches.conf
filesystem: root filesystem: root
contents: contents:
@ -113,7 +117,7 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \ --volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \ --mount volume=config,target=/etc/kubernetes \
--insecure-options=image \ --insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.13.3 \ docker://k8s.gcr.io/hyperkube:v1.13.5 \
--net=host \ --net=host \
--dns=host \ --dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname) --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

View File

@ -93,7 +93,7 @@ data "template_file" "etcds" {
count = "${var.controller_count}" count = "${var.controller_count}"
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380" template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars { vars = {
index = "${count.index}" index = "${count.index}"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
dns_zone = "${var.dns_zone}" dns_zone = "${var.dns_zone}"

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/) * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

View File

@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests) # Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" { module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c12a11c8006606b59335ecc994abe22358aaf68b" source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=da0321287ba8cd48e4492894483fcdc1da057657"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"] api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]

View File

@ -19,9 +19,24 @@ write_files:
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true ETCD_PEER_CLIENT_CERT_AUTH=true
- path: /etc/systemd/system/cloud-metadata.service
content: |
[Unit]
Description=Cloud metadata agent
[Service]
Type=oneshot
Environment=OUTPUT=/run/metadata/cloud
ExecStart=/usr/bin/mkdir -p /run/metadata
ExecStart=/usr/bin/bash -c 'echo "HOSTNAME_OVERRIDE=$(curl\
--url http://169.254.169.254/metadata/v1/interfaces/private/0/ipv4/address\
--retry 10)" > $${OUTPUT}'
[Install]
WantedBy=multi-user.target
- path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf - path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
content: | content: |
[Unit] [Unit]
Requires=cloud-metadata.service
After=cloud-metadata.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -75,10 +90,11 @@ bootcmd:
- [modprobe, ip_vs] - [modprobe, ip_vs]
runcmd: runcmd:
- [systemctl, daemon-reload] - [systemctl, daemon-reload]
- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.11" - "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.12"
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.3" - "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.5"
- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0" - "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0"
- [systemctl, start, --no-block, etcd.service] - [systemctl, start, --no-block, etcd.service]
- [systemctl, enable, cloud-metadata.service]
- [systemctl, enable, kubelet.path] - [systemctl, enable, kubelet.path]
- [systemctl, start, --no-block, kubelet.path] - [systemctl, start, --no-block, kubelet.path]
users: users:

View File

@ -1,8 +1,23 @@
#cloud-config #cloud-config
write_files: write_files:
- path: /etc/systemd/system/cloud-metadata.service
content: |
[Unit]
Description=Cloud metadata agent
[Service]
Type=oneshot
Environment=OUTPUT=/run/metadata/cloud
ExecStart=/usr/bin/mkdir -p /run/metadata
ExecStart=/usr/bin/bash -c 'echo "HOSTNAME_OVERRIDE=$(curl\
--url http://169.254.169.254/metadata/v1/interfaces/private/0/ipv4/address\
--retry 10)" > $${OUTPUT}'
[Install]
WantedBy=multi-user.target
- path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf - path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
content: | content: |
[Unit] [Unit]
Requires=cloud-metadata.service
After=cloud-metadata.service
Wants=rpc-statd.service Wants=rpc-statd.service
[Service] [Service]
ExecStartPre=/bin/mkdir -p /opt/cni/bin ExecStartPre=/bin/mkdir -p /opt/cni/bin
@ -51,7 +66,8 @@ bootcmd:
- [modprobe, ip_vs] - [modprobe, ip_vs]
runcmd: runcmd:
- [systemctl, daemon-reload] - [systemctl, daemon-reload]
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.3" - [systemctl, enable, cloud-metadata.service]
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.5"
- [systemctl, enable, kubelet.path] - [systemctl, enable, kubelet.path]
- [systemctl, start, --no-block, kubelet.path] - [systemctl, start, --no-block, kubelet.path]
users: users:

View File

@ -87,7 +87,7 @@ data "template_file" "etcds" {
count = "${var.controller_count}" count = "${var.controller_count}"
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380" template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars { vars = {
index = "${count.index}" index = "${count.index}"
cluster_name = "${var.cluster_name}" cluster_name = "${var.cluster_name}"
dns_zone = "${var.dns_zone}" dns_zone = "${var.dns_zone}"

View File

@ -14,7 +14,8 @@ kubectl port-forward grafana-POD-ID 8080 -n monitoring
Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards. Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards.
![Grafana Capacity Planning](../img/grafana-capacity.png) ![Grafana etcd](../img/grafana-etcd.png)
![Grafana Control Plane](../img/grafana-control-plane.png) ![Grafana resources cluster](../img/grafana-resources-cluster.png)
![Grafana Node View](../img/grafana-node.png) ![Grafana usage cluster](../img/grafana-usage-cluster.png)
![Grafana usage node](../img/grafana-usage-node.png)

View File

@ -1,19 +0,0 @@
# Heapster
[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command.
## Create
```sh
kubectl apply -f addons/heapster -R
```
## Usage
Allow heapster to run for a few minutes, then check CPU and memory usage.
```sh
kubectl top node
kubectl top pod
```

View File

@ -4,7 +4,6 @@ Every Typhoon cluster is verified to work well with several post-install addons.
* [CLUO](cluo.md) (Container Linux only) * [CLUO](cluo.md) (Container Linux only)
* Nginx [Ingress Controller](ingress.md) * Nginx [Ingress Controller](ingress.md)
* [Heapster](heapster.md)
* [Prometheus](prometheus.md) * [Prometheus](prometheus.md)
* [Grafana](grafana.md) * [Grafana](grafana.md)

View File

@ -16,7 +16,7 @@ Create a cluster following the AWS [tutorial](../cl/aws.md#cluster). Define a wo
```tf ```tf
module "tempest-worker-pool" { module "tempest-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.13.5"
providers = { providers = {
aws = "aws.default" aws = "aws.default"
@ -82,7 +82,7 @@ Create a cluster following the Azure [tutorial](../cl/azure.md#cluster). Define
```tf ```tf
module "ramius-worker-pool" { module "ramius-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.13.5"
providers = { providers = {
azurerm = "azurerm.default" azurerm = "azurerm.default"
@ -152,7 +152,7 @@ Create a cluster following the Google Cloud [tutorial](../cl/google-cloud.md#clu
```tf ```tf
module "yavin-worker-pool" { module "yavin-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.13.5"
providers = { providers = {
google = "google.default" google = "google.default"
@ -187,11 +187,11 @@ Verify a managed instance group of workers joins the cluster within a few minute
``` ```
$ kubectl get nodes $ kubectl get nodes
NAME STATUS AGE VERSION NAME STATUS AGE VERSION
yavin-controller-0.c.example-com.internal Ready 6m v1.13.3 yavin-controller-0.c.example-com.internal Ready 6m v1.13.5
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.13.3 yavin-worker-jrbf.c.example-com.internal Ready 5m v1.13.5
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.13.3 yavin-worker-mzdm.c.example-com.internal Ready 5m v1.13.5
yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.13.3 yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.13.5
yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.13.3 yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.13.5
``` ```
### Variables ### Variables

View File

@ -1,5 +1,19 @@
# Announce <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo-small.png"> # Announce <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo-small.png">
## March 27, 2019
Last April, Typhoon [introduced](#april-26-2018) alpha support for creating Kubernetes clusters with Fedora Atomic on AWS, Google Cloud, DigitalOcean, and bare-metal. Fedora Atomic shared many of Container Linux's aims for a container-optimized operating system, introduced novel ideas, and provided technical diversification for an uncertain future. However, Project Atomic efforts were merged into Fedora CoreOS and future Fedora Atomic releases are [not expected](http://www.projectatomic.io/blog/2018/06/welcome-to-fedora-coreos/). *Typhoon modules for Fedora Atomic will not be updated much beyond Kubernetes v1.13*. They may later be removed.
Typhoon for Fedora Atomic fell short of goals to provide a consistent, practical experience across operating systems and platforms. The modules have remained alpha, despite improvements. Features like coordinated OS updates and boot-time declarative customization were not realized. Inelegance of Cloud-Init/kickstart loomed large. With that brief but obligatory summary, I'd like to change gears and celebrate the many positives.
Fedora Atomic showcased [rpm-ostree](https://github.com/projectatomic/rpm-ostree) as a different approach to Container Linux's AB update scheme. It provided a viable route toward [CRI-O](https://github.com/kubernetes-sigs/cri-o) to replace Docker as the container engine. And Fedora Atomic devised [system containers](http://www.projectatomic.io/blog/2016/09/intro-to-system-containers/) as a way to package and run raw OCI images through runc for host-level containers[^2]. Many of these ideas will live on in Fedora CoreOS, which is exciting!
For Typhoon, Fedora Atomic brought fresh ideas and broader perspectives about different container-optimized base operating systems and related tools. Its sad to let go of so much work, but I think its time. Many of the concepts and technologies that were explored will surface again and Typhoon is better positioned as a result.
Thank you Project Atomic team members for your work! - dghubble
[^2]: Container Linux's own primordial rkt-fly shim dates back to the pre-OCI era. In some ways, rkt drove the OCI standards that made newer ideas, like system containers, appealing.
## May 23, 2018 ## May 23, 2018
Starting in v1.10.3, Typhoon AWS and bare-metal `container-linux` modules allow picking between the Red Hat [Container Linux](https://coreos.com/os/docs/latest/) (formerly CoreOS Container Linux) and Kinvolk [Flatcar Linux](https://www.flatcar-linux.org/) operating system. Flatcar Linux serves as a drop-in compatible "friendly fork" of Container Linux. Flatcar Linux publishes the same channels and versions as Container Linux and gets provisioned, managed, and operated in an identical way (e.g. login as user "core"). Starting in v1.10.3, Typhoon AWS and bare-metal `container-linux` modules allow picking between the Red Hat [Container Linux](https://coreos.com/os/docs/latest/) (formerly CoreOS Container Linux) and Kinvolk [Flatcar Linux](https://www.flatcar-linux.org/) operating system. Flatcar Linux serves as a drop-in compatible "friendly fork" of Container Linux. Flatcar Linux publishes the same channels and versions as Container Linux and gets provisioned, managed, and operated in an identical way (e.g. login as user "core").

View File

@ -1,9 +1,9 @@
# AWS # AWS
!!! danger !!! danger
Typhoon for Fedora Atomic is alpha. Expect rough edges and changes. Typhoon for Fedora Atomic will not be updated much beyond Kubernetes v1.13.
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on AWS with Fedora Atomic. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on AWS with Fedora Atomic.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets. Instances are provisioned on first boot with cloud-init. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets. Instances are provisioned on first boot with cloud-init.
@ -21,7 +21,7 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.7 Terraform v0.11.12
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -44,7 +44,7 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
```tf ```tf
provider "aws" { provider "aws" {
version = "~> 1.13.0" version = "~> 2.3.0"
alias = "default" alias = "default"
region = "eu-central-1" region = "eu-central-1"
@ -83,7 +83,7 @@ Define a Kubernetes cluster using the module `aws/fedora-atomic/kubernetes`.
```tf ```tf
module "aws-tempest" { module "aws-tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/fedora-atomic/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//aws/fedora-atomic/kubernetes?ref=v1.13.5"
providers = { providers = {
aws = "aws.default" aws = "aws.default"
@ -156,9 +156,9 @@ In 5-10 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
ip-10-0-3-155 Ready controller,master 10m v1.13.3 ip-10-0-3-155 Ready controller,master 10m v1.13.5
ip-10-0-26-65 Ready node 10m v1.13.3 ip-10-0-26-65 Ready node 10m v1.13.5
ip-10-0-41-21 Ready node 10m v1.13.3 ip-10-0-41-21 Ready node 10m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -1,9 +1,9 @@
# Bare-Metal # Bare-Metal
!!! danger !!! danger
Typhoon for Fedora Atomic is alpha. Expect rough edges and changes. Typhoon for Fedora Atomic will not be updated much beyond Kubernetes v1.13.
In this tutorial, we'll network boot and provision a Kubernetes v1.13.3 cluster on bare-metal with Fedora Atomic. In this tutorial, we'll network boot and provision a Kubernetes v1.13.5 cluster on bare-metal with Fedora Atomic.
First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Fedora Atomic via kickstart, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via cloud-init. First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Fedora Atomic via kickstart, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via cloud-init.
@ -171,23 +171,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.7 Terraform v0.11.12
``` ```
Add the [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin binary for your system. Add the [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-matchbox/releases/download/v0.2.2/terraform-provider-matchbox-v0.2.2-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-matchbox/releases/download/v0.2.3/terraform-provider-matchbox-v0.2.3-linux-amd64.tar.gz
tar xzf terraform-provider-matchbox-v0.2.2-linux-amd64.tar.gz tar xzf terraform-provider-matchbox-v0.2.3-linux-amd64.tar.gz
sudo mv terraform-provider-matchbox-v0.2.2-linux-amd64/terraform-provider-matchbox /usr/local/bin/ mv terraform-provider-matchbox-v0.2.3-linux-amd64/terraform-provider-matchbox ~/.terraform.d/plugins/terraform-provider-matchbox_v0.2.3
```
Add the plugin to your `~/.terraformrc`.
```
providers {
matchbox = "/usr/local/bin/terraform-provider-matchbox"
}
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -202,6 +194,7 @@ Configure the Matchbox provider to use your Matchbox API endpoint and client cer
```tf ```tf
provider "matchbox" { provider "matchbox" {
version = "0.2.3"
endpoint = "matchbox.example.com:8081" endpoint = "matchbox.example.com:8081"
client_cert = "${file("~/.config/matchbox/client.crt")}" client_cert = "${file("~/.config/matchbox/client.crt")}"
client_key = "${file("~/.config/matchbox/client.key")}" client_key = "${file("~/.config/matchbox/client.key")}"
@ -235,7 +228,7 @@ Define a Kubernetes cluster using the module `bare-metal/fedora-atomic/kubernete
```tf ```tf
module "bare-metal-mercury" { module "bare-metal-mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/fedora-atomic/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//bare-metal/fedora-atomic/kubernetes?ref=v1.13.5"
providers = { providers = {
local = "local.default" local = "local.default"
@ -361,9 +354,9 @@ bootkube[5]: Tearing down temporary bootstrap control plane...
$ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
node1.example.com Ready controller,master 10m v1.13.3 node1.example.com Ready controller,master 10m v1.13.5
node2.example.com Ready node 10m v1.13.3 node2.example.com Ready node 10m v1.13.5
node3.example.com Ready node 10m v1.13.3 node3.example.com Ready node 10m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -1,9 +1,9 @@
# Digital Ocean # Digital Ocean
!!! danger !!! danger
Typhoon for Fedora Atomic is alpha. Expect rough edges and changes. Typhoon for Fedora Atomic will not be updated much beyond Kubernetes v1.13.
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on DigitalOcean with Fedora Atomic. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on DigitalOcean with Fedora Atomic.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets. Instances are provisioned on first boot with cloud-init. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets. Instances are provisioned on first boot with cloud-init.
@ -21,7 +21,7 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.7 Terraform v0.11.12
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -45,7 +45,7 @@ Configure the DigitalOcean provider to use your token in a `providers.tf` file.
```tf ```tf
provider "digitalocean" { provider "digitalocean" {
version = "1.0.0" version = "~> 1.1.0"
token = "${chomp(file("~/.config/digital-ocean/token"))}" token = "${chomp(file("~/.config/digital-ocean/token"))}"
alias = "default" alias = "default"
} }
@ -77,7 +77,7 @@ Define a Kubernetes cluster using the module `digital-ocean/fedora-atomic/kubern
```tf ```tf
module "digital-ocean-nemo" { module "digital-ocean-nemo" {
source = "git::https://github.com/poseidon/typhoon//digital-ocean/fedora-atomic/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//digital-ocean/fedora-atomic/kubernetes?ref=v1.13.5"
providers = { providers = {
digitalocean = "digitalocean.default" digitalocean = "digitalocean.default"
@ -152,9 +152,9 @@ In 3-6 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
nemo-controller-0 Ready controller,master 10m v1.13.3 10.132.110.130 Ready controller,master 10m v1.13.5
nemo-worker-0 Ready node 10m v1.13.3 10.132.115.81 Ready node 10m v1.13.5
nemo-worker-1 Ready node 10m v1.13.3 10.132.124.107 Ready node 10m v1.13.5
``` ```
List the pods. List the pods.
@ -175,7 +175,7 @@ kube-system kube-proxy-k35rc 1/1 Running 0
kube-system kube-scheduler-3895335239-2bc4c 1/1 Running 0 11m kube-system kube-scheduler-3895335239-2bc4c 1/1 Running 0 11m
kube-system kube-scheduler-3895335239-b7q47 1/1 Running 1 11m kube-system kube-scheduler-3895335239-b7q47 1/1 Running 1 11m
kube-system pod-checkpointer-pr1lq 1/1 Running 0 11m kube-system pod-checkpointer-pr1lq 1/1 Running 0 11m
kube-system pod-checkpointer-pr1lq-nemo-controller-0 1/1 Running 0 10m kube-system pod-checkpointer-pr1lq-10.132.115.81 1/1 Running 0 10m
``` ```
## Going Further ## Going Further

View File

@ -1,9 +1,9 @@
# Google Cloud # Google Cloud
!!! danger !!! danger
Typhoon for Fedora Atomic is alpha. Fedora does not publish official images for Google Cloud so you must prepare them yourself. Expect rough edges and changes. Typhoon for Fedora Atomic will not be updated much beyond Kubernetes v1.13. Fedora does not publish official images for Google Cloud so you must prepare them yourself. Expect rough edges and changes.
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on Google Compute Engine with Fedora Atomic. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on Google Compute Engine with Fedora Atomic.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets. Instances are provisioned on first boot with cloud-init. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets. Instances are provisioned on first boot with cloud-init.
@ -22,7 +22,7 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.7 Terraform v0.11.12
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -35,7 +35,7 @@ cd infra/clusters
Login to your Google Console [API Manager](https://console.cloud.google.com/apis/dashboard) and select a project, or [signup](https://cloud.google.com/free/) if you don't have an account. Login to your Google Console [API Manager](https://console.cloud.google.com/apis/dashboard) and select a project, or [signup](https://cloud.google.com/free/) if you don't have an account.
Select "Credentials" and create a service account key. Choose the "Compute Engine Admin" role and save the JSON private key to a file that can be referenced in configs. Select "Credentials" and create a service account key. Choose the "Compute Engine Admin" and "DNS Administrator" roles and save the JSON private key to a file that can be referenced in configs.
```sh ```sh
mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json
@ -45,7 +45,7 @@ Configure the Google Cloud provider to use your service account key, project-id,
```tf ```tf
provider "google" { provider "google" {
version = "1.6" version = "~> 2.2.0"
alias = "default" alias = "default"
credentials = "${file("~/.config/google-cloud/terraform.json")}" credentials = "${file("~/.config/google-cloud/terraform.json")}"
@ -121,7 +121,7 @@ Define a Kubernetes cluster using the module `google-cloud/fedora-atomic/kuberne
```tf ```tf
module "google-cloud-yavin" { module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-atomic/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-atomic/kubernetes?ref=v1.13.5"
providers = { providers = {
google = "google.default" google = "google.default"
@ -197,9 +197,9 @@ In 5-10 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME ROLES STATUS AGE VERSION NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.3 yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.5
yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.5
yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -1,6 +1,6 @@
# AWS # AWS
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on AWS with Container Linux. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on AWS with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.
@ -18,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.11 Terraform v0.11.12
``` ```
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name. Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.0/terraform-provider-ct-v0.3.0-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.1/terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.3.0-linux-amd64.tar.gz tar xzf terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
mv terraform-provider-ct-v0.3.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.0 mv terraform-provider-ct-v0.3.1-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.1
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -49,7 +49,7 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
```tf ```tf
provider "aws" { provider "aws" {
version = "~> 1.13.0" version = "~> 2.3.0"
alias = "default" alias = "default"
region = "eu-central-1" region = "eu-central-1"
@ -57,7 +57,7 @@ provider "aws" {
} }
provider "ct" { provider "ct" {
version = "0.3.0" version = "0.3.1"
} }
provider "local" { provider "local" {
@ -92,7 +92,7 @@ Define a Kubernetes cluster using the module `aws/container-linux/kubernetes`.
```tf ```tf
module "aws-tempest" { module "aws-tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
aws = "aws.default" aws = "aws.default"
@ -165,9 +165,9 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
ip-10-0-3-155 Ready controller,master 10m v1.13.3 ip-10-0-3-155 Ready controller,master 10m v1.13.5
ip-10-0-26-65 Ready node 10m v1.13.3 ip-10-0-26-65 Ready node 10m v1.13.5
ip-10-0-41-21 Ready node 10m v1.13.3 ip-10-0-41-21 Ready node 10m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -3,7 +3,7 @@
!!! danger !!! danger
Typhoon for Azure is alpha. For production, use AWS, Google Cloud, or bare-metal. As Azure matures, check [errata](https://github.com/poseidon/typhoon/wiki/Errata) for known shortcomings. Typhoon for Azure is alpha. For production, use AWS, Google Cloud, or bare-metal. As Azure matures, check [errata](https://github.com/poseidon/typhoon/wiki/Errata) for known shortcomings.
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on Azure with Container Linux. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on Azure with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a resource group, virtual network, subnets, security groups, controller availability set, worker scale set, load balancer, and TLS assets. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a resource group, virtual network, subnets, security groups, controller availability set, worker scale set, load balancer, and TLS assets.
@ -21,15 +21,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.11 Terraform v0.11.12
``` ```
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name. Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.0/terraform-provider-ct-v0.3.0-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.1/terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.3.0-linux-amd64.tar.gz tar xzf terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
mv terraform-provider-ct-v0.3.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.0 mv terraform-provider-ct-v0.3.1-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.1
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -50,12 +50,12 @@ Configure the Azure provider in a `providers.tf` file.
```tf ```tf
provider "azurerm" { provider "azurerm" {
version = "1.16.0" version = "~> 1.23.0"
alias = "default" alias = "default"
} }
provider "ct" { provider "ct" {
version = "0.3.0" version = "0.3.1"
} }
provider "local" { provider "local" {
@ -87,7 +87,7 @@ Define a Kubernetes cluster using the module `azure/container-linux/kubernetes`.
```tf ```tf
module "azure-ramius" { module "azure-ramius" {
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
azurerm = "azurerm.default" azurerm = "azurerm.default"
@ -161,9 +161,9 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/ramius/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/ramius/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
ramius-controller-0 Ready controller,master 24m v1.13.3 ramius-controller-0 Ready controller,master 24m v1.13.5
ramius-worker-000001 Ready node 25m v1.13.3 ramius-worker-000001 Ready node 25m v1.13.5
ramius-worker-000002 Ready node 24m v1.13.3 ramius-worker-000002 Ready node 24m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -1,6 +1,6 @@
# Bare-Metal # Bare-Metal
In this tutorial, we'll network boot and provision a Kubernetes v1.13.3 cluster on bare-metal with Container Linux. In this tutorial, we'll network boot and provision a Kubernetes v1.13.5 cluster on bare-metal with Container Linux.
First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition. First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.
@ -9,7 +9,7 @@ Controllers are provisioned to run an `etcd-member` peer and a `kubelet` service
## Requirements ## Requirements
* Machines with 2GB RAM, 30GB disk, PXE-enabled NIC, IPMI * Machines with 2GB RAM, 30GB disk, PXE-enabled NIC, IPMI
* PXE-enabled [network boot](https://coreos.com/matchbox/docs/latest/network-setup.html) environment * PXE-enabled [network boot](https://coreos.com/matchbox/docs/latest/network-setup.html) environment (with HTTPS support)
* Matchbox v0.6+ deployment with API enabled * Matchbox v0.6+ deployment with API enabled
* Matchbox credentials `client.crt`, `client.key`, `ca.crt` * Matchbox credentials `client.crt`, `client.key`, `ca.crt`
* Terraform v0.11.x, [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox), and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally * Terraform v0.11.x, [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox), and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
@ -82,7 +82,7 @@ $ openssl s_client -connect matchbox.example.com:8081 \
## PXE Environment ## PXE Environment
Create a iPXE-enabled network boot environment. Configure PXE clients to chainload [iPXE](http://ipxe.org/cmd) and instruct iPXE clients to chainload from your Matchbox service's `/boot.ipxe` endpoint. Create an iPXE-enabled network boot environment. Configure PXE clients to chainload [iPXE](http://ipxe.org/cmd) firmware compiled to support [HTTPS downloads](https://ipxe.org/crypto). Instruct iPXE clients to chainload from your Matchbox service's `/boot.ipxe` endpoint.
For networks already supporting iPXE clients, you can add a `default.ipxe` config. For networks already supporting iPXE clients, you can add a `default.ipxe` config.
@ -93,8 +93,6 @@ chain http://matchbox.foo:8080/boot.ipxe
For networks with Ubiquiti Routers, you can [configure the router](/topics/hardware/#ubiquiti) itself to chainload machines to iPXE and Matchbox. For networks with Ubiquiti Routers, you can [configure the router](/topics/hardware/#ubiquiti) itself to chainload machines to iPXE and Matchbox.
For a small lab, you may wish to checkout the [quay.io/coreos/dnsmasq](https://quay.io/repository/coreos/dnsmasq) container image and [copy-paste examples](https://github.com/coreos/matchbox/blob/master/Documentation/network-setup.md#coreosdnsmasq).
Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup.html) to setup a compliant iPXE-enabled network. There is quite a bit of flexibility: Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup.html) to setup a compliant iPXE-enabled network. There is quite a bit of flexibility:
* Continue using existing DHCP, TFTP, or DNS services * Continue using existing DHCP, TFTP, or DNS services
@ -104,29 +102,32 @@ Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup
!!! note "" !!! note ""
TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used. TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.
!!! warning
Compile iPXE from [source](https://github.com/ipxe/ipxe) with support for [HTTPS downloads](https://ipxe.org/crypto). iPXE's pre-built firmware binaries do not enable this. If you cannot enable HTTPS downloads, set `download_protocol = "http"` (discouraged).
## Terraform Setup ## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system. Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.7 Terraform v0.11.12
``` ```
Add the [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name. Add the [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-matchbox/releases/download/v0.2.2/terraform-provider-matchbox-v0.2.2-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-matchbox/releases/download/v0.2.3/terraform-provider-matchbox-v0.2.3-linux-amd64.tar.gz
tar xzf terraform-provider-matchbox-v0.2.2-linux-amd64.tar.gz tar xzf terraform-provider-matchbox-v0.2.3-linux-amd64.tar.gz
mv terraform-provider-matchbox-v0.2.2-linux-amd64/terraform-provider-matchbox ~/.terraform.d/plugins/terraform-provider-matchbox_v0.2.2 mv terraform-provider-matchbox-v0.2.3-linux-amd64/terraform-provider-matchbox ~/.terraform.d/plugins/terraform-provider-matchbox_v0.2.3
``` ```
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name. Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.0/terraform-provider-ct-v0.3.0-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.1/terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.3.0-linux-amd64.tar.gz tar xzf terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
mv terraform-provider-ct-v0.3.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.0 mv terraform-provider-ct-v0.3.1-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.1
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -141,7 +142,7 @@ Configure the Matchbox provider to use your Matchbox API endpoint and client cer
```tf ```tf
provider "matchbox" { provider "matchbox" {
version = "0.2.2" version = "0.2.3"
endpoint = "matchbox.example.com:8081" endpoint = "matchbox.example.com:8081"
client_cert = "${file("~/.config/matchbox/client.crt")}" client_cert = "${file("~/.config/matchbox/client.crt")}"
client_key = "${file("~/.config/matchbox/client.key")}" client_key = "${file("~/.config/matchbox/client.key")}"
@ -149,7 +150,7 @@ provider "matchbox" {
} }
provider "ct" { provider "ct" {
version = "0.3.0" version = "0.3.1"
} }
provider "local" { provider "local" {
@ -179,7 +180,7 @@ Define a Kubernetes cluster using the module `bare-metal/container-linux/kuberne
```tf ```tf
module "bare-metal-mercury" { module "bare-metal-mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
local = "local.default" local = "local.default"
@ -215,6 +216,9 @@ module "bare-metal-mercury" {
"node2.example.com", "node2.example.com",
"node3.example.com", "node3.example.com",
] ]
# set to http only if you cannot chainload to iPXE firmware with https support
# download_protocol = "http"
} }
``` ```
@ -288,9 +292,9 @@ Apply complete! Resources: 55 added, 0 changed, 0 destroyed.
To watch the install to disk (until machines reboot from disk), SSH to port 2222. To watch the install to disk (until machines reboot from disk), SSH to port 2222.
``` ```
# before v1.13.3 # before v1.13.5
$ ssh debug@node1.example.com $ ssh debug@node1.example.com
# after v1.13.3 # after v1.13.5
$ ssh -p 2222 core@node1.example.com $ ssh -p 2222 core@node1.example.com
``` ```
@ -315,9 +319,9 @@ bootkube[5]: Tearing down temporary bootstrap control plane...
$ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
node1.example.com Ready controller,master 10m v1.13.3 node1.example.com Ready controller,master 10m v1.13.5
node2.example.com Ready node 10m v1.13.3 node2.example.com Ready node 10m v1.13.5
node3.example.com Ready node 10m v1.13.3 node3.example.com Ready node 10m v1.13.5
``` ```
List the pods. List the pods.
@ -375,6 +379,7 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-me
| Name | Description | Default | Example | | Name | Description | Default | Example |
|:-----|:------------|:--------|:--------| |:-----|:------------|:--------|:--------|
| download_protocol | Protocol iPXE uses to download the kernel and initrd. iPXE must be compiled with [crypto](https://ipxe.org/crypto) support for https. Unused if cached_install is true | "https" | "http" |
| cached_install | PXE boot and install from the Matchbox `/assets` cache. Admin MUST have downloaded Container Linux or Flatcar images into the cache | false | true | | cached_install | PXE boot and install from the Matchbox `/assets` cache. Admin MUST have downloaded Container Linux or Flatcar images into the cache | false | true |
| install_disk | Disk device where Container Linux should be installed | "/dev/sda" | "/dev/sdb" | | install_disk | Disk device where Container Linux should be installed | "/dev/sda" | "/dev/sdb" |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" | | networking | Choice of networking provider | "calico" | "calico" or "flannel" |

View File

@ -1,6 +1,6 @@
# Digital Ocean # Digital Ocean
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on DigitalOcean with Container Linux. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on DigitalOcean with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets.
@ -18,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.11 Terraform v0.11.12
``` ```
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name. Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.0/terraform-provider-ct-v0.3.0-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.1/terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.3.0-linux-amd64.tar.gz tar xzf terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
mv terraform-provider-ct-v0.3.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.0 mv terraform-provider-ct-v0.3.1-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.1
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -50,13 +50,13 @@ Configure the DigitalOcean provider to use your token in a `providers.tf` file.
```tf ```tf
provider "digitalocean" { provider "digitalocean" {
version = "1.0.0" version = "~> 1.1.0"
token = "${chomp(file("~/.config/digital-ocean/token"))}" token = "${chomp(file("~/.config/digital-ocean/token"))}"
alias = "default" alias = "default"
} }
provider "ct" { provider "ct" {
version = "0.3.0" version = "0.3.1"
} }
provider "local" { provider "local" {
@ -86,7 +86,7 @@ Define a Kubernetes cluster using the module `digital-ocean/container-linux/kube
```tf ```tf
module "digital-ocean-nemo" { module "digital-ocean-nemo" {
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
digitalocean = "digitalocean.default" digitalocean = "digitalocean.default"
@ -160,9 +160,9 @@ In 3-6 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME STATUS ROLES AGE VERSION NAME STATUS ROLES AGE VERSION
nemo-controller-0 Ready controller,master 10m v1.13.3 10.132.110.130 Ready controller,master 10m v1.13.5
nemo-worker-0 Ready node 10m v1.13.3 10.132.115.81 Ready node 10m v1.13.5
nemo-worker-1 Ready node 10m v1.13.3 10.132.124.107 Ready node 10m v1.13.5
``` ```
List the pods. List the pods.
@ -183,7 +183,7 @@ kube-system kube-proxy-k35rc 1/1 Running 0
kube-system kube-scheduler-3895335239-2bc4c 1/1 Running 0 11m kube-system kube-scheduler-3895335239-2bc4c 1/1 Running 0 11m
kube-system kube-scheduler-3895335239-b7q47 1/1 Running 1 11m kube-system kube-scheduler-3895335239-b7q47 1/1 Running 1 11m
kube-system pod-checkpointer-pr1lq 1/1 Running 0 11m kube-system pod-checkpointer-pr1lq 1/1 Running 0 11m
kube-system pod-checkpointer-pr1lq-nemo-controller-0 1/1 Running 0 10m kube-system pod-checkpointer-pr1lq-10.132.115.81 1/1 Running 0 10m
``` ```
## Going Further ## Going Further

View File

@ -1,6 +1,6 @@
# Google Cloud # Google Cloud
In this tutorial, we'll create a Kubernetes v1.13.3 cluster on Google Compute Engine with Container Linux. In this tutorial, we'll create a Kubernetes v1.13.5 cluster on Google Compute Engine with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets. We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.
@ -18,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your sys
```sh ```sh
$ terraform version $ terraform version
Terraform v0.11.7 Terraform v0.11.12
``` ```
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name. Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh ```sh
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.0/terraform-provider-ct-v0.3.0-linux-amd64.tar.gz wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.3.1/terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.3.0-linux-amd64.tar.gz tar xzf terraform-provider-ct-v0.3.1-linux-amd64.tar.gz
mv terraform-provider-ct-v0.3.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.0 mv terraform-provider-ct-v0.3.1-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.1
``` ```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`). Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -39,7 +39,7 @@ cd infra/clusters
Login to your Google Console [API Manager](https://console.cloud.google.com/apis/dashboard) and select a project, or [signup](https://cloud.google.com/free/) if you don't have an account. Login to your Google Console [API Manager](https://console.cloud.google.com/apis/dashboard) and select a project, or [signup](https://cloud.google.com/free/) if you don't have an account.
Select "Credentials" and create a service account key. Choose the "Compute Engine Admin" role and save the JSON private key to a file that can be referenced in configs. Select "Credentials" and create a service account key. Choose the "Compute Engine Admin" and "DNS Administrator" roles and save the JSON private key to a file that can be referenced in configs.
```sh ```sh
mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json
@ -49,7 +49,7 @@ Configure the Google Cloud provider to use your service account key, project-id,
```tf ```tf
provider "google" { provider "google" {
version = "1.6" version = "~> 2.2.0"
alias = "default" alias = "default"
credentials = "${file("~/.config/google-cloud/terraform.json")}" credentials = "${file("~/.config/google-cloud/terraform.json")}"
@ -58,7 +58,7 @@ provider "google" {
} }
provider "ct" { provider "ct" {
version = "0.3.0" version = "0.3.1"
} }
provider "local" { provider "local" {
@ -93,7 +93,7 @@ Define a Kubernetes cluster using the module `google-cloud/container-linux/kuber
```tf ```tf
module "google-cloud-yavin" { module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
google = "google.default" google = "google.default"
@ -168,9 +168,9 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME ROLES STATUS AGE VERSION NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.3 yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.5
yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.5
yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.5
``` ```
List the pods. List the pods.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 234 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 259 KiB

BIN
docs/img/grafana-etcd.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 88 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 240 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 86 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 92 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 103 KiB

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a> ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.13.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube)) * Kubernetes v1.13.5 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking * Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/) * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](advanced/worker-pools/), [preemptible](cl/google-cloud/#preemption) workers, and [snippets](advanced/customization/#container-linux) customization * Advanced features like [worker pools](advanced/worker-pools/), [preemptible](cl/google-cloud/#preemption) workers, and [snippets](advanced/customization/#container-linux) customization
@ -33,10 +33,10 @@ Fedora Atomic support is alpha and will evolve as Fedora Atomic is replaced by F
| Platform | Operating System | Terraform Module | Status | | Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------| |---------------|------------------|------------------|--------|
| AWS | Fedora Atomic | [aws/fedora-atomic/kubernetes](atomic/aws.md) | alpha | | AWS | Fedora Atomic | [aws/fedora-atomic/kubernetes](atomic/aws.md) | deprecated |
| Bare-Metal | Fedora Atomic | [bare-metal/fedora-atomic/kubernetes](atomic/bare-metal.md) | alpha | | Bare-Metal | Fedora Atomic | [bare-metal/fedora-atomic/kubernetes](atomic/bare-metal.md) | deprecated |
| Digital Ocean | Fedora Atomic | [digital-ocean/fedora-atomic/kubernetes](atomic/digital-ocean.md) | alpha | | Digital Ocean | Fedora Atomic | [digital-ocean/fedora-atomic/kubernetes](atomic/digital-ocean.md) | deprecated |
| Google Cloud | Fedora Atomic | [google-cloud/fedora-atomic/kubernetes](atomic/google-cloud.md) | alpha | | Google Cloud | Fedora Atomic | [google-cloud/fedora-atomic/kubernetes](atomic/google-cloud.md) | deprecated |
## Documentation ## Documentation
@ -49,7 +49,7 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
```tf ```tf
module "google-cloud-yavin" { module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.3" source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.5"
providers = { providers = {
google = "google.default" google = "google.default"
@ -90,9 +90,9 @@ In 4-8 minutes (varies by platform), the cluster will be ready. This Google Clou
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig $ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ kubectl get nodes $ kubectl get nodes
NAME ROLES STATUS AGE VERSION NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.3 yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.13.5
yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.13.5
yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.3 yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.13.5
``` ```
List the pods. List the pods.

View File

@ -4,7 +4,7 @@ Typhoon ensures certain networking hardware integrates well with bare-metal Kube
## Ubiquiti ## Ubiquiti
Ubiquiti EdgeRouters work well with bare-metal Kubernetes clusters. Knowledge about how to setup an EdgeRouter and use the CLI is required. Ubiquiti EdgeRouters and EdgeOS work well with bare-metal Kubernetes clusters. Familiarity with EdgeRouter setup and CLI usage is required.
### PXE ### PXE
@ -12,7 +12,7 @@ Ubiquiti EdgeRouters can provide a PXE-enabled network boot environment for clie
#### ISC DHCP #### ISC DHCP
Add a subnet parameter to the LAN DHCP server to include an ISC DHCP config file. With ISC DHCP, add a subnet parameter to the LAN DHCP server to include an ISC DHCP config file.
``` ```
configure configure
@ -21,7 +21,7 @@ set service dhcp-server shared-network-name NAME subnet SUBNET subnet-parameters
commit-confirm commit-confirm
``` ```
Switch to root (i.e. `sudo -i`) and write the ISC DHCP config `/config/scripts/ipxe.conf`. iPXE client machines will chainload to `matchbox.example.com`, while non-iPXE clients will chainload to `undionly.kpxe` (requires TFTP to be enabled). Switch to root (i.e. `sudo -i`) and write the ISC DHCP config `/config/scripts/ipxe.conf`. iPXE client machines will chainload to `matchbox.example.com`, while non-iPXE clients will chainload to `undionly.kpxe` (requires TFTP).
``` ```
allow bootp; allow bootp;
@ -35,14 +35,23 @@ if exists user-class and option user-class = "iPXE" {
} }
``` ```
#### dnsmasq
With dnsmasq for DHCP, add options to chainload PXE clients to iPXE `undionly.kpxe` (requires TFTP), tag iPXE clients, and chainload iPXE clients to `matchbox.example.com`.
```
set service dns forwarding options 'dhcp-userclass=set:ipxe,iPXE'
set service dns forwarding options 'pxe-service=tag:#ipxe,x86PC,PXE chainload to iPXE,undionly.kpxe'
set service dns forwarding options 'pxe-service=tag:ipxe,x86PC,iPXE,http://matchbox.example.com/boot.ipxe'
```
### TFTP ### TFTP
Use `dnsmasq` as a TFTP server to serve [undionly.kpxe](http://boot.ipxe.org/undionly.kpxe). Use `dnsmasq` as a TFTP server to serve `undionly.kpxe`. Compiling from [source](https://github.com/ipxe/ipxe) with TLS support is strongly recommended. If you use a [pre-compiled](http://boot.ipxe.org/undionly.kpxe) copy, you must set `download_protocol = "http"` in your cluster definition (discouraged).
``` ```
sudo -i sudo -i
mkdir /var/lib/tftpboot mkdir /config/tftpboot && cd /config/tftpboot
cd /var/lib/tftpboot
curl http://boot.ipxe.org/undionly.kpxe -o undionly.kpxe curl http://boot.ipxe.org/undionly.kpxe -o undionly.kpxe
``` ```
@ -52,13 +61,10 @@ Add `dnsmasq` command line options to enable the TFTP file server.
configure configure
show service dns forwarding show service dns forwarding
set service dns forwarding options enable-tftp set service dns forwarding options enable-tftp
set service dns forwarding options tftp-root=/var/lib/tftpboot set service dns forwarding options tftp-root=/config/tftpboot
commit-confirm commit-confirm
``` ```
!!! warning
After firmware upgrades, the `/var/lib/tftpboot` directory will not exist and dnsmasq will not start properly. Repeat this process following an upgrade.
### DHCP ### DHCP
Assign static IPs to clients with known MAC addresses. This is called a static mapping by EdgeOS. Configure the router with the commands based on region inventory. Assign static IPs to clients with known MAC addresses. This is called a static mapping by EdgeOS. Configure the router with the commands based on region inventory.
@ -106,6 +112,9 @@ set protocols static route 10.3.0.0/16 next-hop NODE_IP
commit-confirm commit-confirm
``` ```
!!! note
Adding multiple next-hop nodes provides equal-cost multi-path (ECMP) routing. EdgeOS v2.0+ is required. The kernel in prior versions used flow-hash to balanced packets, whereas with v2.0, round-robin sessions are used.
### Port Forwarding ### Port Forwarding
Expose the [Ingress Controller](/addons/ingress.md#bare-metal) by adding `port-forward` rules that DNAT a port on the router's WAN interface to an internal IP and port. By convention, a public Ingress controller is assigned a fixed service IP (e.g. 10.3.0.12). Expose the [Ingress Controller](/addons/ingress.md#bare-metal) by adding `port-forward` rules that DNAT a port on the router's WAN interface to an internal IP and port. By convention, a public Ingress controller is assigned a fixed service IP (e.g. 10.3.0.12).

Some files were not shown because too many files have changed in this diff Show More