Compare commits

...

69 Commits

Author SHA1 Message Date
c3e22f3d13 Fix minor example typo in README 2019-12-10 23:14:12 -08:00
f69dc2ea0f Update CHANGES and tutorial notes for release
* Update recommended Terraform and provider plugin versions
* Update the rough count of resources created per cluster
since its not been refreshed in a while (will vary based
on cluster options)
2019-12-10 23:03:39 -08:00
c0ce04e1de Update Calico from v3.10.1 to v3.10.2
* https://docs.projectcalico.org/v3.10/release-notes/
2019-12-09 21:03:00 -08:00
ed3550dce1 Update systemd services for the v0.17.x hyperkube
* Binary asset locations within the upstream hyperkube image
changed https://github.com/kubernetes/kubernetes/pull/84662
* Fix Container Linux and Flatcar Linux kubelet.service
(rkt-fly with fairly dated CoreOS kubelet-wrapper)
* Fix Fedora CoreOS kubelet.service (podman)
* Fix Fedora CoreOS bootstrap.service
* Fix delete-node kubectl usage for workers where nodes may
delete themselves on shutdown (e.g. preemptible instances)
2019-12-09 18:39:17 -08:00
de36d99afc Update Kubernetes from v1.16.3 to v1.17.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170
2019-12-09 18:31:58 -08:00
4fce9485c8 Reduce kube-controller-manager pod eviction timeout from 5m to 1m
* Reduce time to delete pods on unready nodes from 5m to 1m
* Present since v1.13.3, but mistakenly removed in v1.16.0 static
pod control plane migration

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* https://github.com/poseidon/terraform-render-bootstrap/pull/164
2019-12-08 22:58:31 -08:00
178afe4a9b Reduce apiserver metrics cardinality and extraneous labels
* Stop mapping node labels to targets discovered via Kubernetes
nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to
store node labels (e.g. kubernetes.io/os=linux) on these metrics
* kube-apiserver's apiserver_request_duration_seconds_bucket metric
has a high cardinality that includes labels for the API group, verb,
scope, resource, and component for each object type, including for
each CRD. This one metric has ~10k time series in a typical cluster
(btw 10-40% of total)
* Removing the apiserver request duration outright would make latency
alerts a NoOp and break a Grafana apiserver panel. Instead, drop series
that have a "group" label. Effectively, only request durations for
core Kubernetes APIs will be kept (e.g. cardinality won't grow with
each CRD added). This reduces the metric to ~2k unique series
2019-12-08 22:48:25 -08:00
d9c7a9e049 Add/update docs for asset_dir and kubeconfig usage
* Original tutorials favored including the platform (e.g.
google-cloud) in modules (e.g. google-cloud-yavin). Prefer
naming conventions where each module / cluster has a simple
name (e.g. yavin) since the platform is usually redundant
* Retain the example cluster naming themes per platform
2019-12-05 22:56:42 -08:00
2837275265 Introduce cluster creation without local writes to asset_dir
* Allow generated assets (TLS materials, manifests) to be
securely distributed to controller node(s) via file provisioner
(i.e. ssh-agent) as an assets bundle file, rather than relying
on assets being locally rendered to disk in an asset_dir and
then securely distributed
* Change `asset_dir` from required to optional. Left unset,
asset_dir defaults to "" and no assets will be written to
files on the machine that runs terraform apply
* Enhancement: Managed cluster assets are kept only in Terraform
state, which supports different backends (GCS, S3, etcd, etc) and
optional encryption. terraform apply accesses state, runs in-memory,
and distributes sensitive materials to controllers without making
use of local disk (simplifies use in CI systems)
* Enhancement: Improve asset unpack and layout process to position
etcd certificates and control plane certificates more cleanly,
without unneeded secret materials

Details:

* Terraform file provisioner support for distributing directories of
contents (with unknown structure) has been limited to reading from a
local directory, meaning local writes to asset_dir were required.
https://github.com/poseidon/typhoon/issues/585 discusses the problem
and newer or upcoming Terraform features that might help.
* Observation: Terraform provisioner support for single files works
well, but iteration isn't viable. We're also constrained to Terraform
language features on the apply side (no extra plugins, no shelling out)
and CoreOS / Fedora tools on the receive side.
* Take a map representation of the contents that would have been splayed
out in asset_dir and pack/encode them into a single file format devised
for easy unpacking. Use an awk one-liner on the receive side to unpack.
In pratice, this has worked well and its rather nice that a single
assets file is transferred by file provisioner (all or none)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162
2019-12-05 01:24:50 -08:00
5fa002f4f7 Update mkdocs-material from v4.5.0 to v4.5.1 2019-12-02 21:21:16 -08:00
aa275796cb Fix DigitalOcean controller and worker ipv4/ipv6 outputs (#594)
* Fix controller and worker ipv4/ipv4 outputs to be lists of strings
* With Terraform v0.11 syntax, an enclosing list was required to coerce the
output to be a list of strings
* With Terraform v0.12 syntax, the enclosing list shouldn't be needed
2019-12-02 21:20:47 -08:00
26674083b6 Update Grafana from v6.5.0 to v6.5.1
* https://github.com/grafana/grafana/releases/tag/v6.5.1
2019-11-28 14:11:25 -08:00
030a4cec19 Update Grafana from v6.4.4 to v6.5.0
* https://grafana.com/docs/guides/whats-new-in-v6-5/
2019-11-25 22:45:58 -08:00
ddea7dc452 Use new resource dashboards in Grafana deployment
* kubernetes-mixin pod resource dashboards were split into
two ConfigMap parts because they provide richer networking
details
* New dashboards have been used by the author at the global
level, but were missing in the per-cluster Grafana tracked
here
2019-11-25 22:27:11 -08:00
4b485a9bf2 Fix recent deletion of bootstrap module pinned SHA
* Fix deletion of bootstrap module pinned SHA, which was
introduced recently through an automation mistake creating
https://github.com/poseidon/typhoon/pull/589
2019-11-21 22:34:09 -08:00
4704b494f0 Update mkdocs-material from v4.4.3 to v4.4.0
* Upgrade dependency packages as well
2019-11-18 23:05:29 -08:00
525ae23305 Add node-exporter alerts and Grafana dashboard
* Add Prometheus alerts from node-exporter
* Add Grafana dashboard nodes.json, from node-exporter
* Not adding recording rules, since those are only used
by some node-exporter USE dashboards not being included
2019-11-16 13:47:20 -08:00
8a9e8595ae Fix terraform fmt formatting 2019-11-13 23:44:02 -08:00
19ee57dc04 Use GCP region_instance_group_manager version block format
* terraform-provider-google v2.19.0 deprecates `instance_template`
within `google_compute_region_instance_group_manager` in order to
support a scheme with multiple version blocks. Adapt our single
version to the new format to resolve deprecation warnings.
* Fixes: Warning: "instance_template": [DEPRECATED] This field
will be replaced by `version.instance_template` in 3.0.0
* Require terraform-provider-google v2.19.0+ (action required)
2019-11-13 17:41:13 -08:00
0e4ee5efc9 Add small CPU resource requests to static pods
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
below mechanisms were insufficient

Existing safeguards:

* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over say node-exporter)
and CPU is compressible too.

See: https://github.com/poseidon/terraform-render-bootstrap/pull/161
2019-11-13 17:18:45 -08:00
a271b9f340 Update CoreDNS from v1.6.2 to v1.6.5
* Add health `lameduck` option 5s. Before CoreDNS shuts down, it will
wait and report unhealthy for 5s to allow time for plugins to shutdown
cleanly
* Minor bug fixes over a few releases
* https://coredns.io/2019/08/31/coredns-1.6.3-release/
* https://coredns.io/2019/09/27/coredns-1.6.4-release/
* https://coredns.io/2019/11/05/coredns-1.6.5-release/
2019-11-13 16:47:44 -08:00
cb0598e275 Adopt Terraform v0.12 templatefile function
* Update terraform-render-bootstrap module to adopt the
Terrform v0.12 templatefile function feature to replace
the use of terraform-provider-template's `template_dir`
* Require Terraform v0.12.6+ which adds `for_each`

Background:

* `template_dir` was added to `terraform-provider-template`
to add support for template directory rendering in CoreOS
Tectonic Kubernetes distribution (~2017)
* Terraform v0.12 introduced a native `templatefile` function
and v0.12.6 introduced native `for_each` support (July 2019)
that makes it possible to replace `template_dir` usage
2019-11-13 16:33:36 -08:00
ad117f4592 Update recommended Terraform provider versions
* Recommend provider plugin version tested against
2019-11-13 13:53:46 -08:00
42b6df89c8 Update Prometheus from v2.14.0-rc.0 to v2.14.0
* https://github.com/prometheus/prometheus/releases/tag/v2.14.0
2019-11-13 13:41:11 -08:00
d7061020ba Update Kubernetes from v1.16.2 to v1.16.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163
2019-11-13 13:05:15 -08:00
a8b7792338 Update Grafana from v6.4.3 to v6.4.4
* https://github.com/grafana/grafana/releases/tag/v6.4.4
2019-11-07 12:00:25 -08:00
a3807086d4 Update Prometheus from v2.13.1 to v2.14.0-rc.0
* Happy PromCon 2019!
* https://github.com/prometheus/prometheus/releases/tag/v2.14.0-rc.0
2019-11-07 11:48:23 -08:00
2c163503f1 Update etcd from v3.4.2 to v3.4.3
* etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9
and adds a few minor metrics fixes
* https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3
2019-11-07 11:41:01 -08:00
0034a15711 Update Calico from v3.10.0 to v3.10.1
* https://docs.projectcalico.org/v3.10/release-notes/
2019-11-07 11:38:32 -08:00
38957163cb Output resource_group_id in Azure (#577)
* Add an output variable `resource_group_id` to the azure module
2019-10-31 01:05:04 -07:00
d4573092b5 Improve Kubelet and Compute Resource dashboards
* Add cluster filter to Kubelet dashboard
* Add network details in resource dashboards
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/275
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/284
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/285
2019-10-28 02:22:15 -07:00
4775e9d0f7 Upgrade Calico v3.9.2 to v3.10.0
* Allow advertising Kubernetes service ClusterIPs to BGPPeer
routers via a BGPConfiguration
* Improve EdgeRouter docs about routes and BGP
* https://docs.projectcalico.org/v3.10/release-notes/
* https://docs.projectcalico.org/v3.10/networking/advertise-service-ips
2019-10-27 14:13:41 -07:00
d418045929 Switch kube-proxy from iptables mode to ipvs mode
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found #321
* Since then, major blockers seem to have been addressed
2019-10-27 00:37:41 -07:00
eb7b6d39f2 Improve minor aspects of CoreDNS and nginx-ingress dashboards
* Add default 10s refresh rate to custom dashboards to match
those from Kubernetes
* Show labels for "instance" as "pod" for clarity
* Add cluster filter for internal use
2019-10-20 23:16:55 -07:00
33d4c2fd68 Add explicit annotation for Prometheus port to scrape
* Without the prometheus.io/port annotation, Prometheus
service discovery can scrape other Prometheus ports that
may be available.
* For example, Prometheus sidecars (not included) may
be scraped and that may be unintended
2019-10-20 16:05:09 -07:00
de90cb9246 Remove kube-state-metrics addon-resizer
* addon-resizer is outdated and has been dropped from
kube-state-metrics examples. Those using it should look
to the cluster-proportional-vertical-autoscaler.
* Eliminate addon-resizer log spew
* Remove associated Role and RoleBinding
* Also fix kube-state-metrics readinessProbe port
2019-10-20 16:03:29 -07:00
68da420adc Refresh Prometheus rules/alerts and Grafana dashboards
* Update Prometheus rules/alerts and Grafana dashboards
* Remove dashboards that were moved to node-exporter, they
may be added back later if valuable
* Remove kube-prometheus based rules/alerts (ClockSkew alert)
2019-10-19 17:43:47 -07:00
130c97f8eb Update Prometheus from v2.13.0 to v2.13.1
* https://github.com/prometheus/prometheus/releases/tag/v2.13.1
2019-10-18 00:10:25 -07:00
271d2f6b52 Update Grafana from v6.4.2 to v6.4.3
* https://github.com/grafana/grafana/releases/tag/v6.4.3
2019-10-18 00:08:39 -07:00
0595915a19 Cleanup CHANGES notes 2019-10-15 23:25:45 -07:00
e6bc5143aa Default to Calico as the CNI provider on Azure/DigitalOcean
* Change `networking` default from flannel to calico on
Azure and DigitalOcean
* AWS, bare-metal, and Google Cloud continue to default
to Calico (as they have since v1.7.5)
* Typhoon now defaults to using Calico and supporting
NetworkPolicy on all platforms
2019-10-15 23:15:40 -07:00
e4ac1027c8 Update Grafana from v6.4.1 to v6.4.2
* https://github.com/grafana/grafana/releases/tag/v6.4.2
2019-10-15 22:58:43 -07:00
24fc440d83 Update Kubernetes from v1.16.1 to v1.16.2
* Update Calico from v3.9.1 to v3.9.2
2019-10-15 22:42:52 -07:00
a6702573a2 Update etcd from v3.4.1 to v3.4.2
* https://github.com/etcd-io/etcd/releases/tag/v3.4.2
2019-10-15 00:06:15 -07:00
69188af565 Rename CLUO label from "app" to "name"
* Match the labeling pattern in other addons
2019-10-15 00:05:02 -07:00
d874bdd17d Update bootstrap module control plane manifests and type constraints
* Remove unneeded control plane flags that correspond to defaults
* Adopt Terraform v0.12 type constraints in bootstrap module
2019-10-06 21:09:30 -07:00
5b9dab6659 Introduce list of detail objects for bare-metal machines
* Define bare-metal `controllers` and `workers` as a complex type
list(object{name=string, mac=string, domain=string}) to allow
clusters with many machines to be defined more cleanly
* Remove `controller_names` list variable
* Remove `controller_macs` list variable
* Remove `controller_domains` list variable
* Remove `worker_names` list variable
* Remove `worker_macs` list variable
* Remove `worker_domains` list variable
2019-10-06 20:22:45 -07:00
5196709fe0 Update docs, CHANGES, and mkdocs-material
* Update mkdocs-material from v4.4.2 to v4.4.3
* Update recommended Terraform provider versions
* Cleanup the changelog before release
2019-10-06 18:41:25 -07:00
ab72f1ab2d Update Prometheus from v2.12.0 to v2.13.0
* https://github.com/prometheus/prometheus/releases/tag/v2.13.0
2019-10-06 18:22:20 -07:00
5ef4155e08 Detect most recent Fedora CoreOS AMI in region
* Detect the most recent Fedora CoreOS AMI to allow usage
of Fedora CoreOS in supported regions (previously just
us-east-1)
* Unpin the Fedora CoreOS AMI image which was pinned to
images that had been checked. This does mean if Fedora
publishes a broken image, it will be selected
* Filter out "dev" images which have similar naming
2019-10-06 18:13:55 -07:00
15c4b793c3 Use new Fedora CoreOS kernel/initrd/raw asset names
* Fedora CoreOS changed the kernel, initramfs, and raw
image asset download paths and names in 30.20191002.0
2019-10-06 17:31:21 -07:00
36ed53924f Add stricter types for bare-metal modules
* Review variables available in bare-metal kubernetes modules
for Container Linux and Fedora CoreOS
* Deprecate cluster_domain_suffix variable
* Remove deprecated container_linux_oem variable
2019-10-06 17:18:50 -07:00
19de38b30d Fix Prometheus etcd metrics scraping
* Prometheus was configured to use kubernetes discovery
of etcd targets based on nodes matching the node label
node-role.kubernetes.io/controller=true
* Kubernetes v1.16 stopped permitting node role labels
node-role.kubernetes.io/* so Typhoon renamed these labels
(no longer any association with roles) to
node.kubermetes.io/controller=true
* As a result, Prometheus didn't discover etcd targets,
etcd metrics were missing, etcd alerts were ineffective,
and the etcd Grafana dashboard was empty
* Introduced: https://github.com/poseidon/typhoon/pull/543
2019-10-03 19:07:05 -07:00
995824fa6d Add stricter types for DigitalOcean module
* Review variables available in DigitalOcean kubernetes
module and sync with documentation
* Promote Calico for DigitalOcean and Azure beyond experimental
(its the primary mode I've used since it was introduced)
2019-10-02 21:48:24 -07:00
1c5ed84fc2 Update Kubernetes from v1.16.0 to v1.16.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161
2019-10-02 21:31:55 -07:00
ca7d62720e Update Grafana from v6.3.6 to v6.4.1
* https://github.com/grafana/grafana/releases/tag/v6.4.1
2019-10-02 20:36:05 -07:00
26f8d76755 Update kube-state-metrics from v1.7.2 to v1.8.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.8.0
2019-10-01 20:50:33 -07:00
fdd6882a87 Add stricter types to Azure modules
* Review variables available in Azure kubernetes and workers
modules and sync with documentation
* Fix internal workers module default type to Standard_DS1_v2
2019-09-30 22:20:20 -07:00
f82266ac8c Add stricter types for GCP modules
* Review variables available in google-cloud kubernetes
and workers modules and in documentation
2019-09-30 22:04:35 -07:00
7bcf2d7831 Update nginx-ingress from v0.25.1 to v0.26.1
* Add lifecycle hook to allow draining connections for
up to 5 minutes
2019-09-30 22:01:07 -07:00
78bfff0afe Update Fedora CoreOS to testing 30.20190905.0
* Fix duplicated cluster_domain_suffix variable
2019-09-29 11:34:31 -07:00
a6de245d8a Rename bootkube.tf to bootstrap.tf
* Typhoon no longer uses the bootkube project
2019-09-29 11:30:49 -07:00
96afa6a531 Update Calico from v3.8.2 to v3.9.1
* https://docs.projectcalico.org/v3.9/release-notes/
2019-09-29 11:22:53 -07:00
a407ff72df Add stricter types for AWS modules and update docs
* Review variables available in AWS kubernetes and workers
modules and documentation
* Switching between spot and on-demand has worked since
Terraform v0.12
* Generally, there are too many knobs. Less useful ones
should be de-emphasized or removed
* Remove `cluster_domain_suffix` documentation
2019-09-29 11:19:38 -07:00
f453c54956 Update Grafana from v6.3.5 to v6.3.6
* https://github.com/grafana/grafana/releases/tag/v6.3.6
2019-09-28 15:13:46 -07:00
3e34fb075b Update etcd from v3.4.0 to v3.4.1
* https://github.com/etcd-io/etcd/releases/tag/v3.4.1
2019-09-28 15:09:57 -07:00
9bfb1c5faf Update docs and variable types for worker node_labels
* Document worker pools `node_labels` variable to set the
initial node labels for a homogeneous set of workers
* Document `worker_node_labels` convenience variable to
set the initial node labels for default worker nodes
2019-09-28 15:05:12 -07:00
99ab81f79c Add node_labels variable in workers modules to set initial node labels (#550)
* Also add `worker_node_labels` variable in `kubernetes` modules to set
initial node labels for the default workers
2019-09-28 14:59:24 -07:00
8703f2c3c5 Fix missing comma separator on bare-metal and DO
* Introduced in bare-metal and DigitalOcean in #544
while addressing possible ordering race, but after
the v1.16 upgrade validation
2019-09-23 11:05:26 -07:00
113 changed files with 11343 additions and 7574 deletions

View File

@ -4,6 +4,125 @@ Notable changes between versions.
## Latest
## v1.17.0
* Kubernetes [v1.17.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1170)
* Manage clusters without using a local `asset_dir` ([#595](https://github.com/poseidon/typhoon/pull/595))
* Change `asset_dir` to be optional. Remove the variable to skip writing assets locally (**action recommended**)
* Allow keeping cluster assets only in Terraform state ([pluggable](https://www.terraform.io/docs/backends/types/remote.html), encryption) and allow `terraform apply` from stateless automation systems
* Improve asset unpacking on controllers
* Obtain kubeconfig from Terraform module outputs
* Replace usage of `template_dir` with `templatefile` function ([#587](https://github.com/poseidon/typhoon/pull/587))
* Require Terraform version v0.12.6+ (**action required**)
* Update CoreDNS from v1.6.2 to v1.6.5 ([#588](https://github.com/poseidon/typhoon/pull/588))
* Add health `lameduck` option to wait before shutdown
* Update Calico from v3.10.1 to v3.10.2 ([#599](https://github.com/poseidon/typhoon/pull/599))
* Reduce pod eviction timeout for deleting pods on unready nodes from 5m to 1m ([#597](https://github.com/poseidon/typhoon/pull/597))
* Present since [v1.13.3](#v1133), but mistakenly removed in v1.16.0
* Add CPU requests for control plane static pods ([#589](https://github.com/poseidon/typhoon/pull/589))
* May provide slight edge case benefits and aligns with upstream
#### Google
* Use new `google_compute_region_instance_group_manager` version block format
* Fixes warning that `instance_template` is deprecated
* Require `terraform-provider-google` v2.19.0+ (**action required**)
#### Addons
* Update Grafana from v6.4.4 to [v6.5.1](https://grafana.com/docs/guides/whats-new-in-v6-5/)
* Add pod networking details in dashboards ([#593](https://github.com/poseidon/typhoon/pull/593))
* Add node alerts and Grafana dashboard from node-exporter ([#591](https://github.com/poseidon/typhoon/pull/591))
* Reduce Prometheus high cardinality time series ([#596](https://github.com/poseidon/typhoon/pull/596))
## v1.16.3
* Kubernetes [v1.16.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163)
* Update etcd from v3.4.2 to v3.4.3 ([#582](https://github.com/poseidon/typhoon/pull/582))
* Upgrade Calico from v3.9.2 to [v3.10.1](https://docs.projectcalico.org/v3.10/release-notes/)
* Allow advertising service ClusterIPs to peer routers via a [BGPConfiguration](https://docs.projectcalico.org/v3.10/networking/advertise-service-ips)
* Switch `kube-proxy` from iptables to ipvs mode ([#574](https://github.com/poseidon/typhoon/pull/574))
#### Addons
* Update Prometheus from v2.13.0 to [v2.14.0](https://github.com/prometheus/prometheus/releases/tag/v2.14.0)
* Refresh rules, alerts, and dashboards from upstreams
* Remove addon-resizer from kube-state-metrics ([#575](https://github.com/poseidon/typhoon/pull/575))
* Update Grafana from v6.4.2 to v6.4.4
## v1.16.2
* Kubernetes [v1.16.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1162)
* Update etcd from v3.4.1 to v3.4.2 ([#570](https://github.com/poseidon/typhoon/pull/570))
* Update Calico from v3.9.1 to [v3.9.2](https://docs.projectcalico.org/v3.9/release-notes/)
* Default to using Calico and supporting NetworkPolicy on all platforms
#### Azure
* Change default networking provider from "flannel" to "calico" ([#573](https://github.com/poseidon/typhoon/pull/573))
#### Bare-Metal
* Add `controllers` and `workers` as typed lists of machine detail objects ([#566](https://github.com/poseidon/typhoon/pull/566))
* Define clusters' machines cleanly and with Terraform v0.12 type constraints (**action required**, see PR example)
* Remove `controller_names`, `controller_macs`, and `controller_domains` variables
* Remove `worker_names`, `worker_macs`, and `worker_domains` variables
#### DigitalOcean
* Change default networking provider from "flannel" to "calico" ([#573](https://github.com/poseidon/typhoon/pull/573))
#### Addons
* Update Grafana from v6.4.1 to [v6.4.2](https://github.com/grafana/grafana/releases/tag/v6.4.2)
* Change CLUO label from "app" to "name"
## v1.16.1
* Kubernetes [v1.16.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161)
* Update etcd from v3.4.0 to [v3.4.1](https://github.com/etcd-io/etcd/releases/tag/v3.4.1)
* Update Calico from v3.8.2 to [v3.9.1](https://docs.projectcalico.org/v3.9/release-notes/)
* Add Terraform v0.12 variables types ([#553](https://github.com/poseidon/typhoon/pull/553), [#557](https://github.com/poseidon/typhoon/pull/557), [#560](https://github.com/poseidon/typhoon/pull/560), [#556](https://github.com/poseidon/typhoon/pull/556), [#562](https://github.com/poseidon/typhoon/pull/562))
* Deprecate `cluster_domain_suffix` variable
#### AWS
* Add `worker_node_labels` variable to set initial worker node labels ([#550](https://github.com/poseidon/typhoon/pull/550))
* Add `node_labels` variable to internal `workers` pool module ([#550](https://github.com/poseidon/typhoon/pull/550))
* For Fedora CoreOS, detect most recent AMI in the region
#### Azure
* Promote `networking` provider Calico VXLAN out of experimental (set `networking = "calico"`)
* Add `worker_node_labels` variable to set initial worker node labels ([#550](https://github.com/poseidon/typhoon/pull/550))
* Add `node_labels` variable to internal `workers` pool module ([#550](https://github.com/poseidon/typhoon/pull/550))
* Change `workers` module default `vm_type` to `Standard_DS1_v2` (followup to [#539](https://github.com/poseidon/typhoon/pull/539))
#### Bare-Metal
* For Fedora CoreOS, use new kernel, initrd, and raw paths ([#563](https://github.com/poseidon/typhoon/pull/563))
* Fix Terraform missing comma error ([#549](https://github.com/poseidon/typhoon/pull/549))
* Remove deprecated `container_linux_oem` variable ([#562](https://github.com/poseidon/typhoon/pull/562))
#### DigitalOcean
* Promote `networking` provider Calico VXLAN out of experimental (set `networking = "calico"`)
* Fix Terraform missing comma error ([#549](https://github.com/poseidon/typhoon/pull/549))
#### Google Cloud
* Add `worker_node_labels` variable to set initial worker node labels ([#550](https://github.com/poseidon/typhoon/pull/550))
* Add `node_labels` variable to internal `workers` module ([#550](https://github.com/poseidon/typhoon/pull/550))
#### Addons
* Update Prometheus from v2.12.0 to [v2.13.0](https://github.com/prometheus/prometheus/releases/tag/v2.13.0)
* Fix Prometheus etcd target discovery and scraping ([#561](https://github.com/poseidon/typhoon/pull/561), regressed with Kubernetes v1.16.0)
* Update kube-state-metrics from v1.7.2 to v1.8.0
* Update nginx-ingress from v0.25.1 to [v0.26.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.26.1) ([#555](https://github.com/poseidon/typhoon/pull/555))
* Add lifecycle hook to allow draining for up to 5 minutes
* Update Grafana from v6.3.5 to [v6.4.1](https://github.com/grafana/grafana/releases/tag/v6.4.1)
## v1.16.0
* Kubernetes [v1.16.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1160) ([#543](https://github.com/poseidon/typhoon/pull/543))

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
@ -47,8 +47,8 @@ A preview of Typhoon for [Fedora CoreOS](https://getfedora.org/coreos/) is avail
Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. Here's a minimal example:
```tf
module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.16.0"
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.17.0"
# Google Cloud
cluster_name = "yavin"
@ -58,12 +58,17 @@ module "google-cloud-yavin" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/yavin"
# optional
worker_count = 2
worker_preemptible = true
}
# Obtain cluster kubeconfig
resource "local_file" "kubeconfig-yavin" {
content = module.yavin.kubeconfig-admin
filename = "/home/user/.kube/configs/yavin-config"
}
```
Initialize modules, plan the changes to be made, and apply the changes.
@ -71,20 +76,20 @@ Initialize modules, plan the changes to be made, and apply the changes.
```sh
$ terraform init
$ terraform plan
Plan: 64 to add, 0 to change, 0 to destroy.
Plan: 62 to add, 0 to change, 0 to destroy.
$ terraform apply
Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
```
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
```sh
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.16.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.16.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.16.0
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.17.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.17.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.17.0
```
List the pods.

View File

@ -10,11 +10,11 @@ spec:
maxUnavailable: 1
selector:
matchLabels:
app: container-linux-update-agent
name: container-linux-update-agent
template:
metadata:
labels:
app: container-linux-update-agent
name: container-linux-update-agent
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:

View File

@ -7,11 +7,11 @@ spec:
replicas: 1
selector:
matchLabels:
app: container-linux-update-operator
name: container-linux-update-operator
template:
metadata:
labels:
app: container-linux-update-operator
name: container-linux-update-operator
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-coredns
namespace: monitoring
data:
coredns.json: |-
{
@ -26,7 +22,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -976,17 +972,43 @@ data:
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"label": "pod",
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(coredns_build_info{job=\"coredns\"}, instance)",
"query": "label_values(coredns_build_info{cluster=\"$cluster\", job=\"coredns\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -1034,3 +1056,7 @@ data:
"uid": "2f3f749259235f58698ea949170d3bd5",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-coredns
namespace: monitoring

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-etcd
namespace: monitoring
data:
etcd.json: |-
{
@ -1229,3 +1225,7 @@ data:
"uid": "c2f4e12cdf69feb95caa41a5a1b423d9",
"version": 215
}
kind: ConfigMap
metadata:
name: grafana-dashboards-etcd
namespace: monitoring

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-nodes
namespace: monitoring
data:
kubelet.json: |-
{
@ -92,7 +88,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"kubelet\"})",
"expr": "sum(up{cluster=\"$cluster\", job=\"kubelet\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -176,7 +172,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(kubelet_running_pod_count{job=\"kubelet\", instance=~\"$instance\"})",
"expr": "sum(kubelet_running_pod_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -260,7 +256,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(kubelet_running_container_count{job=\"kubelet\", instance=~\"$instance\"})",
"expr": "sum(kubelet_running_container_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -344,7 +340,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(volume_manager_total_volumes{job=\"kubelet\", instance=~\"$instance\", state=\"actual_state_of_world\"})",
"expr": "sum(volume_manager_total_volumes{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\", state=\"actual_state_of_world\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -428,7 +424,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(volume_manager_total_volumes{job=\"kubelet\", instance=~\"$instance\",state=\"desired_state_of_world\"})",
"expr": "sum(volume_manager_total_volumes{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\",state=\"desired_state_of_world\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -512,7 +508,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(kubelet_node_config_error{job=\"kubelet\", instance=~\"$instance\"}[5m]))",
"expr": "sum(rate(kubelet_node_config_error{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -592,7 +588,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_runtime_operations_total{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (operation_type, instance)",
"expr": "sum(rate(kubelet_runtime_operations_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (operation_type, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
@ -683,7 +679,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_runtime_operations_errors_total{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_type)",
"expr": "sum(rate(kubelet_runtime_operations_errors_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
@ -787,7 +783,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_runtime_operations_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_type, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_runtime_operations_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_type, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
@ -891,14 +887,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_pod_start_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(kubelet_pod_start_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} pod",
"refId": "A"
},
{
"expr": "sum(rate(kubelet_pod_worker_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(kubelet_pod_worker_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} worker",
@ -989,14 +985,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_start_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_start_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} pod",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} worker",
@ -1102,7 +1098,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(storage_operation_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin)",
"expr": "sum(rate(storage_operation_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
@ -1195,7 +1191,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(storage_operation_errors_total{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin)",
"expr": "sum(rate(storage_operation_errors_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
@ -1301,7 +1297,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(storage_operation_duration_seconds_bucket{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin, le))",
"expr": "histogram_quantile(0.99, sum(rate(storage_operation_duration_seconds_bucket{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
@ -1405,7 +1401,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_cgroup_manager_duration_seconds_count{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_type)",
"expr": "sum(rate(kubelet_cgroup_manager_duration_seconds_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation_type}}",
@ -1496,7 +1492,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_cgroup_manager_duration_seconds_bucket{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_type, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_cgroup_manager_duration_seconds_bucket{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_type, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
@ -1601,7 +1597,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_pleg_relist_duration_seconds_count{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(kubelet_pleg_relist_duration_seconds_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1692,7 +1688,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_interval_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_interval_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1796,7 +1792,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1900,28 +1896,28 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
@ -2025,7 +2021,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{verb}} {{url}}",
@ -2129,7 +2125,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"kubelet\",instance=~\"$instance\"}",
"expr": "process_resident_memory_bytes{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -2220,7 +2216,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"kubelet\",instance=~\"$instance\"}[5m])",
"expr": "rate(process_cpu_seconds_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -2311,7 +2307,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"kubelet\",instance=~\"$instance\"}",
"expr": "go_goroutines{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -2395,6 +2391,32 @@ data:
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
@ -2405,10 +2427,10 @@ data:
"options": [
],
"query": "label_values(kubelet_runtime_operations{job=\"kubelet\"}, instance)",
"query": "label_values(kubelet_runtime_operations{cluster=\"$cluster\", job=\"kubelet\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2453,1367 +2475,6 @@ data:
"uid": "3138fa155d5915769fbded898ac09fd9",
"version": 0
}
nodes.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(node_load1{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 1m",
"refId": "A"
},
{
"expr": "max(node_load5{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 5m",
"refId": "B"
},
{
"expr": "max(node_load15{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 15m",
"refId": "C"
},
{
"expr": "count(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", mode=\"user\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "logical cores",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "System load",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cpu}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Usage Per Core",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": "true",
"avg": "true",
"current": "true",
"max": "false",
"min": "false",
"rightSide": "true",
"show": "true",
"total": "false",
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max (sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[2m])) ) * 100\n",
"format": "time_series",
"intervalFactor": 10,
"legendFormat": "{{ cpu }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilization",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
},
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[2m]))) * 100\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "CPU Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(\n node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory used",
"refId": "A"
},
{
"expr": "max(node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"refId": "B"
},
{
"expr": "max(node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory cached",
"refId": "C"
},
{
"expr": "max(node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory free",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(\n (\n (\n node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n )\n / node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n ) * 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "read",
"yaxis": 1
},
{
"alias": "io time",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(rate(node_disk_read_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "read",
"refId": "A"
},
{
"expr": "max(rate(node_disk_written_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "written",
"refId": "B"
},
{
"expr": "max(rate(node_disk_io_time_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "io time",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} disk used",
"refId": "A"
},
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} disk free",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Space Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", device!~\"lo\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Received",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_transmit_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", device!~\"lo\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(\n node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "inodes used",
"refId": "A"
},
{
"expr": "max(node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "inodes free",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Inodes Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 13,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(\n (\n (\n node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n )\n / node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n ) * 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Inodes Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(node_boot_time_seconds{cluster=\"$cluster\", job=\"node-exporter\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Nodes",
"uid": "fa49a4706d07a042595b664c87fb33ea",
"version": 0
}
proxy.json: |-
{
"__inputs": [
@ -4958,7 +3619,7 @@ data:
"query": "label_values(kubeproxy_network_programming_duration_seconds_bucket{job=\"kube-proxy\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5003,3 +3664,7 @@ data:
"uid": "632e265de029684c40b21cb76bca4f94",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-nodes
namespace: monitoring

View File

@ -0,0 +1,5375 @@
apiVersion: v1
data:
k8s-resources-cluster.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "100px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[1m]))",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_cpu_cores{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_cpu_cores{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - sum(:node_memory_MemAvailable_bytes:sum{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Headlines",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workloads",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workloads",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to workloads",
"linkUrl": "./d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace) / sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace) / sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Requests by Namespace",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Requests",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 11,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Current Receive Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Transmit Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 14,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Namespace: Received",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 15,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Namespace: Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 16,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 17,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 18,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 19,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8",
"version": 0
}
k8s-resources-namespace.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Current Receive Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Transmit Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Namespace (Pods)",
"uid": "85a562078cdf77779eaa1add43ccec1e",
"version": 0
}
k8s-resources-node.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", node=\"$node\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\", container!=\"\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_memory_working_set_bytes{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{node=\"$node\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_memory_rss{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_memory_cache{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_memory_swap{cluster=\"$cluster\", node=\"$node\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "node",
"multi": false,
"name": "node",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, node)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Node (Pods)",
"uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-resources-1
namespace: monitoring

View File

@ -1,4308 +1,5 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-resources
namespace: monitoring
data:
k8s-cluster-rsrc-use.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:cluster_cpu_utilisation:ratio{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_cpu_saturation_load1:{cluster=\"$cluster\"} / scalar(sum(min(kube_pod_info{cluster=\"$cluster\"}) by (node)))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Saturation (Load1)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:cluster_memory_utilisation:ratio{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_memory_swap_io_bytes:sum_rate{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Saturation (Swap I/O)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_utilisation:avg_irate{cluster=\"$cluster\"} / scalar(:kube_pod_info_node_count:{cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_saturation:avg_irate{cluster=\"$cluster\"} / scalar(:kube_pod_info_node_count:{cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Saturation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_utilisation:sum_irate{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Utilisation (Transmitted)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_saturation:sum_irate{cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Saturation (Dropped)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(max(node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\", cluster=\"$cluster\"} - node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\", cluster=\"$cluster\"}) by (device,pod,namespace)) by (pod,namespace)\n/ scalar(sum(max(node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\", cluster=\"$cluster\"}) by (device,pod,namespace)))\n* on (namespace, pod) group_left (node) node_namespace_pod:kube_pod_info:{cluster=\"$cluster\"}\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{node}}",
"legendLink": "/d/4ac4f123aae0ff6dbaf4f4f66120033b/k8s-node-rsrc-use",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Capacity",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Storage",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_node_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / USE Method / Cluster",
"uid": "a6e7d1362e1ddbb79db21d5bb40d7137",
"version": 0
}
k8s-node-rsrc-use.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_cpu_utilisation:avg1m{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Utilisation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_cpu_saturation_load1:{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Saturation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Saturation (Load1)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_memory_utilisation:{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Memory",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_memory_swap_io_bytes:sum_rate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Swap IO",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Saturation (Swap I/O)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_utilisation:avg_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Utilisation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_disk_saturation:avg_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Saturation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk IO Saturation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_utilisation:sum_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Utilisation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Utilisation (Transmitted)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_net_saturation:sum_irate{cluster=\"$cluster\", node=\"$node\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Saturation",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Net Saturation (Dropped)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Net",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\"}\n* on (namespace, pod) group_left (node) node_namespace_pod:kube_pod_info:{cluster=\"$cluster\", node=\"$node\"}\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_node_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "node",
"multi": false,
"name": "node",
"options": [
],
"query": "label_values(kube_node_info{cluster=\"$cluster\"}, node)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / USE Method / Node",
"uid": "4ac4f123aae0ff6dbaf4f4f66120033b",
"version": 0
}
k8s-resources-cluster.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "100px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[1m]))",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_cpu_cores{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_cpu_cores{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "1 - sum(:node_memory_MemFreeCachedBuffers_bytes:sum{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Requests Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 2,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) / sum(kube_node_status_allocatable_memory_bytes{cluster=\"$cluster\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Limits Commitment",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Headlines",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "/d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workloads",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to workloads",
"linkUrl": "/d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "/d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\"}) by (namespace) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "/d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workloads",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": true,
"linkTooltip": "Drill down to workloads",
"linkUrl": "/d/a87fb0d919ec0ea5f6543124e16c42a5/k8s-resources-workloads-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell_1",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "/d/85a562078cdf77779eaa1add43ccec1e/k8s-resources-namespace?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "count(avg(mixin_pod_workload{cluster=\"$cluster\"}) by (workload, namespace)) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace) / sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", container!=\"\"}) by (namespace) / sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\"}) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Requests by Namespace",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Requests",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8",
"version": 0
}
k8s-resources-namespace.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (w/o cache)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Compute Resources / Namespace (Pods)",
"uid": "85a562078cdf77779eaa1add43ccec1e",
"version": 0
}
k8s-resources-pod.json: |-
{
"annotations": {
@ -4361,7 +58,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}) by (container)",
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
@ -4590,7 +287,7 @@ data:
],
"targets": [
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\"}) by (container)",
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4608,7 +305,7 @@ data:
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4626,7 +323,7 @@ data:
"step": 10
},
{
"expr": "sum(namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4972,7 +669,7 @@ data:
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap",
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
@ -5025,7 +722,7 @@ data:
],
"targets": [
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5043,7 +740,7 @@ data:
"step": 10
},
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5061,7 +758,7 @@ data:
"step": 10
},
{
"expr": "sum(container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\"}) by (container) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5145,6 +842,594 @@ data:
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
}
],
"schemaVersion": 14,
@ -5250,6 +1535,41 @@ data:
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
@ -5345,7 +1665,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -5548,7 +1868,7 @@ data:
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
@ -5574,7 +1894,7 @@ data:
],
"targets": [
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5592,7 +1912,7 @@ data:
"step": 10
},
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5610,7 +1930,7 @@ data:
"step": 10
},
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5710,7 +2030,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -5913,7 +2233,7 @@ data:
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
@ -5939,7 +2259,7 @@ data:
],
"targets": [
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5957,7 +2277,7 @@ data:
"step": 10
},
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -5975,7 +2295,7 @@ data:
"step": 10
},
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6032,6 +2352,1084 @@ data:
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Current Receive Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Transmit Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "./d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Pod: Received",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Pod: Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
}
],
"schemaVersion": 14,
@ -6164,6 +3562,41 @@ data:
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
@ -6259,7 +3692,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}",
@ -6480,7 +3913,7 @@ data:
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload",
"thresholds": [
@ -6524,7 +3957,7 @@ data:
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}) by (workload, workload_type)",
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6533,7 +3966,7 @@ data:
"step": 10
},
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6542,7 +3975,7 @@ data:
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6551,7 +3984,7 @@ data:
"step": 10
},
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6560,7 +3993,7 @@ data:
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6569,7 +4002,7 @@ data:
"step": 10
},
{
"expr": "sum(\n namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6669,7 +4102,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}",
@ -6890,7 +4323,7 @@ data:
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "/d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload",
"thresholds": [
@ -6934,7 +4367,7 @@ data:
],
"targets": [
{
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}) by (workload, workload_type)",
"expr": "count(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6943,7 +4376,7 @@ data:
"step": 10
},
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6952,7 +4385,7 @@ data:
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6961,7 +4394,7 @@ data:
"step": 10
},
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6970,7 +4403,7 @@ data:
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -6979,7 +4412,7 @@ data:
"step": 10
},
{
"expr": "sum(\n container_memory_usage_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}\n) by (workload, workload_type)\n",
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -7036,6 +4469,1102 @@ data:
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Current Receive Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Transmit Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down to pods",
"linkUrl": "./d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$type",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workload Type",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "workload_type",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Workload: Received",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Workload: Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
}
],
"schemaVersion": 14,
@ -7110,6 +5639,73 @@ data:
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "deployment",
"value": "deployment"
},
"datasource": "$datasource",
"definition": "label_values(mixin_pod_workload{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(mixin_pod_workload{namespace=~\"$namespace\", workload=~\".+\"}, workload_type)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
@ -7151,3 +5747,7 @@ data:
"uid": "a87fb0d919ec0ea5f6543124e16c42a5",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-resources-2
namespace: monitoring

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s
namespace: monitoring
data:
apiserver.json: |-
{
@ -1240,7 +1236,7 @@ data:
"query": "label_values(apiserver_request_total{job=\"apiserver\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2351,7 +2347,7 @@ data:
"query": "label_values(process_cpu_seconds_total{job=\"kube-controller-manager\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2850,7 +2846,7 @@ data:
"query": "label_values(kubelet_volume_stats_capacity_bytes, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2876,7 +2872,7 @@ data:
"query": "label_values(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2902,7 +2898,7 @@ data:
"query": "label_values(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\"}, persistentvolumeclaim)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3052,7 +3048,7 @@ data:
"refId": "C"
},
{
"expr": "sum by(container) (container_memory_cache{job=\"kubernetes-cadvisor\", namespace=\"$namespace\", pod=~\"$pod\", container=~\"$container\", container!=\"POD\"})",
"expr": "sum by(container) (container_memory_cache{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Cache: {{ container }}",
@ -3482,7 +3478,7 @@ data:
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3508,7 +3504,7 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3534,7 +3530,7 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=~\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3560,7 +3556,7 @@ data:
"query": "label_values(kube_pod_container_info{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}, container)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -4596,7 +4592,7 @@ data:
"query": "label_values(process_cpu_seconds_total{job=\"kube-scheduler\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5448,7 +5444,7 @@ data:
"query": "label_values(kube_statefulset_metadata_generation, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5471,10 +5467,10 @@ data:
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5497,10 +5493,10 @@ data:
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", namespace=\"$namespace\"}, statefulset)",
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\"}, statefulset)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5545,3 +5541,7 @@ data:
"uid": "a31c1f46e6f727cb37c0d731a7245005",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s
namespace: monitoring

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-nginx-ingress
namespace: monitoring
data:
nginx.json: |-
{
@ -26,7 +22,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -94,7 +90,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "round(sum(irate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\"}[2m])), 0.01)",
"expr": "round(sum(irate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\", controller_namespace=~\"$namespace\"}[2m])), 0.01)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -176,7 +172,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(avg_over_time(nginx_ingress_controller_nginx_process_connections{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"expr": "sum(avg_over_time(nginx_ingress_controller_nginx_process_connections{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -258,7 +254,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\",status!~\"[4-5].*\"}[2m])) / sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\"}[2m]))",
"expr": "sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",status!~\"[4-5].*\"}[2m])) / sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -335,7 +331,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "round(sum(irate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress), 0.01)",
"expr": "round(sum(irate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress), 0.01)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}}",
@ -426,7 +422,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\",ingress=~\"$ingress\",status!~\"[4-5].*\"}[2m])) by (ingress) / sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress)",
"expr": "sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\",status!~\"[4-5].*\"}[2m])) by (ingress) / sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}}",
@ -530,21 +526,21 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"expr": "histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{cluster=~\"$cluster\", ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}} 99%",
"refId": "A"
},
{
"expr": "histogram_quantile(0.90, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"expr": "histogram_quantile(0.90, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{cluster=~\"$cluster\", ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}} 90%",
"refId": "B"
},
{
"expr": "histogram_quantile(0.50, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"expr": "histogram_quantile(0.50, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{cluster=~\"$cluster\", ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}} 50%",
@ -648,14 +644,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum (irate (nginx_ingress_controller_request_size_sum{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"expr": "sum (irate (nginx_ingress_controller_request_size_sum{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "received",
"refId": "A"
},
{
"expr": "sum (irate (nginx_ingress_controller_response_size_sum{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"expr": "sum (irate (nginx_ingress_controller_response_size_sum{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "sent",
@ -746,7 +742,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "avg(nginx_ingress_controller_nginx_process_resident_memory_bytes{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}) by (controller_pod)",
"expr": "avg(nginx_ingress_controller_nginx_process_resident_memory_bytes{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}) by (controller_pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{controller_pod}}",
@ -837,7 +833,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_nginx_process_cpu_seconds_total{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m])) by (controller_pod)",
"expr": "sum(rate(nginx_ingress_controller_nginx_process_cpu_seconds_total{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m])) by (controller_pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{controller_pod}}",
@ -921,6 +917,32 @@ data:
"allValue": ".*",
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".*",
"current": {
},
"datasource": "$datasource",
"hide": 0,
@ -931,7 +953,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_config_hash, controller_namespace)",
"query": "label_values(nginx_ingress_controller_config_hash{cluster=~\"$cluster\"}, controller_namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -957,7 +979,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_config_hash{namespace=~\"$namespace\"}, controller_class)",
"query": "label_values(nginx_ingress_controller_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\"}, controller_class)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -983,7 +1005,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_config_hash{namespace=~\"$namespace\",controller_class=~\"$controller_class\"}, controller_pod)",
"query": "label_values(nginx_ingress_controller_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\", controller_class=~\"$controller_class\"}, controller_pod)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -1009,7 +1031,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\",controller=~\"$controller\"}, ingress)",
"query": "label_values(nginx_ingress_controller_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", controller_class=~\"$controller_class\", controller=~\"$controller\"}, ingress)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -1057,3 +1079,7 @@ data:
"uid": "f4af03eca476c08ecf2b5cf15fd60168",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-nginx-ingress
namespace: monitoring

View File

@ -0,0 +1,968 @@
apiVersion: v1
data:
nodes.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(\n (1 - rate(node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\", instance=\"$instance\"}[$__interval]))\n/ ignoring(cpu) group_left\n count without (cpu)( node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\", instance=\"$instance\"})\n)\n",
"format": "time_series",
"interval": "1m",
"intervalFactor": 5,
"legendFormat": "{{cpu}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node_load1{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "1m load average",
"refId": "A"
},
{
"expr": "node_load5{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5m load average",
"refId": "B"
},
{
"expr": "node_load15{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "15m load average",
"refId": "C"
},
{
"expr": "count(node_cpu_seconds_total{job=\"node-exporter\", instance=\"$instance\", mode=\"idle\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "logical cores",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Load Average",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(\n node_memory_MemTotal_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_Buffers_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_Cached_bytes{job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory used",
"refId": "A"
},
{
"expr": "node_memory_Buffers_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"refId": "B"
},
{
"expr": "node_memory_Cached_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory cached",
"refId": "C"
},
{
"expr": "node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory free",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "100 -\n(\n node_memory_MemAvailable_bytes{job=\"node-exporter\", instance=\"$instance\"}\n/\n node_memory_MemTotal_bytes{job=\"node-exporter\", instance=\"$instance\"}\n* 100\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/ read| written/",
"yaxis": 1
},
{
"alias": "/ io time/",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_disk_read_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}} read",
"refId": "A"
},
{
"expr": "rate(node_disk_written_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}} written",
"refId": "B"
},
{
"expr": "rate(node_disk_io_time_seconds_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}} io time",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk I/O",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "used",
"color": "#E0B400"
},
{
"alias": "available",
"color": "#73BF69"
}
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n max by (device) (\n node_filesystem_size_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n -\n node_filesystem_avail_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n )\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "used",
"refId": "A"
},
{
"expr": "sum(\n max by (device) (\n node_filesystem_avail_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n )\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "available",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Space Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!=\"lo\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Received",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_transmit_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!=\"lo\"}[$__interval])",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Transmitted",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(node_exporter_build_info{job=\"node-exporter\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Nodes",
"uid": "fa49a4706d07a042595b664c87fb33ea",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-node-exporter
namespace: monitoring

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-prom
namespace: monitoring
data:
prometheus-remote-write.json: |-
{
@ -2158,3 +2154,7 @@ data:
"uid": "",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-prom
namespace: monitoring

View File

@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: grafana
image: docker.io/grafana/grafana:6.3.5
image: docker.io/grafana/grafana:6.5.1
env:
- name: GF_PATHS_CONFIG
value: "/etc/grafana/custom.ini"
@ -56,14 +56,18 @@ spec:
mountPath: /etc/grafana/provisioning/dashboards
- name: dashboards-etcd
mountPath: /etc/grafana/dashboards/etcd
- name: dashboards-node-exporter
mountPath: /etc/grafana/dashboards/node-exporter
- name: dashboards-prom
mountPath: /etc/grafana/dashboards/prom
- name: dashboards-k8s
mountPath: /etc/grafana/dashboards/k8s
- name: dashboards-k8s-nodes
mountPath: /etc/grafana/dashboards/k8s-nodes
- name: dashboards-k8s-resources
mountPath: /etc/grafana/dashboards/k8s-resources
- name: dashboards-k8s-resources-1
mountPath: /etc/grafana/dashboards/k8s-resources-1
- name: dashboards-k8s-resources-2
mountPath: /etc/grafana/dashboards/k8s-resources-2
- name: dashboards-coredns
mountPath: /etc/grafana/dashboards/coredns
- name: dashboards-nginx-ingress
@ -81,6 +85,9 @@ spec:
- name: dashboards-etcd
configMap:
name: grafana-dashboards-etcd
- name: dashboards-node-exporter
configMap:
name: grafana-dashboards-node-exporter
- name: dashboards-prom
configMap:
name: grafana-dashboards-prom
@ -90,9 +97,12 @@ spec:
- name: dashboards-k8s-nodes
configMap:
name: grafana-dashboards-k8s-nodes
- name: dashboards-k8s-resources
- name: dashboards-k8s-resources-1
configMap:
name: grafana-dashboards-k8s-resources
name: grafana-dashboards-k8s-resources-1
- name: dashboards-k8s-resources-2
configMap:
name: grafana-dashboards-k8s-resources-2
- name: dashboards-coredns
configMap:
name: grafana-dashboards-coredns

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -65,6 +65,11 @@ spec:
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
@ -73,4 +78,4 @@ spec:
- ALL
runAsUser: 33 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -65,6 +65,11 @@ spec:
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
@ -73,4 +78,4 @@ spec:
- ALL
runAsUser: 33 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -62,6 +62,11 @@ spec:
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
@ -70,5 +75,5 @@ spec:
- ALL
runAsUser: 33 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -65,6 +65,11 @@ spec:
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
@ -73,4 +78,4 @@ spec:
- ALL
runAsUser: 33 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -65,6 +65,11 @@ spec:
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
@ -73,4 +78,4 @@ spec:
- ALL
runAsUser: 33 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

View File

@ -65,6 +65,9 @@ data:
- source_labels: [__name__]
action: drop
regex: apiserver_admission_step_admission_latencies_seconds_.*
- source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_bucket;.+
action: drop
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
@ -81,7 +84,7 @@ data:
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
regex: __meta_kubernetes_node_name
# Scrape config for Kubelet cAdvisor. Explore metrics from a node by
# scraping kubelet (127.0.0.1:10250/metrics/cadvisor).
@ -99,7 +102,7 @@ data:
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
regex: __meta_kubernetes_node_name
metric_relabel_configs:
- source_labels: [__name__, image]
action: drop
@ -115,15 +118,15 @@ data:
- role: node
scheme: http
relabel_configs:
- source_labels: [__meta_kubernetes_node_label_node_role_kubernetes_io_controller]
action: keep
regex: 'true'
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_node_address_InternalIP]
action: replace
target_label: __address__
replacement: '${1}:2381'
- source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_controller]
action: keep
regex: 'true'
- action: labelmap
regex: __meta_kubernetes_node_name
- source_labels: [__meta_kubernetes_node_address_InternalIP]
action: replace
target_label: __address__
replacement: '${1}:2381'
# Scrape config for service endpoints.
#

View File

@ -20,7 +20,7 @@ spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: quay.io/prometheus/prometheus:v2.12.0
image: quay.io/prometheus/prometheus:v2.14.0
args:
- --web.listen-address=0.0.0.0:9090
- --config.file=/etc/prometheus/prometheus.yaml

View File

@ -24,7 +24,7 @@ spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.7.2
image: quay.io/coreos/kube-state-metrics:v1.8.0
ports:
- name: metrics
containerPort: 8080
@ -37,33 +37,6 @@ spec:
readinessProbe:
httpGet:
path: /
port: 8080
port: 8081
initialDelaySeconds: 5
timeoutSeconds: 5
- name: addon-resizer
image: k8s.gcr.io/addon-resizer:1.8.5
resources:
limits:
cpu: 100m
memory: 30Mi
requests:
cpu: 100m
memory: 30Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
command:
- /pod_nanny
- --container=kube-state-metrics
- --cpu=100m
- --extra-cpu=1m
- --memory=100Mi
- --extra-memory=2Mi
- --threshold=5
- --deployment=kube-state-metrics

View File

@ -1,13 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kube-state-metrics
namespace: monitoring
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: monitoring

View File

@ -1,31 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: kube-state-metrics
namespace: monitoring
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- extensions
resources:
- deployments
resourceNames:
- kube-state-metrics
verbs:
- get
- update
- apiGroups:
- apps
resources:
- deployments
resourceNames:
- kube-state-metrics
verbs:
- get
- update

View File

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitoring
data:
etcd.yaml: |-
{
@ -10,6 +6,17 @@ data:
{
"name": "etcd",
"rules": [
{
"alert": "etcdMembersDown",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": members are down ({{ $value }})."
},
"expr": "max by (job) (\n sum by (job) (up{job=~\".*etcd.*\"} == bool 0)\nor\n count by (job,endpoint) (\n sum by (job,endpoint,To) (rate(etcd_network_peer_sent_failures_total{job=~\".*etcd.*\"}[3m])) > 0.01\n )\n)\n> 0\n",
"for": "3m",
"labels": {
"severity": "critical"
}
},
{
"alert": "etcdInsufficientMembers",
"annotations": {
@ -135,30 +142,35 @@ data:
}
]
}
extra.yaml: |-
{
"groups": [
{
"name": "extra.rules",
"rules": [
{
"alert": "InactiveRAIDDisk",
"annotations": {
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
},
"expr": "node_md_disks - node_md_disks_active > 0",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
}
]
}
kube.yaml: |-
{
"groups": [
{
"name": "kube-apiserver.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.99"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.9"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.5"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
}
]
},
{
"name": "k8s.rules",
"rules": [
@ -167,27 +179,35 @@ data:
"record": "namespace:container_cpu_usage_seconds_total:sum_rate"
},
{
"expr": "sum by (namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n)\n",
"record": "namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
"expr": "sum by (namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
},
{
"expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"record": "node_namespace_pod_container:container_memory_working_set_bytes"
},
{
"expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"record": "node_namespace_pod_container:container_memory_rss"
},
{
"expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"record": "node_namespace_pod_container:container_memory_cache"
},
{
"expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"record": "node_namespace_pod_container:container_memory_swap"
},
{
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}) by (namespace)\n",
"record": "namespace:container_memory_usage_bytes:sum"
},
{
"expr": "sum by (namespace, label_name) (\n sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:container_cpu_usage_seconds_total:sum_rate"
},
{
"expr": "sum by (namespace, label_name) (\n sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\",image!=\"\", container!=\"POD\"}) by (pod, namespace)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:container_memory_usage_bytes:sum"
},
{
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"^(Pending|Running)$\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:kube_pod_container_resource_requests_memory_bytes:sum"
},
{
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"^(Pending|Running)$\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:kube_pod_container_resource_requests_cpu_cores:sum"
},
{
@ -281,32 +301,6 @@ data:
}
]
},
{
"name": "kube-apiserver.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.99"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.9"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.5"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
}
]
},
{
"name": "node.rules",
"rules": [
@ -323,169 +317,8 @@ data:
"record": "node:node_num_cpu:sum"
},
{
"expr": "1 - avg(rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m]))\n",
"record": ":node_cpu_utilisation:avg1m"
},
{
"expr": "1 - avg by (node) (\n rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:)\n",
"record": "node:node_cpu_utilisation:avg1m"
},
{
"expr": "node:node_cpu_utilisation:avg1m\n *\nnode:node_num_cpu:sum\n /\nscalar(sum(node:node_num_cpu:sum))\n",
"record": "node:cluster_cpu_utilisation:ratio"
},
{
"expr": "sum(node_load1{job=\"node-exporter\"})\n/\nsum(node:node_num_cpu:sum)\n",
"record": ":node_cpu_saturation_load1:"
},
{
"expr": "sum by (node) (\n node_load1{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nnode:node_num_cpu:sum\n",
"record": "node:node_cpu_saturation_load1:"
},
{
"expr": "1 -\nsum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n/\nsum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
"record": ":node_memory_utilisation:"
},
{
"expr": "sum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n",
"record": ":node_memory_MemFreeCachedBuffers_bytes:sum"
},
{
"expr": "sum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
"record": ":node_memory_MemTotal_bytes:sum"
},
{
"expr": "sum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_bytes_available:sum"
},
{
"expr": "sum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_bytes_total:sum"
},
{
"expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nnode:node_memory_bytes_total:sum\n",
"record": "node:node_memory_utilisation:ratio"
},
{
"expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nscalar(sum(node:node_memory_bytes_total:sum))\n",
"record": "node:cluster_memory_utilisation:ratio"
},
{
"expr": "1e3 * sum(\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n)\n",
"record": ":node_memory_swap_io_bytes:sum_rate"
},
{
"expr": "1 -\nsum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nsum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_utilisation:"
},
{
"expr": "1 - (node:node_memory_bytes_available:sum / node:node_memory_bytes_total:sum)\n",
"record": "node:node_memory_utilisation_2:"
},
{
"expr": "1e3 * sum by (node) (\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_swap_io_bytes:sum_rate"
},
{
"expr": "avg(irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]))\n",
"record": ":node_disk_utilisation:avg_irate"
},
{
"expr": "avg by (node) (\n irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_disk_utilisation:avg_irate"
},
{
"expr": "avg(irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]))\n",
"record": ":node_disk_saturation:avg_irate"
},
{
"expr": "avg by (node) (\n irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_disk_saturation:avg_irate"
},
{
"expr": "max by (instance, namespace, pod, device) ((node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"}\n- node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n/ node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
"record": "node:node_filesystem_usage:"
},
{
"expr": "max by (instance, namespace, pod, device) (node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"} / node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
"record": "node:node_filesystem_avail:"
},
{
"expr": "sum(irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
"record": ":node_net_utilisation:sum_irate"
},
{
"expr": "sum by (node) (\n (irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_net_utilisation:sum_irate"
},
{
"expr": "sum(irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
"record": ":node_net_saturation:sum_irate"
},
{
"expr": "sum by (node) (\n (irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_net_saturation:sum_irate"
},
{
"expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
"record": "node:node_inodes_total:"
},
{
"expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files_free{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
"record": "node:node_inodes_free:"
}
]
},
{
"name": "kubernetes-absent",
"rules": [
{
"alert": "KubeAPIDown",
"annotations": {
"message": "KubeAPI has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown"
},
"expr": "absent(up{job=\"apiserver\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeControllerManagerDown",
"annotations": {
"message": "KubeControllerManager has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown"
},
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeSchedulerDown",
"annotations": {
"message": "KubeScheduler has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown"
},
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeletDown",
"annotations": {
"message": "Kubelet has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown"
},
"expr": "absent(up{job=\"kubelet\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
"expr": "sum(\n node_memory_MemAvailable_bytes{job=\"node-exporter\"} or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"} +\n node_memory_Cached_bytes{job=\"node-exporter\"} +\n node_memory_MemFree_bytes{job=\"node-exporter\"} +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n)\n",
"record": ":node_memory_MemAvailable_bytes:sum"
}
]
},
@ -499,7 +332,7 @@ data:
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping"
},
"expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[15m]) * 60 * 5 > 0\n",
"for": "1h",
"for": "15m",
"labels": {
"severity": "critical"
}
@ -507,11 +340,11 @@ data:
{
"alert": "KubePodNotReady",
"annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.",
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"
},
"expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Failed|Pending|Unknown\"}) > 0\n",
"for": "1h",
"expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Failed|Pending|Unknown\"} * on(namespace, pod) group_left(owner_kind) kube_pod_owner{owner_kind!=\"Job\"}) > 0\n",
"for": "15m",
"labels": {
"severity": "critical"
}
@ -531,11 +364,11 @@ data:
{
"alert": "KubeDeploymentReplicasMismatch",
"annotations": {
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than an hour.",
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"
},
"expr": "kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\nkube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n",
"for": "1h",
"for": "15m",
"labels": {
"severity": "critical"
}
@ -579,15 +412,27 @@ data:
{
"alert": "KubeDaemonSetRolloutStuck",
"annotations": {
"message": "Only {{ $value }}% of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.",
"message": "Only {{ $value | humanizePercentage }} of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck"
},
"expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} * 100 < 100\n",
"expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} < 1.00\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeContainerWaiting",
"annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container}} has been in waiting state for longer than 1 hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontainerwaiting"
},
"expr": "sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\"}) > 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeDaemonSetNotScheduled",
"annotations": {
@ -642,8 +487,32 @@ data:
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed"
},
"expr": "kube_job_status_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "1h",
"expr": "kube_job_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeHpaReplicasMismatch",
"annotations": {
"message": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has not matched the desired number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpareplicasmismatch"
},
"expr": "(kube_hpa_status_desired_replicas{job=\"kube-state-metrics\"}\n !=\nkube_hpa_status_current_replicas{job=\"kube-state-metrics\"})\n and\nchanges(kube_hpa_status_current_replicas[15m]) == 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeHpaMaxedOut",
"annotations": {
"message": "HPA {{ $labels.namespace }}/{{ $labels.hpa }} has been running at max replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpamaxedout"
},
"expr": "kube_hpa_status_current_replicas{job=\"kube-state-metrics\"}\n ==\nkube_hpa_spec_max_replicas{job=\"kube-state-metrics\"}\n",
"for": "15m",
"labels": {
"severity": "warning"
}
@ -704,10 +573,10 @@ data:
{
"alert": "KubeQuotaExceeded",
"annotations": {
"message": "Namespace {{ $labels.namespace }} is using {{ printf \"%0.0f\" $value }}% of its {{ $labels.resource }} quota.",
"message": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded"
},
"expr": "100 * kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 90\n",
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 0.90\n",
"for": "15m",
"labels": {
"severity": "warning"
@ -716,10 +585,10 @@ data:
{
"alert": "CPUThrottlingHigh",
"annotations": {
"message": "{{ printf \"%0.0f\" $value }}% throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.",
"message": "{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh"
},
"expr": "100 * sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > 100 \n",
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > ( 100 / 100 )\n",
"for": "15m",
"labels": {
"severity": "warning"
@ -733,10 +602,10 @@ data:
{
"alert": "KubePersistentVolumeUsageCritical",
"annotations": {
"message": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ printf \"%0.2f\" $value }}% free.",
"message": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical"
},
"expr": "100 * kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 3\n",
"expr": "kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 0.03\n",
"for": "1m",
"labels": {
"severity": "critical"
@ -745,10 +614,10 @@ data:
{
"alert": "KubePersistentVolumeFullInFourDays",
"annotations": {
"message": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ printf \"%0.2f\" $value }}% is available.",
"message": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value | humanizePercentage }} is available.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays"
},
"expr": "100 * (\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"for": "5m",
"labels": {
"severity": "critical"
@ -771,37 +640,13 @@ data:
{
"name": "kubernetes-system",
"rules": [
{
"alert": "KubeNodeNotReady",
"annotations": {
"message": "{{ $labels.node }} has been unready for more than an hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready"
},
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeVersionMismatch",
"annotations": {
"message": "There are {{ $value }} different semantic versions of Kubernetes components running.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch"
},
"expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!=\"coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientErrors",
"annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ printf \"%0.0f\" $value }}% errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors"
},
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n* 100 > 1\n",
"expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!~\"kube-dns|coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "15m",
"labels": {
"severity": "warning"
@ -810,34 +655,27 @@ data:
{
"alert": "KubeClientErrors",
"annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ printf \"%0.0f\" $value }} errors / second.",
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ $value | humanizePercentage }} errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors"
},
"expr": "sum(rate(ksm_scrape_error_total{job=\"kube-state-metrics\"}[5m])) by (instance, job) > 0.1\n",
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n> 0.01\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletTooManyPods",
"annotations": {
"message": "Kubelet {{ $labels.instance }} is running {{ $value }} Pods, close to the limit of 110.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods"
},
"expr": "kubelet_running_pod_count{job=\"kubelet\"} > 110 * 0.9\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
}
]
},
{
"name": "kubernetes-system-apiserver",
"rules": [
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$\"} > 1\n",
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|PROXY|CONNECT\"} > 1\n",
"for": "10m",
"labels": {
"severity": "warning"
@ -849,7 +687,7 @@ data:
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$\"} > 4\n",
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|PROXY|CONNECT\"} > 4\n",
"for": "10m",
"labels": {
"severity": "critical"
@ -858,10 +696,10 @@ data:
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests.",
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) * 100 > 3\n",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) > 0.03\n",
"for": "10m",
"labels": {
"severity": "critical"
@ -870,10 +708,10 @@ data:
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests.",
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) * 100 > 1\n",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) > 0.01\n",
"for": "10m",
"labels": {
"severity": "warning"
@ -882,10 +720,10 @@ data:
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) * 100 > 10\n",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) > 0.10\n",
"for": "10m",
"labels": {
"severity": "critical"
@ -894,10 +732,10 @@ data:
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) * 100 > 5\n",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) > 0.05\n",
"for": "10m",
"labels": {
"severity": "warning"
@ -924,68 +762,15 @@ data:
"labels": {
"severity": "critical"
}
}
]
}
]
}
kubeprom.yaml: |-
{
"groups": [
{
"name": "kube-prometheus-node-recording.rules",
"rules": [
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[3m])) BY (instance)",
"record": "instance:node_cpu:rate:sum"
},
{
"expr": "sum((node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"})) BY (instance)",
"record": "instance:node_filesystem_usage:sum"
},
{
"expr": "sum(rate(node_network_receive_bytes_total[3m])) BY (instance)",
"record": "instance:node_network_receive_bytes:rate:sum"
},
{
"expr": "sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)",
"record": "instance:node_network_transmit_bytes:rate:sum"
},
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)",
"record": "instance:node_cpu:ratio"
},
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m]))",
"record": "cluster:node_cpu:sum_rate5m"
},
{
"expr": "cluster:node_cpu_seconds_total:rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu))",
"record": "cluster:node_cpu:ratio"
}
]
},
{
"name": "kube-prometheus-node-alerting.rules",
"rules": [
{
"alert": "NodeDiskRunningFull",
"alert": "KubeAPIDown",
"annotations": {
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 24 hours."
"message": "KubeAPI has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown"
},
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[6h], 3600 * 24) < 0)\n",
"for": "30m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeDiskRunningFull",
"annotations": {
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 2 hours."
},
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[30m], 3600 * 2) < 0)\n",
"for": "10m",
"expr": "absent(up{job=\"apiserver\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
@ -993,69 +778,215 @@ data:
]
},
{
"name": "node-time",
"name": "kubernetes-system-kubelet",
"rules": [
{
"alert": "ClockSkewDetected",
"alert": "KubeNodeNotReady",
"annotations": {
"message": "Clock skew detected on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}. Ensure NTP is configured correctly on this host."
"message": "{{ $labels.node }} has been unready for more than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready"
},
"expr": "abs(node_timex_offset_seconds{job=\"node-exporter\"}) > 0.03\n",
"for": "2m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "node-network",
"rules": [
{
"alert": "NetworkReceiveErrors",
"annotations": {
"message": "Network interface \"{{ $labels.device }}\" showing receive errors on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}\""
},
"expr": "rate(node_network_receive_errs_total{job=\"node-exporter\",device!~\"veth.+|tunl.+\"}[2m]) > 0\n",
"for": "2m",
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NetworkTransmitErrors",
"alert": "KubeNodeUnreachable",
"annotations": {
"message": "Network interface \"{{ $labels.device }}\" showing transmit errors on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}\""
"message": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable"
},
"expr": "rate(node_network_transmit_errs_total{job=\"node-exporter\",device!~\"veth.+|tunl.+\"}[2m]) > 0\n",
"for": "2m",
"expr": "kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} == 1\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeNetworkInterfaceFlapping",
"alert": "KubeletTooManyPods",
"annotations": {
"message": "Network interface \"{{ $labels.device }}\" changing it's up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}\""
"message": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods"
},
"expr": "changes(node_network_up{job=\"node-exporter\",device!~\"veth.+|tunl.+\"}[2m]) > 2\n",
"for": "2m",
"expr": "max(max(kubelet_running_pod_count{job=\"kubelet\"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"}) by(node) / max(kube_node_status_capacity_pods{job=\"kube-state-metrics\"}) by(node) > 0.95\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletDown",
"annotations": {
"message": "Kubelet has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown"
},
"expr": "absent(up{job=\"kubelet\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
},
{
"name": "general.rules",
"name": "kubernetes-system-scheduler",
"rules": [
{
"alert": "TargetDown",
"alert": "KubeSchedulerDown",
"annotations": {
"message": "{{ $value }}% of the {{ $labels.job }} targets are down."
"message": "KubeScheduler has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown"
},
"expr": "100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10",
"for": "10m",
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
},
{
"name": "kubernetes-system-controller-manager",
"rules": [
{
"alert": "KubeControllerManagerDown",
"annotations": {
"message": "KubeControllerManager has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown"
},
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
}
]
}
node-exporter.yaml: |-
{
"groups": [
{
"name": "node-exporter",
"rules": [
{
"alert": "NodeFilesystemSpaceFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up.",
"summary": "Filesystem is predicted to run out of space within the next 24 hours."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 40\nand\n predict_linear(node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 24*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemSpaceFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up fast.",
"summary": "Filesystem is predicted to run out of space within the next 4 hours."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 20\nand\n predict_linear(node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 4*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemAlmostOutOfSpace",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left.",
"summary": "Filesystem has less than 5% space left."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 5\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemAlmostOutOfSpace",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left.",
"summary": "Filesystem has less than 3% space left."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 3\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemFilesFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left and is filling up.",
"summary": "Filesystem is predicted to run out of inodes within the next 24 hours."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 40\nand\n predict_linear(node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 24*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemFilesFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left and is filling up fast.",
"summary": "Filesystem is predicted to run out of inodes within the next 4 hours."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 20\nand\n predict_linear(node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 4*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemAlmostOutOfFiles",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left.",
"summary": "Filesystem has less than 5% inodes left."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 5\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemAlmostOutOfFiles",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left.",
"summary": "Filesystem has less than 3% inodes left."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 3\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeNetworkReceiveErrs",
"annotations": {
"description": "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} receive errors in the last two minutes.",
"summary": "Network interface is reporting many receive errors."
},
"expr": "increase(node_network_receive_errs_total[2m]) > 10\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeNetworkTransmitErrs",
"annotations": {
"description": "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} transmit errors in the last two minutes.",
"summary": "Network interface is reporting many transmit errors."
},
"expr": "increase(node_network_transmit_errs_total[2m]) > 10\n",
"for": "1h",
"labels": {
"severity": "warning"
}
@ -1154,18 +1085,6 @@ data:
"severity": "warning"
}
},
{
"alert": "PrometheusTSDBWALCorruptions",
"annotations": {
"description": "Prometheus {{$labels.instance}} has detected {{$value | humanize}} corruptions of the write-ahead log (WAL) over the last 3h.",
"summary": "Prometheus is detecting WAL corruptions."
},
"expr": "increase(tsdb_wal_corruptions_total{job=\"prometheus\"}[3h]) > 0\n",
"for": "4h",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusNotIngestingSamples",
"annotations": {
@ -1181,7 +1100,7 @@ data:
{
"alert": "PrometheusDuplicateTimestamps",
"annotations": {
"description": "Prometheus {{$labels.instance}} is dropping {{$value | humanize}} samples/s with different values but duplicated timestamp.",
"description": "Prometheus {{$labels.instance}} is dropping {{ printf \"%.4g\" $value }} samples/s with different values but duplicated timestamp.",
"summary": "Prometheus is dropping samples with duplicate timestamps."
},
"expr": "rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{job=\"prometheus\"}[5m]) > 0\n",
@ -1193,7 +1112,7 @@ data:
{
"alert": "PrometheusOutOfOrderTimestamps",
"annotations": {
"description": "Prometheus {{$labels.instance}} is dropping {{$value | humanize}} samples/s with timestamps arriving out of order.",
"description": "Prometheus {{$labels.instance}} is dropping {{ printf \"%.4g\" $value }} samples/s with timestamps arriving out of order.",
"summary": "Prometheus drops samples with out-of-order timestamps."
},
"expr": "rate(prometheus_target_scrapes_sample_out_of_order_total{job=\"prometheus\"}[5m]) > 0\n",
@ -1226,6 +1145,18 @@ data:
"severity": "critical"
}
},
{
"alert": "PrometheusRemoteWriteDesiredShards",
"annotations": {
"description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ printf $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.",
"summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards."
},
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n> on(job, instance) group_right\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusRuleFailures",
"annotations": {
@ -1254,3 +1185,44 @@ data:
}
]
}
typhoon.yaml: |-
{
"groups": [
{
"name": "general.rules",
"rules": [
{
"alert": "TargetDown",
"annotations": {
"message": "{{ printf \"%.4g\" $value }}% of the {{ $labels.job }} targets are down."
},
"expr": "100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "extra.rules",
"rules": [
{
"alert": "InactiveRAIDDisk",
"annotations": {
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
},
"expr": "node_md_disks - node_md_disks_active > 0",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
}
]
}
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitoring

View File

@ -5,6 +5,7 @@ metadata:
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
spec:
type: ClusterIP
selector:

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=539b725093c8cd94ba46603adb25ac5280562ec8"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=24e5513ee6da4888005aac02101f73fc9e5e829f"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.0"
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -108,12 +108,14 @@ systemd:
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
--mount volume=config,target=/etc/kubernetes/secrets \
--volume assets,kind=host,source=/opt/bootstrap/assets \
--mount volume=assets,target=/assets \
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/apply
@ -134,14 +136,37 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /opt/bootstrap/layout
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5

View File

@ -1,7 +1,16 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist:
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
]
@ -14,63 +23,13 @@ resource "null_resource" "copy-controller-secrets" {
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests",
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/",
"sudo /opt/bootstrap/layout",
]
}
}

View File

@ -18,57 +18,57 @@ variable "dns_zone_id" {
# instances
variable "controller_count" {
type = string
default = "1"
type = number
description = "Number of controllers (i.e. masters)"
default = 1
}
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of workers"
default = 1
}
variable "controller_type" {
type = string
default = "t3.small"
description = "EC2 instance type for controllers"
default = "t3.small"
}
variable "worker_type" {
type = string
default = "t3.small"
description = "EC2 instance type for workers"
default = "t3.small"
}
variable "os_image" {
type = string
default = "coreos-stable"
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "coreos-stable"
}
variable "disk_size" {
type = string
default = "40"
type = number
description = "Size of the EBS volume in GB"
default = 40
}
variable "disk_type" {
type = string
default = "gp2"
description = "Type of the EBS volume (e.g. standard, gp2, io1)"
default = "gp2"
}
variable "disk_iops" {
type = string
default = "0"
type = number
description = "IOPS of the EBS volume (e.g. 100)"
default = 0
}
variable "worker_price" {
type = string
default = ""
description = "Spot price in USD for autoscaling group spot instances. Leave as default empty string for autoscaling group to use on-demand instances. Note, switching in-place from spot to on-demand is not possible: https://github.com/terraform-providers/terraform-provider-aws/issues/4320"
type = number
description = "Spot price in USD for worker instances or 0 to use on-demand instances"
default = 0
}
variable "worker_target_groups" {
@ -97,60 +97,67 @@ variable "ssh_authorized_key" {
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
description = "Choice of networking provider (calico or flannel)"
type = string
description = "Choice of networking provider (calico or flannel)"
default = "calico"
}
variable "network_mtu" {
type = number
description = "CNI interface MTU (applies to calico only). Use 8981 if using instances types with Jumbo frames."
type = string
default = "1480"
default = 1480
}
variable "host_cidr" {
description = "CIDR IPv4 range to assign to EC2 nodes"
type = string
description = "CIDR IPv4 range to assign to EC2 nodes"
default = "10.0.0.0/16"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
default = "10.3.0.0/16"
}
variable "enable_reporting" {
type = string
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
default = false
}
variable "worker_node_labels" {
type = list(string)
description = "List of initial worker node labels"
default = []
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
default = "cluster.local"
}

View File

@ -1,7 +1,7 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_version = "~> 0.12.6"
required_providers {
aws = "~> 2.23"
ct = "~> 0.3"

View File

@ -19,5 +19,6 @@ module "workers" {
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
clc_snippets = var.worker_clc_snippets
node_labels = var.worker_node_labels
}

View File

@ -61,6 +61,9 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
--node-labels=${label} \
%{ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -95,7 +98,8 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
@ -113,10 +117,11 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
-- \
kubectl --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
passwd:
users:
- name: core

View File

@ -23,45 +23,45 @@ variable "security_groups" {
# instances
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of instances"
default = 1
}
variable "instance_type" {
type = string
default = "t3.small"
description = "EC2 instance type"
default = "t3.small"
}
variable "os_image" {
type = string
default = "coreos-stable"
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
default = "coreos-stable"
}
variable "disk_size" {
type = string
default = "40"
type = number
description = "Size of the EBS volume in GB"
default = 40
}
variable "disk_type" {
type = string
default = "gp2"
description = "Type of the EBS volume (e.g. standard, gp2, io1)"
default = "gp2"
}
variable "disk_iops" {
type = string
default = "0"
type = number
description = "IOPS of the EBS volume (required for io1)"
default = 0
}
variable "spot_price" {
type = string
default = ""
description = "Spot price in USD for autoscaling group spot instances. Leave as default empty string for autoscaling group to use on-demand instances. Note, switching in-place from spot to on-demand is not possible: https://github.com/terraform-providers/terraform-provider-aws/issues/4320"
type = number
description = "Spot price in USD for worker instances or 0 to use on-demand instances"
default = 0
}
variable "target_groups" {
@ -89,19 +89,22 @@ variable "ssh_authorized_key" {
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
default = "cluster.local"
}
variable "node_labels" {
type = list(string)
description = "List of initial node labels"
default = []
}

View File

@ -46,7 +46,7 @@ resource "aws_autoscaling_group" "workers" {
resource "aws_launch_configuration" "worker" {
image_id = local.ami_id
instance_type = var.instance_type
spot_price = var.spot_price
spot_price = var.spot_price > 0 ? var.spot_price : null
enable_monitoring = false
user_data = data.ct_config.worker-ignition.rendered
@ -86,6 +86,7 @@ data "template_file" "worker-config" {
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs"
node_labels = join(",", var.node_labels)
}
}

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -13,9 +13,11 @@ data "aws_ami" "fedora-coreos" {
values = ["hvm"]
}
// pin on known ok versions as preview matures
filter {
name = "name"
values = ["fedora-coreos-30.20190801.0-hvm"]
values = ["fedora-coreos-30.*.*-hvm"]
}
# try to filter out dev images (AWS filters can't)
name_regex = "^fedora-coreos-30.[0-9]*.[0-9]*-hvm*"
}

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=539b725093c8cd94ba46603adb25ac5280562ec8"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=24e5513ee6da4888005aac02101f73fc9e5e829f"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -28,7 +28,7 @@ systemd:
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.0
quay.io/coreos/etcd:v3.4.3
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
@ -80,7 +80,7 @@ systemd:
--volume /var/run:/var/run \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.16.0 /hyperkube kubelet \
k8s.gcr.io/hyperkube:v1.17.0 kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -119,10 +119,11 @@ systemd:
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
k8s.gcr.io/hyperkube:v1.16.0 \
/apply
--entrypoint=/apply \
k8s.gcr.io/hyperkube:v1.17.0
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
@ -135,12 +136,33 @@ storage:
contents:
inline: |
${kubeconfig}
- path: /opt/bootstrap/layout
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5

View File

@ -1,7 +1,16 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist:
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
]
@ -12,65 +21,15 @@ resource "null_resource" "copy-controller-secrets" {
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests",
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/"
"sudo /opt/bootstrap/layout",
]
}
}

View File

@ -18,57 +18,57 @@ variable "dns_zone_id" {
# instances
variable "controller_count" {
type = string
default = "1"
type = number
description = "Number of controllers (i.e. masters)"
default = 1
}
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of workers"
default = 1
}
variable "controller_type" {
type = string
default = "t3.small"
description = "EC2 instance type for controllers"
default = "t3.small"
}
variable "worker_type" {
type = string
default = "t3.small"
description = "EC2 instance type for workers"
default = "t3.small"
}
variable "os_image" {
type = string
default = "coreos-stable"
description = "AMI channel for Fedora CoreOS (not yet used)"
default = "coreos-stable"
}
variable "disk_size" {
type = string
default = "40"
type = number
description = "Size of the EBS volume in GB"
default = 40
}
variable "disk_type" {
type = string
default = "gp2"
description = "Type of the EBS volume (e.g. standard, gp2, io1)"
default = "gp2"
}
variable "disk_iops" {
type = string
default = "0"
type = number
description = "IOPS of the EBS volume (e.g. 100)"
default = 0
}
variable "worker_price" {
type = string
default = ""
description = "Spot price in USD for autoscaling group spot instances. Leave as default empty string for autoscaling group to use on-demand instances. Note, switching in-place from spot to on-demand is not possible: https://github.com/terraform-providers/terraform-provider-aws/issues/4320"
type = number
description = "Spot price in USD for worker instances or 0 to use on-demand instances"
default = 0
}
variable "worker_target_groups" {
@ -97,60 +97,67 @@ variable "ssh_authorized_key" {
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
description = "Choice of networking provider (calico or flannel)"
type = string
description = "Choice of networking provider (calico or flannel)"
default = "calico"
}
variable "network_mtu" {
type = number
description = "CNI interface MTU (applies to calico only). Use 8981 if using instances types with Jumbo frames."
type = string
default = "1480"
default = 1480
}
variable "host_cidr" {
description = "CIDR IPv4 range to assign to EC2 nodes"
type = string
description = "CIDR IPv4 range to assign to EC2 nodes"
default = "10.0.0.0/16"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
default = "10.3.0.0/16"
}
variable "enable_reporting" {
type = string
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
default = false
}
variable "worker_node_labels" {
type = list(string)
description = "List of initial worker node labels"
default = []
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by CoreDNS. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
default = "cluster.local"
}

View File

@ -1,7 +1,7 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_version = "~> 0.12.6"
required_providers {
aws = "~> 2.23"
ct = "~> 0.4"

View File

@ -19,5 +19,6 @@ module "workers" {
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
snippets = var.worker_snippets
node_labels = var.worker_node_labels
}

View File

@ -13,9 +13,11 @@ data "aws_ami" "fedora-coreos" {
values = ["hvm"]
}
// pin on known ok versions as preview matures
filter {
name = "name"
values = ["fedora-coreos-30.20190801.0-hvm"]
values = ["fedora-coreos-30.*.*-hvm"]
}
# try to filter out dev images (AWS filters can't)
name_regex = "^fedora-coreos-30.[0-9]*.[0-9]*-hvm*"
}

View File

@ -50,7 +50,7 @@ systemd:
--volume /var/run:/var/run \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.16.0 /hyperkube kubelet \
k8s.gcr.io/hyperkube:v1.17.0 kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -66,6 +66,9 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
--node-labels=${label} \
%{ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins

View File

@ -23,45 +23,45 @@ variable "security_groups" {
# instances
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of instances"
default = 1
}
variable "instance_type" {
type = string
default = "t3.small"
description = "EC2 instance type"
default = "t3.small"
}
variable "os_image" {
type = string
default = "coreos-stable"
description = "AMI channel for Fedora CoreOS (not yet used)"
default = "coreos-stable"
}
variable "disk_size" {
type = string
default = "40"
type = number
description = "Size of the EBS volume in GB"
default = 40
}
variable "disk_type" {
type = string
default = "gp2"
description = "Type of the EBS volume (e.g. standard, gp2, io1)"
default = "gp2"
}
variable "disk_iops" {
type = string
default = "0"
type = number
description = "IOPS of the EBS volume (required for io1)"
default = 0
}
variable "spot_price" {
type = string
default = ""
description = "Spot price in USD for autoscaling group spot instances. Leave as default empty string for autoscaling group to use on-demand instances. Note, switching in-place from spot to on-demand is not possible: https://github.com/terraform-providers/terraform-provider-aws/issues/4320"
type = number
description = "Spot price in USD for worker instances or 0 to use on-demand instances"
default = 0
}
variable "target_groups" {
@ -89,19 +89,22 @@ variable "ssh_authorized_key" {
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
default = "cluster.local"
}
variable "node_labels" {
type = list(string)
description = "List of initial node labels"
default = []
}

View File

@ -46,7 +46,7 @@ resource "aws_autoscaling_group" "workers" {
resource "aws_launch_configuration" "worker" {
image_id = data.aws_ami.fedora-coreos.image_id
instance_type = var.instance_type
spot_price = var.spot_price
spot_price = var.spot_price > 0 ? var.spot_price : null
enable_monitoring = false
user_data = data.ct_config.worker-ignition.rendered
@ -71,9 +71,9 @@ resource "aws_launch_configuration" "worker" {
# Worker Ignition config
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
strict = true
snippets = var.snippets
content = data.template_file.worker-config.rendered
strict = true
snippets = var.snippets
}
# Worker Fedora CoreOS config
@ -85,6 +85,7 @@ data "template_file" "worker-config" {
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
node_labels = join(",", var.node_labels)
}
}

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/cl/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=539b725093c8cd94ba46603adb25ac5280562ec8"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=24e5513ee6da4888005aac02101f73fc9e5e829f"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.0"
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -106,12 +106,14 @@ systemd:
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
--mount volume=config,target=/etc/kubernetes/secrets \
--volume assets,kind=host,source=/opt/bootstrap/assets \
--mount volume=assets,target=/assets \
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/apply
@ -132,14 +134,37 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /opt/bootstrap/layout
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5

View File

@ -19,6 +19,10 @@ output "resource_group_name" {
value = azurerm_resource_group.cluster.name
}
output "resource_group_id" {
value = azurerm_resource_group.cluster.id
}
output "subnet_id" {
value = azurerm_subnet.worker.id
}
@ -53,4 +57,3 @@ output "backend_address_pool_id" {
description = "ID of the worker backend address pool"
value = azurerm_lb_backend_address_pool.worker.id
}

View File

@ -1,3 +1,12 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist:
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
@ -13,65 +22,15 @@ resource "null_resource" "copy-controller-secrets" {
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests",
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/",
"sudo /opt/bootstrap/layout",
]
}
}

View File

@ -23,27 +23,27 @@ variable "dns_zone_group" {
# instances
variable "controller_count" {
type = string
default = "1"
type = number
description = "Number of controllers (i.e. masters)"
default = 1
}
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of workers"
default = 1
}
variable "controller_type" {
type = string
default = "Standard_B2s"
description = "Machine type for controllers (see `az vm list-skus --location centralus`)"
default = "Standard_B2s"
}
variable "worker_type" {
type = string
default = "Standard_DS1_v2"
description = "Machine type for workers (see `az vm list-skus --location centralus`)"
default = "Standard_DS1_v2"
}
variable "os_image" {
@ -53,15 +53,15 @@ variable "os_image" {
}
variable "disk_size" {
type = string
default = "40"
type = number
description = "Size of the disk in GB"
default = 40
}
variable "worker_priority" {
type = string
default = "Regular"
description = "Set worker priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time."
default = "Regular"
}
variable "controller_clc_snippets" {
@ -84,54 +84,61 @@ variable "ssh_authorized_key" {
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
description = "Choice of networking provider (flannel or calico)"
type = string
default = "flannel"
description = "Choice of networking provider (flannel or calico)"
default = "calico"
}
variable "host_cidr" {
description = "CIDR IPv4 range to assign to instances"
type = string
description = "CIDR IPv4 range to assign to instances"
default = "10.0.0.0/16"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
default = "10.3.0.0/16"
}
variable "enable_reporting" {
type = string
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
default = false
}
variable "worker_node_labels" {
type = list(string)
description = "List of initial worker node labels"
default = []
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local"
}

View File

@ -1,7 +1,7 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_version = "~> 0.12.6"
required_providers {
azurerm = "~> 1.27"
ct = "~> 0.3"

View File

@ -20,5 +20,6 @@ module "workers" {
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
clc_snippets = var.worker_clc_snippets
node_labels = var.worker_node_labels
}

View File

@ -59,6 +59,9 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
--node-labels=${label} \
%{ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -93,7 +96,8 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
@ -111,10 +115,11 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
-- \
kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
passwd:
users:
- name: core

View File

@ -33,27 +33,27 @@ variable "backend_address_pool_id" {
# instances
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of instances"
default = 1
}
variable "vm_type" {
type = string
default = "Standard_F1"
description = "Machine type for instances (see `az vm list-skus --location centralus`)"
default = "Standard_DS1_v2"
}
variable "os_image" {
type = string
default = "coreos-stable"
description = "Channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha)"
default = "coreos-stable"
}
variable "priority" {
type = string
default = "Regular"
description = "Set priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be evicted at any time."
default = "Regular"
}
variable "clc_snippets" {
@ -75,19 +75,25 @@ variable "ssh_authorized_key" {
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
default = "10.3.0.0/16"
}
variable "node_labels" {
type = list(string)
description = "List of initial node labels"
default = []
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
type = string
default = "cluster.local"
}

View File

@ -69,8 +69,8 @@ resource "azurerm_virtual_machine_scale_set" "workers" {
# lifecycle
upgrade_policy_mode = "Manual"
# eviction policy may only be set when priority is Low
priority = var.priority
eviction_policy = var.priority == "Low" ? "Delete" : null
priority = var.priority
eviction_policy = var.priority == "Low" ? "Delete" : null
}
# Scale up or down to maintain desired number, tolerating deallocations.
@ -111,6 +111,7 @@ data "template_file" "worker-config" {
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
node_labels = join(",", var.node_labels)
}
}

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,10 +1,10 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=539b725093c8cd94ba46603adb25ac5280562ec8"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=24e5513ee6da4888005aac02101f73fc9e5e829f"
cluster_name = var.cluster_name
api_servers = [var.k8s_domain_name]
etcd_servers = var.controller_domains
etcd_servers = var.controllers.*.domain
asset_dir = var.asset_dir
networking = var.networking
network_mtu = var.network_mtu

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.0"
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
@ -121,12 +121,14 @@ systemd:
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
--mount volume=config,target=/etc/kubernetes/secrets \
--volume assets,kind=host,source=/opt/bootstrap/assets \
--mount volume=assets,target=/assets \
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/apply
@ -141,20 +143,43 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /etc/hostname
filesystem: root
mode: 0644
contents:
inline:
${domain_name}
- path: /opt/bootstrap/layout
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5

View File

@ -35,7 +35,6 @@ storage:
-d ${install_disk} \
-C ${os_channel} \
-V ${os_version} \
-o "${container_linux_oem}" \
${baseurl_flag} \
-i ignition.json
udevadm settle

View File

@ -91,7 +91,8 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /etc/hostname
filesystem: root
mode: 0644

View File

@ -1,34 +1,34 @@
resource "matchbox_group" "install" {
count = length(var.controller_names) + length(var.worker_names)
count = length(var.controllers) + length(var.workers)
name = format("install-%s", element(concat(var.controller_names, var.worker_names), count.index))
name = format("install-%s", concat(var.controllers.*.name, var.workers.*.name)[count.index])
# pick one of 4 Matchbox profiles (Container Linux or Flatcar, cached or non-cached)
profile = local.flavor == "flatcar" ? var.cached_install == "true" ? element(matchbox_profile.cached-flatcar-linux-install.*.name, count.index) : element(matchbox_profile.flatcar-install.*.name, count.index) : var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)
profile = local.flavor == "flatcar" ? var.cached_install ? matchbox_profile.cached-flatcar-linux-install.*.name[count.index] : matchbox_profile.flatcar-install.*.name[count.index] : var.cached_install ? matchbox_profile.cached-container-linux-install.*.name[count.index] : matchbox_profile.container-linux-install.*.name[count.index]
selector = {
mac = element(concat(var.controller_macs, var.worker_macs), count.index)
mac = concat(var.controllers.*.mac, var.workers.*.mac)[count.index]
}
}
resource "matchbox_group" "controller" {
count = length(var.controller_names)
name = format("%s-%s", var.cluster_name, element(var.controller_names, count.index))
profile = element(matchbox_profile.controllers.*.name, count.index)
count = length(var.controllers)
name = format("%s-%s", var.cluster_name, var.controllers[count.index].name)
profile = matchbox_profile.controllers.*.name[count.index]
selector = {
mac = element(var.controller_macs, count.index)
mac = var.controllers[count.index].mac
os = "installed"
}
}
resource "matchbox_group" "worker" {
count = length(var.worker_names)
name = format("%s-%s", var.cluster_name, element(var.worker_names, count.index))
profile = element(matchbox_profile.workers.*.name, count.index)
count = length(var.workers)
name = format("%s-%s", var.cluster_name, var.workers[count.index].name)
profile = matchbox_profile.workers.*.name[count.index]
selector = {
mac = element(var.worker_macs, count.index)
mac = var.workers[count.index].mac
os = "installed"
}
}

View File

@ -1,15 +1,14 @@
locals {
# coreos-stable -> coreos flavor, stable channel
# flatcar-stable -> flatcar flavor, stable channel
flavor = element(split("-", var.os_channel), 0)
channel = element(split("-", var.os_channel), 1)
flavor = split("-", var.os_channel)[0]
channel = split("-", var.os_channel)[1]
}
// Container Linux Install profile (from release.core-os.net)
resource "matchbox_profile" "container-linux-install" {
count = length(var.controller_names) + length(var.worker_names)
name = format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))
count = length(var.controllers) + length(var.workers)
name = format("%s-container-linux-install-%s", var.cluster_name, concat(var.controllers.*.name, var.workers.*.name)[count.index])
kernel = "${var.download_protocol}://${local.channel}.release.core-os.net/amd64-usr/${var.os_version}/coreos_production_pxe.vmlinuz"
@ -26,22 +25,21 @@ resource "matchbox_profile" "container-linux-install" {
var.kernel_args,
])
container_linux_config = element(data.template_file.container-linux-install-configs.*.rendered, count.index)
container_linux_config = data.template_file.container-linux-install-configs.*.rendered[count.index]
}
data "template_file" "container-linux-install-configs" {
count = length(var.controller_names) + length(var.worker_names)
count = length(var.controllers) + length(var.workers)
template = file("${path.module}/cl/install.yaml.tmpl")
vars = {
os_flavor = local.flavor
os_channel = local.channel
os_version = var.os_version
ignition_endpoint = format("%s/ignition", var.matchbox_http_endpoint)
install_disk = var.install_disk
container_linux_oem = var.container_linux_oem
ssh_authorized_key = var.ssh_authorized_key
os_flavor = local.flavor
os_channel = local.channel
os_version = var.os_version
ignition_endpoint = format("%s/ignition", var.matchbox_http_endpoint)
install_disk = var.install_disk
ssh_authorized_key = var.ssh_authorized_key
# only cached-container-linux profile adds -b baseurl
baseurl_flag = ""
}
@ -50,8 +48,8 @@ data "template_file" "container-linux-install-configs" {
// Container Linux Install profile (from matchbox /assets cache)
// Note: Admin must have downloaded os_version into matchbox assets/coreos.
resource "matchbox_profile" "cached-container-linux-install" {
count = length(var.controller_names) + length(var.worker_names)
name = format("%s-cached-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))
count = length(var.controllers) + length(var.workers)
name = format("%s-cached-container-linux-install-%s", var.cluster_name, concat(var.controllers.*.name, var.workers.*.name)[count.index])
kernel = "/assets/coreos/${var.os_version}/coreos_production_pxe.vmlinuz"
@ -68,22 +66,21 @@ resource "matchbox_profile" "cached-container-linux-install" {
var.kernel_args,
])
container_linux_config = element(data.template_file.cached-container-linux-install-configs.*.rendered, count.index)
container_linux_config = data.template_file.cached-container-linux-install-configs.*.rendered[count.index]
}
data "template_file" "cached-container-linux-install-configs" {
count = length(var.controller_names) + length(var.worker_names)
count = length(var.controllers) + length(var.workers)
template = file("${path.module}/cl/install.yaml.tmpl")
vars = {
os_flavor = local.flavor
os_channel = local.channel
os_version = var.os_version
ignition_endpoint = format("%s/ignition", var.matchbox_http_endpoint)
install_disk = var.install_disk
container_linux_oem = var.container_linux_oem
ssh_authorized_key = var.ssh_authorized_key
os_flavor = local.flavor
os_channel = local.channel
os_version = var.os_version
ignition_endpoint = format("%s/ignition", var.matchbox_http_endpoint)
install_disk = var.install_disk
ssh_authorized_key = var.ssh_authorized_key
# profile uses -b baseurl to install from matchbox cache
baseurl_flag = "-b ${var.matchbox_http_endpoint}/assets/${local.flavor}"
}
@ -91,8 +88,8 @@ data "template_file" "cached-container-linux-install-configs" {
// Flatcar Linux install profile (from release.flatcar-linux.net)
resource "matchbox_profile" "flatcar-install" {
count = length(var.controller_names) + length(var.worker_names)
name = format("%s-flatcar-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))
count = length(var.controllers) + length(var.workers)
name = format("%s-flatcar-install-%s", var.cluster_name, concat(var.controllers.*.name, var.workers.*.name)[count.index])
kernel = "${var.download_protocol}://${local.channel}.release.flatcar-linux.net/amd64-usr/${var.os_version}/flatcar_production_pxe.vmlinuz"
@ -109,14 +106,14 @@ resource "matchbox_profile" "flatcar-install" {
var.kernel_args,
])
container_linux_config = element(data.template_file.container-linux-install-configs.*.rendered, count.index)
container_linux_config = data.template_file.container-linux-install-configs.*.rendered[count.index]
}
// Flatcar Linux Install profile (from matchbox /assets cache)
// Note: Admin must have downloaded os_version into matchbox assets/flatcar.
resource "matchbox_profile" "cached-flatcar-linux-install" {
count = length(var.controller_names) + length(var.worker_names)
name = format("%s-cached-flatcar-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))
count = length(var.controllers) + length(var.workers)
name = format("%s-cached-flatcar-linux-install-%s", var.cluster_name, concat(var.controllers.*.name, var.workers.*.name)[count.index])
kernel = "/assets/flatcar/${var.os_version}/flatcar_production_pxe.vmlinuz"
@ -133,33 +130,33 @@ resource "matchbox_profile" "cached-flatcar-linux-install" {
var.kernel_args,
])
container_linux_config = element(data.template_file.cached-container-linux-install-configs.*.rendered, count.index)
container_linux_config = data.template_file.cached-container-linux-install-configs.*.rendered[count.index]
}
// Kubernetes Controller profiles
resource "matchbox_profile" "controllers" {
count = length(var.controller_names)
name = format("%s-controller-%s", var.cluster_name, element(var.controller_names, count.index))
raw_ignition = element(data.ct_config.controller-ignitions.*.rendered, count.index)
count = length(var.controllers)
name = format("%s-controller-%s", var.cluster_name, var.controllers.*.name[count.index])
raw_ignition = data.ct_config.controller-ignitions.*.rendered[count.index]
}
data "ct_config" "controller-ignitions" {
count = length(var.controller_names)
content = element(data.template_file.controller-configs.*.rendered, count.index)
count = length(var.controllers)
content = data.template_file.controller-configs.*.rendered[count.index]
pretty_print = false
snippets = local.clc_map[element(var.controller_names, count.index)]
snippets = local.clc_map[var.controllers.*.name[count.index]]
}
data "template_file" "controller-configs" {
count = length(var.controller_names)
count = length(var.controllers)
template = file("${path.module}/cl/controller.yaml.tmpl")
vars = {
domain_name = element(var.controller_domains, count.index)
etcd_name = element(var.controller_names, count.index)
etcd_initial_cluster = join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))
cgroup_driver = var.os_channel == "flatcar-edge" ? "systemd" : "cgroupfs"
domain_name = var.controllers.*.domain[count.index]
etcd_name = var.controllers.*.name[count.index]
etcd_initial_cluster = join(",", formatlist("%s=https://%s:2380", var.controllers.*.name, var.controllers.*.domain))
cgroup_driver = var.os_channel == "flatcar-edge" ? "systemd" : "cgroupfs"
cluster_dns_service_ip = module.bootstrap.cluster_dns_service_ip
cluster_domain_suffix = var.cluster_domain_suffix
ssh_authorized_key = var.ssh_authorized_key
@ -168,26 +165,26 @@ data "template_file" "controller-configs" {
// Kubernetes Worker profiles
resource "matchbox_profile" "workers" {
count = length(var.worker_names)
name = format("%s-worker-%s", var.cluster_name, element(var.worker_names, count.index))
raw_ignition = element(data.ct_config.worker-ignitions.*.rendered, count.index)
count = length(var.workers)
name = format("%s-worker-%s", var.cluster_name, var.workers.*.name[count.index])
raw_ignition = data.ct_config.worker-ignitions.*.rendered[count.index]
}
data "ct_config" "worker-ignitions" {
count = length(var.worker_names)
content = element(data.template_file.worker-configs.*.rendered, count.index)
count = length(var.workers)
content = data.template_file.worker-configs.*.rendered[count.index]
pretty_print = false
snippets = local.clc_map[element(var.worker_names, count.index)]
snippets = local.clc_map[var.workers.*.name[count.index]]
}
data "template_file" "worker-configs" {
count = length(var.worker_names)
count = length(var.workers)
template = file("${path.module}/cl/worker.yaml.tmpl")
vars = {
domain_name = element(var.worker_domains, count.index)
cgroup_driver = var.os_channel == "flatcar-edge" ? "systemd" : "cgroupfs"
domain_name = var.workers.*.domain[count.index]
cgroup_driver = var.os_channel == "flatcar-edge" ? "systemd" : "cgroupfs"
cluster_dns_service_ip = module.bootstrap.cluster_dns_service_ip
cluster_domain_suffix = var.cluster_domain_suffix
ssh_authorized_key = var.ssh_authorized_key
@ -200,7 +197,7 @@ locals {
# Default Container Linux config snippets map every node names to list("\n") so
# all lookups succeed
clc_defaults = zipmap(
concat(var.controller_names, var.worker_names),
concat(var.controllers.*.name, var.workers.*.name),
chunklist(data.template_file.clc-default-snippets.*.rendered, 1),
)
@ -210,7 +207,7 @@ locals {
// Horrible hack to generate a Terraform list of node count length
data "template_file" "clc-default-snippets" {
count = length(var.controller_names) + length(var.worker_names)
count = length(var.controllers) + length(var.workers)
template = "\n"
}

View File

@ -1,6 +1,15 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist:
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers. Activates kubelet.service
resource "null_resource" "copy-controller-secrets" {
count = length(var.controller_names)
count = length(var.controllers)
# Without depends_on, remote-exec could start and wait for machines before
# matchbox groups are written, causing a deadlock.
@ -13,7 +22,7 @@ resource "null_resource" "copy-controller-secrets" {
connection {
type = "ssh"
host = var.controller_domains[count.index]
host = var.controllers.*.domain[count.index]
user = "core"
timeout = "60m"
}
@ -24,71 +33,21 @@ resource "null_resource" "copy-controller-secrets" {
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests"
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/",
"sudo /opt/bootstrap/layout",
]
}
}
# Secure copy kubeconfig to all workers. Activates kubelet.service
resource "null_resource" "copy-worker-secrets" {
count = length(var.worker_names)
count = length(var.workers)
# Without depends_on, remote-exec could start and wait for machines before
# matchbox groups are written, causing a deadlock.
@ -100,7 +59,7 @@ resource "null_resource" "copy-worker-secrets" {
connection {
type = "ssh"
host = var.worker_domains[count.index]
host = var.workers.*.domain[count.index]
user = "core"
timeout = "60m"
}
@ -129,7 +88,7 @@ resource "null_resource" "bootstrap" {
connection {
type = "ssh"
host = var.controller_domains[0]
host = var.controllers[0].domain
user = "core"
timeout = "15m"
}

View File

@ -21,36 +21,32 @@ variable "os_version" {
}
# machines
# Terraform's crude "type system" does not properly support lists of maps so we do this.
variable "controller_names" {
type = list(string)
description = "Ordered list of controller names (e.g. [node1])"
variable "controllers" {
type = list(object({
name = string
mac = string
domain = string
}))
description = <<EOD
List of controller machine details (unique name, identifying MAC address, FQDN)
[{ name = "node1", mac = "52:54:00:a1:9c:ae", domain = "node1.example.com"}]
EOD
}
variable "controller_macs" {
type = list(string)
description = "Ordered list of controller identifying MAC addresses (e.g. [52:54:00:a1:9c:ae])"
}
variable "controller_domains" {
type = list(string)
description = "Ordered list of controller FQDNs (e.g. [node1.example.com])"
}
variable "worker_names" {
type = list(string)
description = "Ordered list of worker names (e.g. [node2, node3])"
}
variable "worker_macs" {
type = list(string)
description = "Ordered list of worker identifying MAC addresses (e.g. [52:54:00:b2:2f:86, 52:54:00:c3:61:77])"
}
variable "worker_domains" {
type = list(string)
description = "Ordered list of worker FQDNs (e.g. [node2.example.com, node3.example.com])"
variable "workers" {
type = list(object({
name = string
mac = string
domain = string
}))
description = <<EOD
List of worker machine details (unique name, identifying MAC address, FQDN)
[
{ name = "node2", mac = "52:54:00:b2:2f:86", domain = "node2.example.com"},
{ name = "node3", mac = "52:54:00:c3:61:77", domain = "node3.example.com"}
]
EOD
}
variable "clc_snippets" {
@ -62,8 +58,8 @@ variable "clc_snippets" {
# configuration
variable "k8s_domain_name" {
description = "Controller DNS name which resolves to a controller instance. Workers and kubeconfig's will communicate with this endpoint (e.g. cluster.example.com)"
type = string
description = "Controller DNS name which resolves to a controller instance. Workers and kubeconfig's will communicate with this endpoint (e.g. cluster.example.com)"
}
variable "ssh_authorized_key" {
@ -72,92 +68,87 @@ variable "ssh_authorized_key" {
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
description = "Choice of networking provider (flannel or calico)"
type = string
description = "Choice of networking provider (flannel or calico)"
default = "calico"
}
variable "network_mtu" {
type = number
description = "CNI interface MTU (applies to calico only)"
type = string
default = "1480"
default = 1480
}
variable "network_ip_autodetection_method" {
description = "Method to autodetect the host IPv4 address (applies to calico only)"
type = string
description = "Method to autodetect the host IPv4 address (applies to calico only)"
default = "first-found"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
default = "10.3.0.0/16"
}
# optional
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
}
variable "download_protocol" {
type = string
default = "https"
type = string
description = "Protocol iPXE should use to download the kernel and initrd. Defaults to https, which requires iPXE compiled with crypto support. Unused if cached_install is true."
default = "https"
}
variable "cached_install" {
type = string
default = "false"
type = bool
description = "Whether Container Linux should PXE boot and install from matchbox /assets cache. Note that the admin must have downloaded the os_version into matchbox assets."
default = false
}
variable "install_disk" {
type = string
default = "/dev/sda"
type = string
default = "/dev/sda"
description = "Disk device to which the install profiles should install Container Linux (e.g. /dev/sda)"
}
variable "container_linux_oem" {
type = string
default = ""
description = "DEPRECATED: Specify an OEM image id to use as base for the installation (e.g. ami, vmware_raw, xen) or leave blank for the default image"
}
variable "kernel_args" {
type = list(string)
description = "Additional kernel arguments to provide at PXE boot."
type = list(string)
default = []
default = []
}
variable "enable_reporting" {
type = string
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
default = false
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local"
}

View File

@ -1,7 +1,7 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_version = "~> 0.12.6"
required_providers {
matchbox = "~> 0.3.0"
ct = "~> 0.3"

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,10 +1,10 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=539b725093c8cd94ba46603adb25ac5280562ec8"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=24e5513ee6da4888005aac02101f73fc9e5e829f"
cluster_name = var.cluster_name
api_servers = [var.k8s_domain_name]
etcd_servers = var.controller_domains
etcd_servers = var.controllers.*.domain
asset_dir = var.asset_dir
networking = var.networking
network_mtu = var.network_mtu

View File

@ -28,7 +28,7 @@ systemd:
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.0
quay.io/coreos/etcd:v3.4.3
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
@ -81,7 +81,7 @@ systemd:
--volume /opt/cni/bin:/opt/cni/bin:z \
--volume /etc/iscsi:/etc/iscsi \
--volume /sbin/iscsiadm:/sbin/iscsiadm \
k8s.gcr.io/hyperkube:v1.16.0 /hyperkube kubelet \
k8s.gcr.io/hyperkube:v1.17.0 kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -130,10 +130,11 @@ systemd:
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
k8s.gcr.io/hyperkube:v1.16.0 \
/apply
--entrypoint=/apply \
k8s.gcr.io/hyperkube:v1.17.0
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
@ -146,12 +147,33 @@ storage:
contents:
inline:
${domain_name}
- path: /opt/bootstrap/layout
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5

View File

@ -51,7 +51,7 @@ systemd:
--volume /opt/cni/bin:/opt/cni/bin:z \
--volume /etc/iscsi:/etc/iscsi \
--volume /sbin/iscsiadm:/sbin/iscsiadm \
k8s.gcr.io/hyperkube:v1.16.0 /hyperkube kubelet \
k8s.gcr.io/hyperkube:v1.17.0 kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \

View File

@ -1,22 +1,22 @@
# Match each controller or worker to a profile
resource "matchbox_group" "controller" {
count = length(var.controller_names)
name = format("%s-%s", var.cluster_name, var.controller_names[count.index])
count = length(var.controllers)
name = format("%s-%s", var.cluster_name, var.controllers.*.name[count.index])
profile = matchbox_profile.controllers.*.name[count.index]
selector = {
mac = var.controller_macs[count.index]
mac = var.controllers.*.mac[count.index]
}
}
resource "matchbox_group" "worker" {
count = length(var.worker_names)
name = format("%s-%s", var.cluster_name, var.worker_names[count.index])
count = length(var.workers)
name = format("%s-%s", var.cluster_name, var.workers.*.name[count.index])
profile = matchbox_profile.workers.*.name[count.index]
selector = {
mac = var.worker_macs[count.index]
mac = var.workers.*.mac[count.index]
}
}

View File

@ -1,36 +1,36 @@
locals {
remote_kernel = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-installer-kernel"
remote_initrd = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-installer-initramfs.img"
remote_kernel = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-installer-kernel-x86_64"
remote_initrd = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-installer-initramfs.x86_64.img"
remote_args = [
"ip=dhcp",
"rd.neednet=1",
"coreos.inst=yes",
"coreos.inst.image_url=https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-metal.raw.xz",
"coreos.inst.image_url=https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-metal.x86_64.raw.xz",
"coreos.inst.ignition_url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.inst.install_dev=${var.install_disk}"
]
cached_kernel = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-installer-kernel"
cached_initrd = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-installer-initramfs.img"
cached_kernel = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-installer-kernel-x86_64"
cached_initrd = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-installer-initramfs.x86_64.img"
cached_args = [
"ip=dhcp",
"rd.neednet=1",
"coreos.inst=yes",
"coreos.inst.image_url=${var.matchbox_http_endpoint}/assets/fedora-coreos/fedora-coreos-${var.os_version}-metal.raw.xz",
"coreos.inst.image_url=${var.matchbox_http_endpoint}/assets/fedora-coreos/fedora-coreos-${var.os_version}-metal.x86_64.raw.xz",
"coreos.inst.ignition_url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.inst.install_dev=${var.install_disk}"
]
kernel = var.cached_install == "true" ? local.cached_kernel : local.remote_kernel
initrd = var.cached_install == "true" ? local.cached_initrd : local.remote_initrd
args = var.cached_install == "true" ? local.cached_args : local.remote_args
kernel = var.cached_install ? local.cached_kernel : local.remote_kernel
initrd = var.cached_install ? local.cached_initrd : local.remote_initrd
args = var.cached_install ? local.cached_args : local.remote_args
}
// Fedora CoreOS controller profile
resource "matchbox_profile" "controllers" {
count = length(var.controller_names)
name = format("%s-controller-%s", var.cluster_name, var.controller_names[count.index])
count = length(var.controllers)
name = format("%s-controller-%s", var.cluster_name, var.controllers.*.name[count.index])
kernel = local.kernel
initrd = [
@ -42,20 +42,20 @@ resource "matchbox_profile" "controllers" {
}
data "ct_config" "controller-ignitions" {
count = length(var.controller_names)
count = length(var.controllers)
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
}
data "template_file" "controller-configs" {
count = length(var.controller_names)
count = length(var.controllers)
template = file("${path.module}/fcc/controller.yaml")
vars = {
domain_name = var.controller_domains[count.index]
etcd_name = var.controller_names[count.index]
etcd_initial_cluster = join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))
domain_name = var.controllers.*.domain[count.index]
etcd_name = var.controllers.*.name[count.index]
etcd_initial_cluster = join(",", formatlist("%s=https://%s:2380", var.controllers.*.name, var.controllers.*.domain))
cluster_dns_service_ip = module.bootstrap.cluster_dns_service_ip
cluster_domain_suffix = var.cluster_domain_suffix
ssh_authorized_key = var.ssh_authorized_key
@ -64,8 +64,8 @@ data "template_file" "controller-configs" {
// Fedora CoreOS worker profile
resource "matchbox_profile" "workers" {
count = length(var.worker_names)
name = format("%s-worker-%s", var.cluster_name, var.worker_names[count.index])
count = length(var.workers)
name = format("%s-worker-%s", var.cluster_name, var.workers.*.name[count.index])
kernel = local.kernel
initrd = [
@ -77,18 +77,18 @@ resource "matchbox_profile" "workers" {
}
data "ct_config" "worker-ignitions" {
count = length(var.worker_names)
count = length(var.workers)
content = data.template_file.worker-configs.*.rendered[count.index]
strict = true
}
data "template_file" "worker-configs" {
count = length(var.worker_names)
count = length(var.workers)
template = file("${path.module}/fcc/worker.yaml")
vars = {
domain_name = var.worker_domains[count.index]
domain_name = var.workers.*.domain[count.index]
cluster_dns_service_ip = module.bootstrap.cluster_dns_service_ip
cluster_domain_suffix = var.cluster_domain_suffix
ssh_authorized_key = var.ssh_authorized_key

View File

@ -1,6 +1,15 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist:
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers. Activates kubelet.service
resource "null_resource" "copy-controller-secrets" {
count = length(var.controller_names)
count = length(var.controllers)
# Without depends_on, remote-exec could start and wait for machines before
# matchbox groups are written, causing a deadlock.
@ -12,7 +21,7 @@ resource "null_resource" "copy-controller-secrets" {
connection {
type = "ssh"
host = var.controller_domains[count.index]
host = var.controllers.*.domain[count.index]
user = "core"
timeout = "60m"
}
@ -23,69 +32,21 @@ resource "null_resource" "copy-controller-secrets" {
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests"
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/"
"sudo /opt/bootstrap/layout",
]
}
}
# Secure copy kubeconfig to all workers. Activates kubelet.service
resource "null_resource" "copy-worker-secrets" {
count = length(var.worker_names)
count = length(var.workers)
# Without depends_on, remote-exec could start and wait for machines before
# matchbox groups are written, causing a deadlock.
@ -96,7 +57,7 @@ resource "null_resource" "copy-worker-secrets" {
connection {
type = "ssh"
host = var.worker_domains[count.index]
host = var.workers.*.domain[count.index]
user = "core"
timeout = "60m"
}
@ -125,7 +86,7 @@ resource "null_resource" "bootstrap" {
connection {
type = "ssh"
host = var.controller_domains[0]
host = var.controllers[0].domain
user = "core"
timeout = "15m"
}

View File

@ -22,36 +22,32 @@ variable "os_version" {
}
# machines
# Terraform's crude "type system" does not properly support lists of maps so we do this.
variable "controller_names" {
type = list(string)
description = "Ordered list of controller names (e.g. [node1])"
variable "controllers" {
type = list(object({
name = string
mac = string
domain = string
}))
description = <<EOD
List of controller machine details (unique name, identifying MAC address, FQDN)
[{ name = "node1", mac = "52:54:00:a1:9c:ae", domain = "node1.example.com"}]
EOD
}
variable "controller_macs" {
type = list(string)
description = "Ordered list of controller identifying MAC addresses (e.g. [52:54:00:a1:9c:ae])"
}
variable "controller_domains" {
type = list(string)
description = "Ordered list of controller FQDNs (e.g. [node1.example.com])"
}
variable "worker_names" {
type = list(string)
description = "Ordered list of worker names (e.g. [node2, node3])"
}
variable "worker_macs" {
type = list(string)
description = "Ordered list of worker identifying MAC addresses (e.g. [52:54:00:b2:2f:86, 52:54:00:c3:61:77])"
}
variable "worker_domains" {
type = list(string)
description = "Ordered list of worker FQDNs (e.g. [node2.example.com, node3.example.com])"
variable "workers" {
type = list(object({
name = string
mac = string
domain = string
}))
description = <<EOD
List of worker machine details (unique name, identifying MAC address, FQDN)
[
{ name = "node2", mac = "52:54:00:b2:2f:86", domain = "node2.example.com"},
{ name = "node3", mac = "52:54:00:c3:61:77", domain = "node3.example.com"}
]
EOD
}
variable "snippets" {
@ -63,8 +59,8 @@ variable "snippets" {
# configuration
variable "k8s_domain_name" {
description = "Controller DNS name which resolves to a controller instance. Workers and kubeconfig's will communicate with this endpoint (e.g. cluster.example.com)"
type = string
description = "Controller DNS name which resolves to a controller instance. Workers and kubeconfig's will communicate with this endpoint (e.g. cluster.example.com)"
}
variable "ssh_authorized_key" {
@ -73,80 +69,81 @@ variable "ssh_authorized_key" {
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
description = "Choice of networking provider (flannel or calico)"
type = string
description = "Choice of networking provider (flannel or calico)"
default = "calico"
}
variable "network_mtu" {
type = number
description = "CNI interface MTU (applies to calico only)"
type = string
default = "1480"
default = 1480
}
variable "network_ip_autodetection_method" {
description = "Method to autodetect the host IPv4 address (applies to calico only)"
type = string
description = "Method to autodetect the host IPv4 address (applies to calico only)"
default = "first-found"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
default = "10.3.0.0/16"
}
# optional
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
}
variable "cached_install" {
type = string
default = "false"
type = bool
description = "Whether Fedora CoreOS should PXE boot and install from matchbox /assets cache. Note that the admin must have downloaded the os_version into matchbox assets."
default = false
}
variable "install_disk" {
type = string
default = "sda"
type = string
description = "Disk device to install Fedora CoreOS (e.g. sda)"
default = "sda"
}
variable "kernel_args" {
type = list(string)
description = "Additional kernel arguments to provide at PXE boot."
type = list(string)
default = []
default = []
}
variable "enable_reporting" {
type = string
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
default = false
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
}

View File

@ -1,7 +1,7 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_version = "~> 0.12.6"
required_providers {
matchbox = "~> 0.3.0"
ct = "~> 0.4"

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=539b725093c8cd94ba46603adb25ac5280562ec8"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=24e5513ee6da4888005aac02101f73fc9e5e829f"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -7,7 +7,7 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.0"
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -118,12 +118,14 @@ systemd:
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
--mount volume=config,target=/etc/kubernetes/secrets \
--volume assets,kind=host,source=/opt/bootstrap/assets \
--mount volume=assets,target=/assets \
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/apply
@ -138,14 +140,37 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /opt/bootstrap/layout
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/assets/auth/kubeconfig
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5

View File

@ -99,7 +99,8 @@ storage:
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.16.0
KUBELET_IMAGE_TAG=v1.17.0
KUBELET_IMAGE_ARGS="--exec=/usr/local/bin/kubelet"
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
@ -117,7 +118,8 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.16.0 \
docker://k8s.gcr.io/hyperkube:v1.17.0 \
--net=host \
--dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
-- \
kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

View File

@ -73,7 +73,7 @@ resource "digitalocean_firewall" "controllers" {
port_range = "6443"
source_addresses = ["0.0.0.0/0", "::/0"]
}
# kube-scheduler metrics, kube-controller-manager metrics
inbound_rule {
protocol = "tcp"

View File

@ -12,19 +12,19 @@ output "workers_dns" {
}
output "controllers_ipv4" {
value = [digitalocean_droplet.controllers.*.ipv4_address]
value = digitalocean_droplet.controllers.*.ipv4_address
}
output "controllers_ipv6" {
value = [digitalocean_droplet.controllers.*.ipv6_address]
value = digitalocean_droplet.controllers.*.ipv6_address
}
output "workers_ipv4" {
value = [digitalocean_droplet.workers.*.ipv4_address]
value = digitalocean_droplet.workers.*.ipv4_address
}
output "workers_ipv6" {
value = [digitalocean_droplet.workers.*.ipv6_address]
value = digitalocean_droplet.workers.*.ipv6_address
}
# Outputs for custom firewalls

View File

@ -1,3 +1,12 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist:
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers. Activates kubelet.service
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
@ -20,64 +29,14 @@ resource "null_resource" "copy-controller-secrets" {
}
provisioner "file" {
content = module.bootstrap.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootstrap.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootstrap.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "file" {
source = var.asset_dir
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv $HOME/assets /opt/bootstrap/assets",
"sudo mkdir -p /etc/kubernetes/manifests"
"sudo mkdir -p /etc/kubernetes/bootstrap-secrets",
"sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
"sudo cp -r /opt/bootstrap/assets/tls/* /etc/kubernetes/bootstrap-secrets/",
"sudo cp /opt/bootstrap/assets/auth/kubeconfig /etc/kubernetes/bootstrap-secrets/",
"sudo cp -r /opt/bootstrap/assets/static-manifests/* /etc/kubernetes/manifests/",
"sudo /opt/bootstrap/layout",
]
}
}

View File

@ -18,33 +18,33 @@ variable "dns_zone" {
# instances
variable "controller_count" {
type = string
default = "1"
type = number
description = "Number of controllers (i.e. masters)"
default = 1
}
variable "worker_count" {
type = string
default = "1"
type = number
description = "Number of workers"
default = 1
}
variable "controller_type" {
type = string
default = "s-2vcpu-2gb"
description = "Droplet type for controllers (e.g. s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb)."
default = "s-2vcpu-2gb"
}
variable "worker_type" {
type = string
default = "s-1vcpu-2gb"
description = "Droplet type for workers (e.g. s-1vcpu-2gb, s-2vcpu-2gb)"
default = "s-1vcpu-2gb"
}
variable "image" {
type = string
default = "coreos-stable"
description = "Container Linux image for instances (e.g. coreos-stable)"
default = "coreos-stable"
}
variable "controller_clc_snippets" {
@ -67,48 +67,49 @@ variable "ssh_fingerprints" {
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
description = "Choice of networking provider (flannel or calico)"
type = string
default = "flannel"
description = "Choice of networking provider (flannel or calico)"
default = "calico"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
default = "10.3.0.0/16"
}
variable "enable_reporting" {
type = string
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
default = false
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local"
}

View File

@ -1,7 +1,7 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_version = "~> 0.12.6"
required_providers {
digitalocean = "~> 1.3"
ct = "~> 0.3"

View File

@ -31,7 +31,7 @@ resource "google_dns_record_set" "some-application" {
name = "app.example.com."
type = "CNAME"
ttl = 300
rrdatas = ["${module.aws-tempest.ingress_dns_name}."]
rrdatas = ["${module.tempest.ingress_dns_name}."]
}
```
@ -64,7 +64,7 @@ resource "google_dns_record_set" "some-application" {
name = "app.example.com."
type = "A"
ttl = 300
rrdatas = [module.azure-ramius.ingress_static_ipv4]
rrdatas = [module.ramius.ingress_static_ipv4]
}
```
@ -120,7 +120,7 @@ resource "google_dns_record_set" "some-application" {
name = "app.example.com."
type = "CNAME"
ttl = 300
rrdatas = ["${module.digital-ocean-nemo.workers_dns}."]
rrdatas = ["${module.nemo.workers_dns}."]
}
```
@ -158,7 +158,7 @@ resource "google_dns_record_set" "app-record-a" {
name = "app.example.com."
type = "A"
ttl = 300
rrdatas = [module.google-cloud-yavin.ingress_static_ipv4]
rrdatas = [module.yavin.ingress_static_ipv4]
}
resource "google_dns_record_set" "app-record-aaaa" {
@ -169,6 +169,6 @@ resource "google_dns_record_set" "app-record-aaaa" {
name = "app.example.com."
type = "AAAA"
ttl = 300
rrdatas = [module.google-cloud-yavin.ingress_static_ipv6]
rrdatas = [module.yavin.ingress_static_ipv6]
}
```

View File

@ -72,7 +72,7 @@ Write Container Linux Configs *snippets* as files in the repository where you ke
[AWS](/cl/aws/#cluster), [Azure](/cl/azure/#cluster), [DigitalOcean](/cl/digital-ocean/#cluster), and [Google Cloud](/cl/google-cloud/#cluster) clusters allow populating a list of `controller_clc_snippets` or `worker_clc_snippets`.
```
module "digital-ocean-nemo" {
module "nemo" {
...
controller_count = 1
@ -92,7 +92,7 @@ module "digital-ocean-nemo" {
[Bare-Metal](/cl/bare-metal/#cluster) clusters allow different Container Linux snippets to be used for each node (since hardware may be heterogeneous). Populate the optional `clc_snippets` map variable with any controller or worker name keys and lists of snippets.
```
module "bare-metal-mercury" {
module "mercury" {
...
controller_names = ["node1"]
worker_names = [
@ -141,7 +141,7 @@ Container Linux Configs (and the CoreOS Ignition system) create immutable infras
Typhoon chooses variables to expose with purpose. If you must customize clusters in ways that aren't supported by input variables, fork Typhoon and maintain a repository with customizations. Reference the repository by changing the username.
```
module "digital-ocean-nemo" {
module "nemo" {
source = "git::https://github.com/USERNAME/typhoon//digital-ocean/container-linux/kubernetes?ref=myspecialcase"
...
}

View File

@ -17,13 +17,13 @@ module "tempest-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.14.3"
# AWS
vpc_id = module.aws-tempest.vpc_id
subnet_ids = module.aws-tempest.subnet_ids
security_groups = module.aws-tempest.worker_security_groups
vpc_id = module.tempest.vpc_id
subnet_ids = module.tempest.subnet_ids
security_groups = module.tempest.worker_security_groups
# configuration
name = "tempest-pool"
kubeconfig = module.aws-tempest.kubeconfig
kubeconfig = module.tempest.kubeconfig
ssh_authorized_key = var.ssh_authorized_key
# optional
@ -62,11 +62,14 @@ The AWS internal `workers` module supports a number of [variables](https://githu
|:-----|:------------|:--------|:--------|
| worker_count | Number of instances | 1 | 3 |
| instance_type | EC2 instance type | "t3.small" | "t3.medium" |
| os_image | AMI channel for a Container Linux derivative | coreos-stable | coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha |
| disk_size | Size of the disk in GB | 40 | 100 |
| spot_price | Spot price in USD for workers. Leave as default empty string for regular on-demand instances | "" | "0.10" |
| os_image | AMI channel for a Container Linux derivative | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha |
| disk_size | Size of the EBS volume in GB | 40 | 100 |
| disk_type | Type of the EBS volume | "gp2" | standard, gp2, io1 |
| disk_iops | IOPS of the EBS volume | 0 (i.e. auto) | 400 |
| spot_price | Spot price in USD for worker instances or 0 to use on-demand instances | 0 | 0.10 |
| clc_snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | Must match `cluster_domain_suffix` of cluster | "cluster.local" | "k8s.example.com" |
| node_labels | List of initial node labels | [] | ["worker-pool=foo"] |
Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/) or per-region and per-type [spot prices](https://aws.amazon.com/ec2/spot/pricing/).
@ -76,18 +79,18 @@ Create a cluster following the Azure [tutorial](../cl/azure.md#cluster). Define
```tf
module "ramius-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.16.0"
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.17.0"
# Azure
region = module.azure-ramius.region
resource_group_name = module.azure-ramius.resource_group_name
subnet_id = module.azure-ramius.subnet_id
security_group_id = module.azure-ramius.security_group_id
backend_address_pool_id = module.azure-ramius.backend_address_pool_id
region = module.ramius.region
resource_group_name = module.ramius.resource_group_name
subnet_id = module.ramius.subnet_id
security_group_id = module.ramius.security_group_id
backend_address_pool_id = module.ramius.backend_address_pool_id
# configuration
name = "ramius-low-priority"
kubeconfig = module.azure-ramius.kubeconfig
kubeconfig = module.ramius.kubeconfig
ssh_authorized_key = var.ssh_authorized_key
# optional
@ -127,12 +130,12 @@ The Azure internal `workers` module supports a number of [variables](https://git
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| worker_count | Number of instances | 1 | 3 |
| vm_type | Machine type for instances | "Standard_F1" | See below |
| os_image | Channel for a Container Linux derivative | coreos-stable | coreos-stable, coreos-beta, coreos-alpha |
| priority | Set priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time | Regular | Low |
| vm_type | Machine type for instances | "Standard_DS1_v2" | See below |
| os_image | Channel for a Container Linux derivative | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
| priority | Set priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time | "Regular" | "Low" |
| clc_snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| node_labels | List of initial node labels | [] | ["worker-pool=foo"] |
Check the list of valid [machine types](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/) and their [specs](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-general). Use `az vm list-skus` to get the identifier.
@ -142,16 +145,16 @@ Create a cluster following the Google Cloud [tutorial](../cl/google-cloud.md#clu
```tf
module "yavin-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.16.0"
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.17.0"
# Google Cloud
region = "europe-west2"
network = module.google-cloud-yavin.network_name
network = module.yavin.network_name
cluster_name = "yavin"
# configuration
name = "yavin-16x"
kubeconfig = module.google-cloud-yavin.kubeconfig
kubeconfig = module.yavin.kubeconfig
ssh_authorized_key = var.ssh_authorized_key
# optional
@ -173,11 +176,11 @@ Verify a managed instance group of workers joins the cluster within a few minute
```
$ kubectl get nodes
NAME STATUS AGE VERSION
yavin-controller-0.c.example-com.internal Ready 6m v1.16.0
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.16.0
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.16.0
yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.16.0
yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.16.0
yavin-controller-0.c.example-com.internal Ready 6m v1.17.0
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.17.0
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.17.0
yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.17.0
yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.17.0
```
### Variables
@ -189,9 +192,9 @@ The Google Cloud internal `workers` module supports a number of [variables](http
| Name | Description | Example |
|:-----|:------------|:--------|
| name | Unique name (distinct from cluster name) | "yavin-16x" |
| cluster_name | Must be set to `cluster_name` of cluster | "yavin" |
| region | Region for the worker pool instances. May differ from the cluster's region | "europe-west2" |
| network | Must be set to `network_name` output by cluster | module.cluster.network_name |
| cluster_name | Must be set to `cluster_name` of cluster | "yavin" |
| kubeconfig | Must be set to `kubeconfig` output by cluster | module.cluster.kubeconfig |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3NZ..." |
@ -206,8 +209,9 @@ Check the list of regions [docs](https://cloud.google.com/compute/docs/regions-z
| os_image | Container Linux image for compute instances | "coreos-stable" | "coreos-alpha", "coreos-beta" |
| disk_size | Size of the disk in GB | 40 | 100 |
| preemptible | If true, Compute Engine will terminate instances randomly within 24 hours | false | true |
| clc_snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | Must match `cluster_domain_suffix` of cluster | "cluster.local" | "k8s.example.com" |
| node_labels | List of initial node labels | [] | ["worker-pool=foo"] |
Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).

View File

@ -47,7 +47,7 @@ Terraform [modules](https://www.terraform.io/docs/modules/usage.html) allow a co
Clusters are declared in Terraform by referencing the module.
```tf
module "google-cloud-yavin" {
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
cluster_name = "yavin"
...

View File

@ -1,6 +1,6 @@
# AWS
In this tutorial, we'll create a Kubernetes v1.16.0 cluster on AWS with Container Linux.
In this tutorial, we'll create a Kubernetes v1.17.0 cluster on AWS with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.
@ -10,15 +10,15 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* AWS Account and IAM credentials
* AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)
* Terraform v0.12.x and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -49,7 +49,7 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
```tf
provider "aws" {
version = "2.29.0"
version = "2.41.0"
region = "eu-central-1"
shared_credentials_file = "/home/user/.config/aws/credentials"
}
@ -70,7 +70,7 @@ Define a Kubernetes cluster using the module `aws/container-linux/kubernetes`.
```tf
module "tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.16.0"
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.17.0"
# AWS
cluster_name = "tempest"
@ -79,7 +79,6 @@ module "tempest" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/tempest"
# optional
worker_count = 2
@ -110,7 +109,7 @@ Plan the resources to be created.
```sh
$ terraform plan
Plan: 98 to add, 0 to change, 0 to destroy.
Plan: 80 to add, 0 to change, 0 to destroy.
```
Apply the changes to create the cluster.
@ -118,9 +117,9 @@ Apply the changes to create the cluster.
```sh
$ terraform apply
...
module.aws-tempest.null_resource.bootstrap: Still creating... (4m50s elapsed)
module.aws-tempest.null_resource.bootstrap: Still creating... (5m0s elapsed)
module.aws-tempest.null_resource.bootstrap: Creation complete after 11m8s (ID: 3961816482286168143)
module.tempest.null_resource.bootstrap: Still creating... (4m50s elapsed)
module.tempest.null_resource.bootstrap: Still creating... (5m0s elapsed)
module.tempest.null_resource.bootstrap: Creation complete after 11m8s (ID: 3961816482286168143)
Apply complete! Resources: 98 added, 0 changed, 0 destroyed.
```
@ -129,15 +128,24 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
resource "local_file" "kubeconfig-tempest" {
content = module.tempest.kubeconfig-admin
filename = "/home/user/.kube/configs/tempest-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/tempest-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-3-155 Ready <none> 10m v1.16.0
ip-10-0-26-65 Ready <none> 10m v1.16.0
ip-10-0-41-21 Ready <none> 10m v1.16.0
ip-10-0-3-155 Ready <none> 10m v1.17.0
ip-10-0-26-65 Ready <none> 10m v1.17.0
ip-10-0-41-21 Ready <none> 10m v1.17.0
```
List the pods.
@ -177,7 +185,6 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/aws/con
| dns_zone | AWS Route53 DNS zone | "aws.example.com" |
| dns_zone_id | AWS Route53 DNS zone id | "Z3PAABBCFAKEC0" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3NZ..." |
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/tempest" |
#### DNS Zone
@ -191,7 +198,7 @@ resource "aws_route53_zone" "zone-for-clusters" {
}
```
Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`.
Reference the DNS zone id with `aws_route53_zone.zone-for-clusters.zone_id`.
!!! tip ""
If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Route53 (e.g. aws.mydomain.com) and [update nameservers](http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/SOA-NSrecords.html).
@ -200,16 +207,17 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/tempest" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | EC2 instance type for controllers | "t3.small" | See below |
| worker_type | EC2 instance type for workers | "t3.small" | See below |
| os_image | AMI channel for a Container Linux derivative | coreos-stable | coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge |
| disk_size | Size of the EBS volume in GB | "40" | "100" |
| disk_size | Size of the EBS volume in GB | 40 | 100 |
| disk_type | Type of the EBS volume | "gp2" | standard, gp2, io1 |
| disk_iops | IOPS of the EBS volume | "0" (i.e. auto) | "400" |
| worker_target_groups | Target group ARNs to which worker instances should be added | [] | ["${aws_lb_target_group.app.id}"] |
| worker_price | Spot price in USD for workers. Leave as default empty string for regular on-demand instances | "" | "0.10" |
| disk_iops | IOPS of the EBS volume | 0 (i.e. auto) | 400 |
| worker_target_groups | Target group ARNs to which worker instances should be added | [] | [aws_lb_target_group.app.id] |
| worker_price | Spot price in USD for worker instances or 0 to use on-demand instances | 0/null | 0.10 |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
@ -217,7 +225,7 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
| host_cidr | CIDR IPv4 range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| worker_node_labels | List of initial worker node labels | [] | ["worker-pool=default"] |
Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).

View File

@ -3,7 +3,7 @@
!!! danger
Typhoon for Azure is alpha. For production, use AWS, Google Cloud, or bare-metal. As Azure matures, check [errata](https://github.com/poseidon/typhoon/wiki/Errata) for known shortcomings.
In this tutorial, we'll create a Kubernetes v1.16.0 cluster on Azure with Container Linux.
In this tutorial, we'll create a Kubernetes v1.17.0 cluster on Azure with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a resource group, virtual network, subnets, security groups, controller availability set, worker scale set, load balancer, and TLS assets.
@ -13,15 +13,15 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* Azure account
* Azure DNS Zone (registered Domain Name or delegated subdomain)
* Terraform v0.12.x and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -50,7 +50,7 @@ Configure the Azure provider in a `providers.tf` file.
```tf
provider "azurerm" {
version = "1.34.0"
version = "1.38.0"
}
provider "ct" {
@ -66,7 +66,7 @@ Define a Kubernetes cluster using the module `azure/container-linux/kubernetes`.
```tf
module "ramius" {
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.16.0"
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.17.0"
# Azure
cluster_name = "ramius"
@ -76,7 +76,6 @@ module "ramius" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/ramius"
# optional
worker_count = 2
@ -115,26 +114,35 @@ Apply the changes to create the cluster.
```sh
$ terraform apply
...
module.azure-ramius.null_resource.bootstrap: Still creating... (6m50s elapsed)
module.azure-ramius.null_resource.bootstrap: Still creating... (7m0s elapsed)
module.azure-ramius.null_resource.bootstrap: Creation complete after 7m8s (ID: 3961816482286168143)
module.ramius.null_resource.bootstrap: Still creating... (6m50s elapsed)
module.ramius.null_resource.bootstrap: Still creating... (7m0s elapsed)
module.ramius.null_resource.bootstrap: Creation complete after 7m8s (ID: 3961816482286168143)
Apply complete! Resources: 86 added, 0 changed, 0 destroyed.
Apply complete! Resources: 69 added, 0 changed, 0 destroyed.
```
In 4-8 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/ramius/auth/kubeconfig
resource "local_file" "kubeconfig-ramius" {
content = module.ramius.kubeconfig-admin
filename = "/home/user/.kube/configs/ramius-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/ramius-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ramius-controller-0 Ready <none> 24m v1.16.0
ramius-worker-000001 Ready <none> 25m v1.16.0
ramius-worker-000002 Ready <none> 24m v1.16.0
ramius-controller-0 Ready <none> 24m v1.17.0
ramius-worker-000001 Ready <none> 25m v1.17.0
ramius-worker-000002 Ready <none> 24m v1.17.0
```
List the pods.
@ -144,9 +152,9 @@ $ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7c6fbb4f4b-b6qzx 1/1 Running 0 26m
kube-system coredns-7c6fbb4f4b-j2k3d 1/1 Running 0 26m
kube-system flannel-bwf24 2/2 Running 0 26m
kube-system flannel-ks5qb 2/2 Running 0 26m
kube-system flannel-tq2wg 2/2 Running 0 26m
kube-system calico-node-1m5bf 2/2 Running 0 26m
kube-system calico-node-7jmr1 2/2 Running 0 26m
kube-system calico-node-bknc8 2/2 Running 0 26m
kube-system kube-apiserver-ramius-controller-0 1/1 Running 0 26m
kube-system kube-controller-manager-ramius-controller-0 1/1 Running 0 26m
kube-system kube-proxy-j4vpq 1/1 Running 0 26m
@ -175,7 +183,6 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/azure/c
| dns_zone | Azure DNS zone | "azure.example.com" |
| dns_zone_group | Resource group where the Azure DNS zone resides | "global" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3NZ..." |
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/ramius" |
!!! tip
Regions are shown in [docs](https://azure.microsoft.com/en-us/global-infrastructure/regions/) or with `az account list-locations --output table`.
@ -195,14 +202,14 @@ resource "azurerm_resource_group" "global" {
# DNS zone for clusters
resource "azurerm_dns_zone" "clusters" {
resource_group_name = "${azurerm_resource_group.global.name}"
resource_group_name = azurerm_resource_group.global.name
name = "azure.example.com"
zone_type = "Public"
}
```
Reference the DNS zone with `"${azurerm_dns_zone.clusters.name}"` and its resource group with `"${azurerm_resource_group.global.name}"`.
Reference the DNS zone with `azurerm_dns_zone.clusters.name` and its resource group with `"azurerm_resource_group.global.name`.
!!! tip ""
If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Azure DNS (e.g. azure.mydomain.com) and [update nameservers](https://docs.microsoft.com/en-us/azure/dns/dns-delegate-domain-azure-dns).
@ -211,20 +218,21 @@ Reference the DNS zone with `"${azurerm_dns_zone.clusters.name}"` and its resour
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/ramius" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Machine type for controllers | "Standard_DS1_v2" | See below |
| worker_type | Machine type for workers | "Standard_F1" | See below |
| os_image | Channel for a Container Linux derivative | coreos-stable | coreos-stable, coreos-beta, coreos-alpha |
| disk_size | Size of the disk in GB | "40" | "100" |
| controller_type | Machine type for controllers | "Standard_B2s" | See below |
| worker_type | Machine type for workers | "Standard_DS1_v2" | See below |
| os_image | Channel for a Container Linux derivative | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
| disk_size | Size of the disk in GB | 40 | 100 |
| worker_priority | Set priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time | Regular | Low |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| networking | Choice of networking provider | "flannel" | "flannel" or "calico" (experimental) |
| networking | Choice of networking provider | "calico" | "flannel" or "calico" |
| host_cidr | CIDR IPv4 range to assign to instances | "10.0.0.0/16" | "10.0.0.0/20" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| worker_node_labels | List of initial worker node labels | [] | ["worker-pool=default"] |
Check the list of valid [machine types](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/) and their [specs](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/sizes-general). Use `az vm list-skus` to get the identifier.
@ -232,7 +240,7 @@ Check the list of valid [machine types](https://azure.microsoft.com/en-us/pricin
Unlike AWS and GCP, Azure requires its *virtual* networks to have non-overlapping IPv4 CIDRs (yeah, go figure). Instead of each cluster just using `10.0.0.0/16` for instances, each Azure cluster's `host_cidr` must be non-overlapping (e.g. 10.0.0.0/20 for the 1st cluster, 10.0.16.0/20 for the 2nd cluster, etc).
!!! warning
Do not choose a `controller_type` smaller than `Standard_DS1_v2`. Smaller instances are not sufficient for running a controller.
Do not choose a `controller_type` smaller than `Standard_B2s`. Smaller instances are not sufficient for running a controller.
#### Low Priority

View File

@ -1,6 +1,6 @@
# Bare-Metal
In this tutorial, we'll network boot and provision a Kubernetes v1.16.0 cluster on bare-metal with Container Linux.
In this tutorial, we'll network boot and provision a Kubernetes v1.17.0 cluster on bare-metal with Container Linux.
First, we'll deploy a [Matchbox](https://github.com/poseidon/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.
@ -12,7 +12,7 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* PXE-enabled [network boot](https://coreos.com/matchbox/docs/latest/network-setup.html) environment (with HTTPS support)
* Matchbox v0.6+ deployment with API enabled
* Matchbox credentials `client.crt`, `client.key`, `ca.crt`
* Terraform v0.12.x, [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox), and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+, [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox), and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Machines
@ -107,11 +107,11 @@ Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -144,9 +144,9 @@ Configure the Matchbox provider to use your Matchbox API endpoint and client cer
provider "matchbox" {
version = "0.3.0"
endpoint = "matchbox.example.com:8081"
client_cert = "${file("~/.config/matchbox/client.crt")}"
client_key = "${file("~/.config/matchbox/client.key")}"
ca = "${file("~/.config/matchbox/ca.crt")}"
client_cert = file("~/.config/matchbox/client.crt")
client_key = file("~/.config/matchbox/client.key")
ca = file("~/.config/matchbox/ca.crt")
}
provider "ct" {
@ -159,35 +159,36 @@ provider "ct" {
Define a Kubernetes cluster using the module `bare-metal/container-linux/kubernetes`.
```tf
module "bare-metal-mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.16.0"
module "mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.17.0"
# bare-metal
cluster_name = "mercury"
matchbox_http_endpoint = "http://matchbox.example.com"
os_channel = "coreos-stable"
os_version = "1632.3.0"
os_version = "2191.5.0"
# configuration
k8s_domain_name = "node1.example.com"
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/mercury"
# machines
controller_names = ["node1"]
controller_macs = ["52:54:00:a1:9c:ae"]
controller_domains = ["node1.example.com"]
worker_names = [
"node2",
"node3",
]
worker_macs = [
"52:54:00:b2:2f:86",
"52:54:00:c3:61:77",
]
worker_domains = [
"node2.example.com",
"node3.example.com",
controllers = [{
name = "node1"
mac = "52:54:00:a1:9c:ae"
domain = "node1.example.com"
}]
workers = [
{
name = "node2",
mac = "52:54:00:b2:2f:86"
domain = "node2.example.com"
},
{
name = "node3",
mac = "52:54:00:c3:61:77"
domain = "node3.example.com"
}
]
# set to http only if you cannot chainload to iPXE firmware with https support
@ -221,12 +222,12 @@ $ terraform plan
Plan: 55 to add, 0 to change, 0 to destroy.
```
Apply the changes. Terraform will generate bootstrap assets to `asset_dir` and create Matchbox profiles (e.g. controller, worker) and matching rules via the Matchbox API.
Apply the changes. Terraform will generate bootstrap assets and create Matchbox profiles (e.g. controller, worker) and matching rules via the Matchbox API.
```sh
$ terraform apply
module.bare-metal-mercury.null_resource.copy-controller-secrets.0: Still creating... (10s elapsed)
module.bare-metal-mercury.null_resource.copy-worker-secrets.0: Still creating... (10s elapsed)
module.mercury.null_resource.copy-controller-secrets.0: Still creating... (10s elapsed)
module.mercury.null_resource.copy-worker-secrets.0: Still creating... (10s elapsed)
...
```
@ -251,11 +252,11 @@ Machines will network boot, install Container Linux to disk, reboot into the dis
Wait for the `bootstrap` step to finish bootstrapping the Kubernetes control plane. This may take 5-15 minutes depending on your network.
```
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m10s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m30s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m40s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Creation complete (ID: 5441741360626669024)
module.mercury.null_resource.bootstrap: Still creating... (6m10s elapsed)
module.mercury.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.mercury.null_resource.bootstrap: Still creating... (6m30s elapsed)
module.mercury.null_resource.bootstrap: Still creating... (6m40s elapsed)
module.mercury.null_resource.bootstrap: Creation complete (ID: 5441741360626669024)
Apply complete! Resources: 55 added, 0 changed, 0 destroyed.
```
@ -263,9 +264,9 @@ Apply complete! Resources: 55 added, 0 changed, 0 destroyed.
To watch the install to disk (until machines reboot from disk), SSH to port 2222.
```
# before v1.16.0
# before v1.17.0
$ ssh debug@node1.example.com
# after v1.16.0
# after v1.17.0
$ ssh -p 2222 core@node1.example.com
```
@ -274,24 +275,33 @@ To watch the bootstrap process in detail, SSH to the first controller and journa
```
$ ssh core@node1.example.com
$ journalctl -f -u bootstrap
podman[1750]: The connection to the server cluster.example.com:6443 was refused - did you specify the right host or port?
podman[1750]: Waiting for static pod control plane
rkt[1750]: The connection to the server cluster.example.com:6443 was refused - did you specify the right host or port?
rkt[1750]: Waiting for static pod control plane
...
podman[1750]: serviceaccount/calico-node unchanged
rkt[1750]: serviceaccount/calico-node unchanged
systemd[1]: Started Kubernetes control plane.
```
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
resource "local_file" "kubeconfig-mercury" {
content = module.mercury.kubeconfig-admin
filename = "/home/user/.kube/configs/mercury-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/mercury-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1.example.com Ready <none> 10m v1.16.0
node2.example.com Ready <none> 10m v1.16.0
node3.example.com Ready <none> 10m v1.16.0
node1.example.com Ready <none> 10m v1.17.0
node2.example.com Ready <none> 10m v1.17.0
node3.example.com Ready <none> 10m v1.17.0
```
List the pods.
@ -327,33 +337,28 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-me
| Name | Description | Example |
|:-----|:------------|:--------|
| cluster_name | Unique cluster name | mercury |
| matchbox_http_endpoint | Matchbox HTTP read-only endpoint | http://matchbox.example.com:port |
| cluster_name | Unique cluster name | "mercury" |
| matchbox_http_endpoint | Matchbox HTTP read-only endpoint | "http://matchbox.example.com:port" |
| os_channel | Channel for a Container Linux derivative | coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge |
| os_version | Version for a Container Linux derivative to PXE and install | 1632.3.0 |
| os_version | Version for a Container Linux derivative to PXE and install | "1632.3.0" |
| k8s_domain_name | FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint | "myk8s.example.com" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3Nz..." |
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/mercury" |
| controller_names | Ordered list of controller short names | ["node1"] |
| controller_macs | Ordered list of controller identifying MAC addresses | ["52:54:00:a1:9c:ae"] |
| controller_domains | Ordered list of controller FQDNs | ["node1.example.com"] |
| worker_names | Ordered list of worker short names | ["node2", "node3"] |
| worker_macs | Ordered list of worker identifying MAC addresses | ["52:54:00:b2:2f:86", "52:54:00:c3:61:77"] |
| worker_domains | Ordered list of worker FQDNs | ["node2.example.com", "node3.example.com"] |
| controllers | List of controller machine detail objects (unique name, identifying MAC address, FQDN) | `[{name="node1", mac="52:54:00:a1:9c:ae", domain="node1.example.com"}]` |
| workers | List of worker machine detail objects (unique name, identifying MAC address, FQDN) | `[{name="node2", mac="52:54:00:b2:2f:86", domain="node2.example.com"}, {name="node3", mac="52:54:00:c3:61:77", domain="node3.example.com"}]` |
### Optional
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/mercury" |
| download_protocol | Protocol iPXE uses to download the kernel and initrd. iPXE must be compiled with [crypto](https://ipxe.org/crypto) support for https. Unused if cached_install is true | "https" | "http" |
| cached_install | PXE boot and install from the Matchbox `/assets` cache. Admin MUST have downloaded Container Linux or Flatcar images into the cache | false | true |
| install_disk | Disk device where Container Linux should be installed | "/dev/sda" | "/dev/sdb" |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
| clc_snippets | Map from machine names to lists of Container Linux Config snippets | {} | [example](/advanced/customization/#usage) |
| network_ip_autodetection_method | Method to detect host IPv4 address (calico-only) | first-found | can-reach=10.0.0.1 |
| network_ip_autodetection_method | Method to detect host IPv4 address (calico-only) | "first-found" | "can-reach=10.0.0.1" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| kernel_args | Additional kernel args to provide at PXE boot | [] | "kvm-intel.nested=1" |
| kernel_args | Additional kernel args to provide at PXE boot | [] | ["kvm-intel.nested=1"] |

View File

@ -1,6 +1,6 @@
# Digital Ocean
In this tutorial, we'll create a Kubernetes v1.16.0 cluster on DigitalOcean with Container Linux.
In this tutorial, we'll create a Kubernetes v1.17.0 cluster on DigitalOcean with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets.
@ -10,15 +10,15 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* Digital Ocean Account and Token
* Digital Ocean Domain (registered Domain Name or delegated subdomain)
* Terraform v0.12.x and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -50,7 +50,7 @@ Configure the DigitalOcean provider to use your token in a `providers.tf` file.
```tf
provider "digitalocean" {
version = "1.7.0"
version = "1.11.0"
token = "${chomp(file("~/.config/digital-ocean/token"))}"
}
@ -64,8 +64,8 @@ provider "ct" {
Define a Kubernetes cluster using the module `digital-ocean/container-linux/kubernetes`.
```tf
module "digital-ocean-nemo" {
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.16.0"
module "nemo" {
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.17.0"
# Digital Ocean
cluster_name = "nemo"
@ -74,7 +74,6 @@ module "digital-ocean-nemo" {
# configuration
ssh_fingerprints = ["d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7"]
asset_dir = "/home/user/.secrets/clusters/nemo"
# optional
worker_count = 2
@ -111,28 +110,37 @@ Apply the changes to create the cluster.
```sh
$ terraform apply
module.digital-ocean-nemo.null_resource.bootstrap: Still creating... (30s elapsed)
module.digital-ocean-nemo.null_resource.bootstrap: Provisioning with 'remote-exec'...
module.nemo.null_resource.bootstrap: Still creating... (30s elapsed)
module.nemo.null_resource.bootstrap: Provisioning with 'remote-exec'...
...
module.digital-ocean-nemo.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.digital-ocean-nemo.null_resource.bootstrap: Creation complete (ID: 7599298447329218468)
module.nemo.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.nemo.null_resource.bootstrap: Creation complete (ID: 7599298447329218468)
Apply complete! Resources: 54 added, 0 changed, 0 destroyed.
Apply complete! Resources: 42 added, 0 changed, 0 destroyed.
```
In 3-6 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
resource "local_file" "kubeconfig-nemo" {
content = module.nemo.kubeconfig-admin
filename = "/home/user/.kube/configs/nemo-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/nemo-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
10.132.110.130 Ready <none> 10m v1.16.0
10.132.115.81 Ready <none> 10m v1.16.0
10.132.124.107 Ready <none> 10m v1.16.0
10.132.110.130 Ready <none> 10m v1.17.0
10.132.115.81 Ready <none> 10m v1.17.0
10.132.124.107 Ready <none> 10m v1.17.0
```
List the pods.
@ -141,9 +149,9 @@ List the pods.
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-1187388186-ld1j7 1/1 Running 0 11m
kube-system coredns-1187388186-rdhf7 1/1 Running 0 11m
kube-system flannel-1cq1v 2/2 Running 0 11m
kube-system flannel-hq9t0 2/2 Running 0 11m
kube-system flannel-v0g9w 2/2 Running 0 11m
kube-system calico-node-1m5bf 2/2 Running 0 11m
kube-system calico-node-7jmr1 2/2 Running 0 11m
kube-system calico-node-bknc8 2/2 Running 0 11m
kube-system kube-apiserver-ip-10.132.115.81 1/1 Running 0 11m
kube-system kube-controller-manager-ip-10.132.115.81 1/1 Running 0 11m
kube-system kube-proxy-6kxjf 1/1 Running 0 11m
@ -167,11 +175,10 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/digital
| Name | Description | Example |
|:-----|:------------|:--------|
| cluster_name | Unique cluster name (prepended to dns_zone) | nemo |
| region | Digital Ocean region | nyc1, sfo2, fra1, tor1 |
| dns_zone | Digital Ocean domain (i.e. DNS zone) | do.example.com |
| cluster_name | Unique cluster name (prepended to dns_zone) | "nemo" |
| region | Digital Ocean region | "nyc1", "sfo2", "fra1", tor1" |
| dns_zone | Digital Ocean domain (i.e. DNS zone) | "do.example.com" |
| ssh_fingerprints | SSH public key fingerprints | ["d7:9d..."] |
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | /home/user/.secrets/nemo |
#### DNS Zone
@ -212,17 +219,17 @@ Digital Ocean requires the SSH public key be uploaded to your account, so you ma
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/nemo" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Droplet type for controllers | s-2vcpu-2gb | s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... |
| worker_type | Droplet type for workers | s-1vcpu-2gb | s-1vcpu-2gb, s-2vcpu-2gb, ... |
| controller_type | Droplet type for controllers | "s-2vcpu-2gb" | s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... |
| worker_type | Droplet type for workers | "s-1vcpu-2gb" | s-1vcpu-2gb, s-2vcpu-2gb, ... |
| image | Container Linux image for instances | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| networking | Choice of networking provider | "flannel" | "flannel" or "calico" (experimental) |
| networking | Choice of networking provider | "calico" | "flannel" or "calico" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
Check the list of valid [droplet types](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) or use `doctl compute size list`.

View File

@ -1,6 +1,6 @@
# Google Cloud
In this tutorial, we'll create a Kubernetes v1.16.0 cluster on Google Compute Engine with Container Linux.
In this tutorial, we'll create a Kubernetes v1.17.0 cluster on Google Compute Engine with Container Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.
@ -10,15 +10,15 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* Google Cloud Account and Service Account
* Google Cloud DNS Zone (registered Domain Name or delegated subdomain)
* Terraform v0.12.x and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -49,10 +49,10 @@ Configure the Google Cloud provider to use your service account key, project-id,
```tf
provider "google" {
version = "2.15.0"
version = "2.20.0"
project = "project-id"
region = "us-central1"
credentials = "${file("~/.config/google-cloud/terraform.json")}"
credentials = file("~/.config/google-cloud/terraform.json")
}
provider "ct" {
@ -70,8 +70,8 @@ Additional configuration options are described in the `google` provider [docs](h
Define a Kubernetes cluster using the module `google-cloud/container-linux/kubernetes`.
```tf
module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.16.0"
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.17.0"
# Google Cloud
cluster_name = "yavin"
@ -81,7 +81,6 @@ module "google-cloud-yavin" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/yavin"
# optional
worker_count = 2
@ -118,28 +117,37 @@ Apply the changes to create the cluster.
```sh
$ terraform apply
module.google-cloud-yavin.null_resource.bootstrap: Still creating... (10s elapsed)
module.yavin.null_resource.bootstrap: Still creating... (10s elapsed)
...
module.google-cloud-yavin.null_resource.bootstrap: Still creating... (5m30s elapsed)
module.google-cloud-yavin.null_resource.bootstrap: Still creating... (5m40s elapsed)
module.google-cloud-yavin.null_resource.bootstrap: Creation complete (ID: 5768638456220583358)
module.yavin.null_resource.bootstrap: Still creating... (5m30s elapsed)
module.yavin.null_resource.bootstrap: Still creating... (5m40s elapsed)
module.yavin.null_resource.bootstrap: Creation complete (ID: 5768638456220583358)
Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
```
In 4-8 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
resource "local_file" "kubeconfig-yavin" {
content = module.yavin.kubeconfig-admin
filename = "/home/user/.kube/configs/yavin-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.16.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.16.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.16.0
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.17.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.17.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.17.0
```
List the pods.
@ -180,7 +188,6 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/google-
| dns_zone | Google Cloud DNS zone | "google-cloud.example.com" |
| dns_zone_name | Google Cloud DNS zone name | "example-zone" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3NZ..." |
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/yavin" |
Check the list of valid [regions](https://cloud.google.com/compute/docs/regions-zones/regions-zones) and list Container Linux [images](https://cloud.google.com/compute/docs/images) with `gcloud compute images list | grep coreos`.
@ -205,6 +212,7 @@ resource "google_dns_managed_zone" "zone-for-clusters" {
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/yavin" |
| controller_count | Number of controllers (i.e. masters) | 1 | 3 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Machine type for controllers | "n1-standard-1" | See below |
@ -217,7 +225,7 @@ resource "google_dns_managed_zone" "zone-for-clusters" {
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| worker_node_labels | List of initial worker node labels | [] | ["worker-pool=default"] |
Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).

View File

@ -3,7 +3,7 @@
!!! danger
Typhoon for Fedora CoreOS is an early preview! Fedora CoreOS itself is a preview! Expect bugs and design shifts. Please help both projects solve problems. Report Fedora CoreOS bugs to [Fedora](https://github.com/coreos/fedora-coreos-tracker/issues). Report Typhoon issues to Typhoon.
In this tutorial, we'll create a Kubernetes v1.16.0 cluster on AWS with Fedora CoreOS.
In this tutorial, we'll create a Kubernetes v1.17.0 cluster on AWS with Fedora CoreOS.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.
@ -13,15 +13,15 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* AWS Account and IAM credentials
* AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)
* Terraform v0.12.x and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -52,8 +52,8 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
```tf
provider "aws" {
version = "2.29.0"
region = "us-east-1" # MUST be us-east-1 right now!
version = "2.41.0"
region = "eu-central-1"
shared_credentials_file = "/home/user/.config/aws/credentials"
}
@ -72,8 +72,8 @@ Additional configuration options are described in the `aws` provider [docs](http
Define a Kubernetes cluster using the module `aws/fedora-coreos/kubernetes`.
```tf
module "aws-tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=DEVELOPMENT_SHA"
module "tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.17.0"
# AWS
cluster_name = "tempest"
@ -82,7 +82,6 @@ module "aws-tempest" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/tempest"
# optional
worker_count = 2
@ -113,7 +112,7 @@ Plan the resources to be created.
```sh
$ terraform plan
Plan: 98 to add, 0 to change, 0 to destroy.
Plan: 81 to add, 0 to change, 0 to destroy.
```
Apply the changes to create the cluster.
@ -121,9 +120,9 @@ Apply the changes to create the cluster.
```sh
$ terraform apply
...
module.aws-tempest.null_resource.bootstrap: Still creating... (4m50s elapsed)
module.aws-tempest.null_resource.bootstrap: Still creating... (5m0s elapsed)
module.aws-tempest.null_resource.bootstrap: Creation complete after 5m8s (ID: 3961816482286168143)
module.tempest.null_resource.bootstrap: Still creating... (4m50s elapsed)
module.tempest.null_resource.bootstrap: Still creating... (5m0s elapsed)
module.tempest.null_resource.bootstrap: Creation complete after 5m8s (ID: 3961816482286168143)
Apply complete! Resources: 98 added, 0 changed, 0 destroyed.
```
@ -132,15 +131,24 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
resource "local_file" "kubeconfig-tempest" {
content = module.tempest.kubeconfig-admin
filename = "/home/user/.kube/configs/tempest-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/tempest-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-3-155 Ready <none> 10m v1.16.0
ip-10-0-26-65 Ready <none> 10m v1.16.0
ip-10-0-41-21 Ready <none> 10m v1.16.0
ip-10-0-3-155 Ready <none> 10m v1.17.0
ip-10-0-26-65 Ready <none> 10m v1.17.0
ip-10-0-41-21 Ready <none> 10m v1.17.0
```
List the pods.
@ -177,7 +185,6 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/aws/fed
| dns_zone | AWS Route53 DNS zone | "aws.example.com" |
| dns_zone_id | AWS Route53 DNS zone id | "Z3PAABBCFAKEC0" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3NZ..." |
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/tempest" |
#### DNS Zone
@ -191,7 +198,7 @@ resource "aws_route53_zone" "zone-for-clusters" {
}
```
Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`.
Reference the DNS zone id with `aws_route53_zone.zone-for-clusters.zone_id`.
!!! tip ""
If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Route53 (e.g. aws.mydomain.com) and [update nameservers](http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/SOA-NSrecords.html).
@ -200,16 +207,17 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/tempest" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | EC2 instance type for controllers | "t3.small" | See below |
| worker_type | EC2 instance type for workers | "t3.small" | See below |
| os_image | AMI channel for Fedora CoreOS | not yet used | ? |
| disk_size | Size of the EBS volume in GB | "40" | "100" |
| disk_size | Size of the EBS volume in GB | 40 | 100 |
| disk_type | Type of the EBS volume | "gp2" | standard, gp2, io1 |
| disk_iops | IOPS of the EBS volume | "0" (i.e. auto) | "400" |
| worker_target_groups | Target group ARNs to which worker instances should be added | [] | ["${aws_lb_target_group.app.id}"] |
| worker_price | Spot price in USD for workers. Leave as default empty string for regular on-demand instances | "" | "0.10" |
| disk_iops | IOPS of the EBS volume | 0 (i.e. auto) | 400 |
| worker_target_groups | Target group ARNs to which worker instances should be added | [] | [aws_lb_target_group.app.id] |
| worker_price | Spot price in USD for worker instances or 0 to use on-demand instances | 0 | 0.10 |
| controller_snippets | Controller Fedora CoreOS Config snippets | [] | UNSUPPORTED |
| worker_clc_snippets | Worker Fedora CoreOS Config snippets | [] | UNSUPPORTED |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
@ -217,7 +225,7 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
| host_cidr | CIDR IPv4 range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| worker_node_labels | List of initial worker node labels | [] | ["worker-pool=default"] |
Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).

View File

@ -3,7 +3,7 @@
!!! danger
Typhoon for Fedora CoreOS is an early preview! Fedora CoreOS itself is a preview! Expect bugs and design shifts. Please help both projects solve problems. Report Fedora CoreOS bugs to [Fedora](https://github.com/coreos/fedora-coreos-tracker/issues). Report Typhoon issues to Typhoon.
In this tutorial, we'll network boot and provision a Kubernetes v1.16.0 cluster on bare-metal with Fedora CoreOS.
In this tutorial, we'll network boot and provision a Kubernetes v1.17.0 cluster on bare-metal with Fedora CoreOS.
First, we'll deploy a [Matchbox](https://github.com/poseidon/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Fedora CoreOS to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.
@ -15,7 +15,7 @@ Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` se
* PXE-enabled [network boot](https://coreos.com/matchbox/docs/latest/network-setup.html) environment (with HTTPS support)
* Matchbox v0.6+ deployment with API enabled
* Matchbox credentials `client.crt`, `client.key`, `ca.crt`
* Terraform v0.12.x, [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox), and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
* Terraform v0.12.6+, [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox), and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Machines
@ -110,11 +110,11 @@ Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.x on your system.
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.7
Terraform v0.12.16
```
Add the [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -147,9 +147,9 @@ Configure the Matchbox provider to use your Matchbox API endpoint and client cer
provider "matchbox" {
version = "0.3.0"
endpoint = "matchbox.example.com:8081"
client_cert = "${file("~/.config/matchbox/client.crt")}"
client_key = "${file("~/.config/matchbox/client.key")}"
ca = "${file("~/.config/matchbox/ca.crt")}"
client_cert = file("~/.config/matchbox/client.crt")
client_key = file("~/.config/matchbox/client.key")
ca = file("~/.config/matchbox/ca.crt")
}
provider "ct" {
@ -162,36 +162,37 @@ provider "ct" {
Define a Kubernetes cluster using the module `bare-metal/fedora-coreos/kubernetes`.
```tf
module "bare-metal-mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/fedora-coreos/kubernetes?ref=DEVELOPMENT_SHA"
module "mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/fedora-coreos/kubernetes?ref=v1.17.0"
# bare-metal
cluster_name = "mercury"
matchbox_http_endpoint = "http://matchbox.example.com"
os_stream = "testing"
os_version = "30.20190801.0"
cached_install = "true"
os_version = "30.20191002.0"
cached_install = true
# configuration
k8s_domain_name = "node1.example.com"
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/mercury"
# machines
controller_names = ["node1"]
controller_macs = ["52:54:00:a1:9c:ae"]
controller_domains = ["node1.example.com"]
worker_names = [
"node2",
"node3",
]
worker_macs = [
"52:54:00:b2:2f:86",
"52:54:00:c3:61:77",
]
worker_domains = [
"node2.example.com",
"node3.example.com",
controllers = [{
name = "node1"
mac = "52:54:00:a1:9c:ae"
domain = "node1.example.com"
}]
workers = [
{
name = "node2",
mac = "52:54:00:b2:2f:86"
domain = "node2.example.com"
},
{
name = "node3",
mac = "52:54:00:c3:61:77"
domain = "node3.example.com"
}
]
}
```
@ -222,14 +223,14 @@ $ terraform plan
Plan: 55 to add, 0 to change, 0 to destroy.
```
Apply the changes. Terraform will generate bootstrap assets to `asset_dir` and create Matchbox profiles (e.g. controller, worker) and matching rules via the Matchbox API.
Apply the changes. Terraform will generate bootstrap assets and create Matchbox profiles (e.g. controller, worker) and matching rules via the Matchbox API.
```sh
$ terraform apply
module.bare-metal-mercury.null_resource.copy-kubeconfig.0: Provisioning with 'file'...
module.bare-metal-mercury.null_resource.copy-etcd-secrets.0: Provisioning with 'file'...
module.bare-metal-mercury.null_resource.copy-kubeconfig.0: Still creating... (10s elapsed)
module.bare-metal-mercury.null_resource.copy-etcd-secrets.0: Still creating... (10s elapsed)
module.mercury.null_resource.copy-kubeconfig.0: Provisioning with 'file'...
module.mercury.null_resource.copy-etcd-secrets.0: Provisioning with 'file'...
module.mercury.null_resource.copy-kubeconfig.0: Still creating... (10s elapsed)
module.mercury.null_resource.copy-etcd-secrets.0: Still creating... (10s elapsed)
...
```
@ -254,11 +255,11 @@ Machines will network boot, install Fedora CoreOS to disk, reboot into the disk
Wait for the `bootstrap` step to finish bootstrapping the Kubernetes control plane. This may take 5-15 minutes depending on your network.
```
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m10s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m30s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Still creating... (6m40s elapsed)
module.bare-metal-mercury.null_resource.bootstrap: Creation complete (ID: 5441741360626669024)
module.mercury.null_resource.bootstrap: Still creating... (6m10s elapsed)
module.mercury.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.mercury.null_resource.bootstrap: Still creating... (6m30s elapsed)
module.mercury.null_resource.bootstrap: Still creating... (6m40s elapsed)
module.mercury.null_resource.bootstrap: Creation complete (ID: 5441741360626669024)
Apply complete! Resources: 55 added, 0 changed, 0 destroyed.
```
@ -277,15 +278,24 @@ systemd[1]: Started Kubernetes control plane.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
$ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
resource "local_file" "kubeconfig-mercury" {
content = module.mercury.kubeconfig-admin
filename = "/home/user/.kube/configs/mercury-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/mercury-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1.example.com Ready <none> 10m v1.16.0
node2.example.com Ready <none> 10m v1.16.0
node3.example.com Ready <none> 10m v1.16.0
node1.example.com Ready <none> 10m v1.17.0
node2.example.com Ready <none> 10m v1.17.0
node3.example.com Ready <none> 10m v1.17.0
```
List the pods.
@ -318,32 +328,27 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-me
| Name | Description | Example |
|:-----|:------------|:--------|
| cluster_name | Unique cluster name | mercury |
| matchbox_http_endpoint | Matchbox HTTP read-only endpoint | http://matchbox.example.com:port |
| os_stream | Fedora CoreOS release stream | testing |
| os_version | Fedora CoreOS version to PXE and install | 30.20190716.1 |
| cluster_name | Unique cluster name | "mercury" |
| matchbox_http_endpoint | Matchbox HTTP read-only endpoint | "http://matchbox.example.com:port" |
| os_stream | Fedora CoreOS release stream | "testing" |
| os_version | Fedora CoreOS version to PXE and install | "30.20190716.1" |
| k8s_domain_name | FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint | "myk8s.example.com" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3Nz..." |
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/mercury" |
| controller_names | Ordered list of controller short names | ["node1"] |
| controller_macs | Ordered list of controller identifying MAC addresses | ["52:54:00:a1:9c:ae"] |
| controller_domains | Ordered list of controller FQDNs | ["node1.example.com"] |
| worker_names | Ordered list of worker short names | ["node2", "node3"] |
| worker_macs | Ordered list of worker identifying MAC addresses | ["52:54:00:b2:2f:86", "52:54:00:c3:61:77"] |
| worker_domains | Ordered list of worker FQDNs | ["node2.example.com", "node3.example.com"] |
| controllers | List of controller machine detail objects (unique name, identifying MAC address, FQDN) | `[{name="node1", mac="52:54:00:a1:9c:ae", domain="node1.example.com"}]` |
| workers | List of worker machine detail objects (unique name, identifying MAC address, FQDN) | `[{name="node2", mac="52:54:00:b2:2f:86", domain="node2.example.com"}, {name="node3", mac="52:54:00:c3:61:77", domain="node3.example.com"}]` |
### Optional
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/mercury" |
| cached_install | PXE boot and install from the Matchbox `/assets` cache. Admin MUST have downloaded Fedora CoreOS images into the cache | false | true |
| install_disk | Disk device where Fedora CoreOS should be installed | "sda" (not "/dev/sda" like Container Linux) | "sdb" |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
| snippets | Map from machine names to lists of Fedora CoreOS Config snippets | {} | UNSUPPORTED |
| network_ip_autodetection_method | Method to detect host IPv4 address (calico-only) | first-found | can-reach=10.0.0.1 |
| network_ip_autodetection_method | Method to detect host IPv4 address (calico-only) | "first-found" | "can-reach=10.0.0.1" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by coredns. | "cluster.local" | "k8s.example.com" |
| kernel_args | Additional kernel args to provide at PXE boot | [] | "kvm-intel.nested=1" |
| kernel_args | Additional kernel args to provide at PXE boot | [] | ["kvm-intel.nested=1"] |

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.16.0 (upstream)
* Kubernetes v1.17.0 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](advanced/worker-pools/), [preemptible](cl/google-cloud/#preemption) workers, and [snippets](advanced/customization/#container-linux) customization
@ -46,8 +46,8 @@ A preview of Typhoon for [Fedora CoreOS](https://getfedora.org/coreos/) is avail
Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. Here's a minimal example.
```tf
module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.16.0"
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.17.0"
# Google Cloud
cluster_name = "yavin"
@ -57,11 +57,16 @@ module "google-cloud-yavin" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
asset_dir = "/home/user/.secrets/clusters/yavin"
# optional
worker_count = 2
}
# Obtain cluster kubeconfig
resource "local_file" "kubeconfig-yavin" {
content = module.yavin.kubeconfig-admin
filename = "/home/user/.kube/configs/yavin-config"
}
```
Initialize modules, plan the changes to be made, and apply the changes.
@ -69,20 +74,20 @@ Initialize modules, plan the changes to be made, and apply the changes.
```sh
$ terraform init
$ terraform plan
Plan: 64 to add, 0 to change, 0 to destroy.
Plan: 62 to add, 0 to change, 0 to destroy.
$ terraform apply
Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
```
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
```
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.16.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.16.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.16.0
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.17.0
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.17.0
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.17.0
```
List the pods.

View File

@ -6,6 +6,39 @@ Typhoon ensures certain networking hardware integrates well with bare-metal Kube
Ubiquiti EdgeRouters and EdgeOS work well with bare-metal Kubernetes clusters. Familiarity with EdgeRouter setup and CLI usage is required.
### DHCP
Assign static IPs to clients with known MAC addresses. This is called a static mapping by EdgeOS. Configure the router with the commands based on region inventory.
```
configure
show service dhcp-server shared-network
set service dhcp-server shared-network-name LAN subnet SUBNET static-mapping NAME mac-address MACADDR
set service dhcp-server shared-network-name LAN subnet SUBNET static-mapping NAME ip-address 10.0.0.20
```
### DNS
Add DNS A records to static IPs as `dnsmasq` host-records.
```
configure
set service dns forwarding options host-record=node.example.com,10.0.0.20
```
Forward `*.svc.cluster.local` queries to the CoreDNS Kubernetes service IP to allow clients to resolve Kubernetes services.
```
set service dns forwarding options server=/svc.cluster.local/10.3.0.10
commit-confirm
```
Restart `dnsmasq`.
```
sudo /etc/init.d/dnsmasq restart
```
### PXE
Ubiquiti EdgeRouters can provide a PXE-enabled network boot environment for client machines.
@ -65,55 +98,88 @@ set service dns forwarding options tftp-root=/config/tftpboot
commit-confirm
```
### DHCP
### Routing
Assign static IPs to clients with known MAC addresses. This is called a static mapping by EdgeOS. Configure the router with the commands based on region inventory.
#### Static Routes
```
configure
show service dhcp-server shared-network
set service dhcp-server shared-network-name LAN subnet SUBNET static-mapping NAME mac-address MACADDR
set service dhcp-server shared-network-name LAN subnet SUBNET static-mapping NAME ip-address 10.0.0.20
```
### DNS
Assign DNS A records to nodes as options to `dnsmasq`.
```
configure
set service dns forwarding options host-record=node.example.com,10.0.0.20
```
Restart `dnsmasq`.
```
sudo /etc/init.d/dnsmasq restart
```
Configure queries for `*.svc.cluster.local` to be forwarded to the Kubernetes `coredns` service IP to allow hosts to resolve cluster-local Kubernetes names.
```
configure
show service dns forwarding
set service dns forwarding options server=/svc.cluster.local/10.3.0.10
commit-confirm
```
### Kubernetes Services
Add static routes for the Kubernetes IPv4 service range to Kubernetes node(s) so hosts can route to Kubernetes services (default: 10.3.0.0/16).
Add static route(s) to Kubernetes node(s) that can route to Kubernetes service IPs (default: 10.3.0.0/16). Kubernetes service IPs will become routeable on the LAN.
```
configure
show protocols static route
set protocols static route 10.3.0.0/16 next-hop NODE_IP
...
commit-confirm
```
!!! note
Adding multiple next-hop nodes provides equal-cost multi-path (ECMP) routing. EdgeOS v2.0+ is required. The kernel in prior versions used flow-hash to balanced packets, whereas with v2.0, round-robin sessions are used.
Adding multiple next-hop nodes provides equal-cost multi-path (ECMP) routing. EdgeOS v2.0+ is required. The kernel in prior versions used flow-hash to balanced packets, whereas with v2.0, round-robin sessions are used.
#### BGP
EdgeRouter can exchange routes with other autonomous systems, including a cluster's Calico AS. Peers will exchange `podCIDR` routes to make individual pods routeable on the LAN.
Define the EdgeRouter AS (if undefined).
```
configure
show protocols bgp 1
set protocols bgp 1 parameters router-id ROUTER_IP
```
Peer with node(s) in another AS (eg. Calico default 64512)
```
set protocols bgp 1 neighbor NODE1_IP remote-as 64512
set protocols bgp 1 neighbor NODE2_IP remote-as 64512
set protocols bgp 1 neighbor NODE3_IP remote-as 64512
commit-confirm
```
Configure Calico node(s) as to peer with the EdgeRouter.
```
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
name: NODE_NAME-to-edgerouter
spec:
peerIP: ROUTER_IP
asNumber: 1
node: NODE_NAME
```
Or, if every node is to be peered (i.e. full mesh), define a global BGPPeer.
```
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
name: global
spec:
peerIP: ROUTER_IP
asNumber: 1
```
If Calico nodes should advertise Kubernetes Service IPs (i.e. ClusterIPs) as well, add a `BGPConfiguration`.
```
apiVersion: crd.projectcalico.org/v1
kind: BGPConfiguration
metadata:
name: default
spec:
logSeverityScreen: Info
nodeToNodeMeshEnabled: true
serviceClusterIPs:
- cidr: 10.3.0.0/16
```
Show a summary of peers and exchanged routes.
```
show ip bgp summary
show ip route bgp
```
### Port Forwarding
@ -150,35 +216,3 @@ set service gui https-port 4443
commit-confirm
```
### BGP
Add the EdgeRouter as a global BGP peer for nodes in a Kubernetes cluster (requires Calico). Neighbors will exchange `podCIDR` routes and individual pods will become routable on the LAN.
Configure node(s) as BGP neighbors.
```
show protocols bgp 1
set protocols bgp 1 parameters router-id LAN_IP
set protocols bgp 1 neighbor NODE1_IP remote-as 64512
set protocols bgp 1 neighbor NODE2_IP remote-as 64512
set protocols bgp 1 neighbor NODE3_IP remote-as 64512
```
View the neighbors and exchanged routes.
```
show ip bgp neighbors
show ip route bgp
```
Be sure to register the peer by creating a Calico `BGPPeer` CRD with `kubectl apply`.
```
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
name: NAME
spec:
peerIP: LAN_IP
asNumber: 64512
```

Some files were not shown because too many files have changed in this diff Show More