Compare commits

...

74 Commits

Author SHA1 Message Date
1420700bc0 Update CHANGES for v1.18.1 release
* Change order of modules in the README
2020-04-11 13:23:49 -07:00
80538e2953 Add support for Fedora CoreOS on DigitalOcean
* Add `digital-ocean/fedora-coreos/kubernetes` module
* DigitalOcean custom uploaded images do not permit
droplet IPv6 networking
2020-04-09 23:55:29 -07:00
73af2f3b7c Update Kubernetes from v1.18.0 to v1.18.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181
2020-04-08 19:41:48 -07:00
17ea547723 Update etcd from v3.4.5 to v3.4.7
* https://github.com/etcd-io/etcd/releases/tag/v3.4.7
* https://github.com/etcd-io/etcd/releases/tag/v3.4.6
2020-04-06 21:09:25 -07:00
2b5dfece93 Update Grafana from v6.7.1 to v6.7.2
* https://github.com/grafana/grafana/releases/tag/v6.7.2
2020-04-04 13:13:19 -07:00
d47d40b517 Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules and alerts and Grafana
dashboards
* All Loki recording rules for convenience
2020-03-31 21:53:01 -07:00
3c1be7b0e0 Fix terraform fmt 2020-03-31 21:42:51 -07:00
bbbaf949f9 Fix UDP outbound and clock sync timeouts on Azure workers
* Add "lb" outbound rule for worker TCP _and_ UDP traffic
* Fix Azure worker nodes clock synchronization being inactive
due to timeouts reaching the CoreOS / Flatcar NTP pool
* Fix Azure worker nodes not providing outbount UDP connectivity

Background:

Azure provides VMs outbound connectivity either by having a public
IP or via an SNAT masquerade feature bundled with their virtual
load balancing abstraction (in contrast with, say, a NAT gateway).

Azure worker nodes have only a private IP, but are associated with
the cluster load balancer's backend pool and ingress frontend IP.
Outbound traffic uses SNAT with this frontend IP. A subtle detail
with Azure SNAT seems to be that since both inbound lb_rule's are
TCP only, outbound UDP traffic isn't SNAT'd (highlights the reasons
Azure shouldn't have conflated inbound load balancing with outbound
SNAT concepts). However, adding a separate outbound rule and
disabling outbound SNAT on our ingress lb_rule's we can tell Azure
to continue load balancing as before, and support outbound SNAT for
worker traffic of both the TCP and UDP protocol.

Fixes clock synchronization timeouts:

```
systemd-timesyncd[786]: Timed out waiting for reply from
45.79.36.123:123 (3.flatcar.pool.ntp.org)
```

Azure controller nodes have their own public IP, so controllers (and
etcd) nodes have not had clock synchronization or outbound UDP issues
2020-03-31 21:00:16 -07:00
135c6182b8 Update flannel from v0.11.0 to v0.12.0
* https://github.com/coreos/flannel/releases/tag/v0.12.0
2020-03-31 18:31:59 -07:00
c53dc66d4a Rename Container Linux snippets variable for consistency
* Rename controller_clc_snippets to controller_snippets (cloud platforms)
* Rename worker_clc_snippets to worker_snippets (cloud platforms)
* Rename clc_snippets to snippets (bare-metal)
2020-03-31 18:25:51 -07:00
9960972726 Fix bootstrap regression when networking="flannel"
* Fix bootstrap error for missing `manifests-networking/crd*yaml`
when `networking = "flannel"`
* Cleanup manifest-networking directory left during bootstrap
* Regressed in v1.18.0 changes for Calico https://github.com/poseidon/typhoon/pull/675
2020-03-31 18:21:59 -07:00
bac5acb3bd Change default kube-system DaemonSet tolerations
* Change kube-proxy, flannel, and calico-node DaemonSet
tolerations to tolerate `node.kubernetes.io/not-ready`
and `node-role.kubernetes.io/master` (i.e. controllers)
explicitly, rather than tolerating all taints
* kube-system DaemonSets will no longer tolerate custom
node taints by default. Instead, custom node taints must
be enumerated to opt-in to scheduling/executing the
kube-system DaemonSets
* Consider setting the daemonset_tolerations variable
of terraform-render-bootstrap at a later date

Background: Tolerating all taints ruled out use-cases
where certain nodes might legitimately need to keep
kube-proxy or CNI networking disabled
Related: https://github.com/poseidon/terraform-render-bootstrap/pull/179
2020-03-31 01:00:45 -07:00
70bdc9ec94 Allow bootstrap re-apply for Fedora CoreOS GCP
* Problem: Fedora CoreOS images are manually uploaded to GCP. When a
cluster is created with a stale image, Zincati immediately checks
for the latest stable image, fetches, and reboots. In practice,
this can unfortunately occur exactly during the initial cluster
bootstrap phase.

* Recommended: Upload the latest Fedora CoreOS image regularly
* Mitigation: Allow a failed bootstrap.service run (which won't touch
the done ConditionalPathExists) to be re-run by running `terraforma apply`
again. Add a known issue to CHANGES
* Update docs to show the current Fedora CoreOS stable version to
reduce likelihood users see this issue

 Longer term ideas:

* Ideal: Fedora CoreOS publishes a stable channel. Instances will always
boot with the latest image in a channel. The problem disappears since
it works the same way AWS does
* Timer: Consider some timer-based approach to have zincati delay any
system reboots for the first ~30 min of a machine's life. Possibly just
configured on the controller node https://github.com/coreos/zincati/pull/251
* External coordination: For Container Linux, locksmith filled a similar
role and was disabled to allow CLUO to coordinate reboots. By running
atop Kubernetes, it was not possible for the reboot to occur before
cluster bootstrap
* Rely on https://github.com/coreos/zincati/issues/115 to delay the
reboot since bootstrap involves an SSH session
* Use path-based activation of zincati on controllers and set that
path at the end of the bootstrap process

Rel: https://github.com/coreos/fedora-coreos-tracker/issues/239
2020-03-28 18:12:31 -07:00
144bb9403c Add support for Fedora CoreOS snippets
* Refresh snippets customization docs
* Requires terraform-provider-ct v0.5+
2020-03-28 16:15:04 -07:00
5fca08064b Fix Fedora CoreOS AMI to filter for stable images
* Fix issue observed in us-east-1 where AMI filters chose the
latest testing channel release, rather than the stable chanel
* Fedora CoreOS AMI filter selects the latest image with a
matching name, x86_64, and hvm, excluding dev images. Add a
filter for "Fedora CoreOS stable", which seems to be the only
distinguishing metadata indicating the channel
2020-03-28 12:57:45 -07:00
fc686c8fc7 Fix delete-node.service kubectl service exec's
* Fix delete-node service that runs on worker (cloud-only)
shutdown to delete a Kubernetes node. Regressed in #669
(unreleased)
* Use rkt `--exec` to invoke kubectl binary in the kubelet
image
* Use podman `--entrypoint` to invoke the kubectl binary in
the kubelet image
2020-03-28 12:35:23 -07:00
a1a5da6bc2 Add CoreOS Container Linux EOL recommendation to CHANGES
* Recommend that users who have not yet tried Fedora CoreOS or
Flatcar Linux do so. Likely, Container Linux will reach EOL
and platform support / stability ratings will be in a mixed
state. Nevertheless, folks should migrate by September.
2020-03-26 23:41:54 -07:00
076b8e3c42 Update Prometheus from v2.17.0 to v2.17.1
* https://github.com/prometheus/prometheus/releases/tag/v2.17.1
2020-03-26 22:17:13 -07:00
ef5f953e04 Set docker log driver to journald on Fedora CoreOS
* Before Kubernetes v1.18.0, Kubelet only supported kubectl
`--limit-bytes` with the Docker `json-file` log driver so
the Fedora CoreOS default was overridden for conformance.
See https://github.com/poseidon/typhoon/pull/642
* Kubelet v1.18+ implemented support for other docker log
drivers, so the Fedora CoreOS default `journald` can be
used again

Rel: https://github.com/kubernetes/kubernetes/issues/86367
2020-03-26 22:06:45 -07:00
d25f23e675 Update docs from Kubernetes v1.17.4 to v1.18.0 2020-03-25 20:28:30 -07:00
f100a90d28 Update Kubernetes from v1.17.4 to v1.18.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-03-25 17:51:50 -07:00
c3bf8bcf96 Add Fedora CoreOS to issue template and docs
* Update several Container Linux references to start
referring to Flatcar Linux
* Update docs and mentions of Fedora CoreOS
2020-03-25 00:36:15 -07:00
5d1e4ad333 Deprecate asset_dir variable and remove docs
* Remove docs for the `asset_dir` variable and deprecate
it in CHANGES. It will be removed in an upcoming release
* Typhoon v1.17.0 introduced a new mechanism for managing
and distributing generated assets that stopped relying on
writing out to disk. `asset_dir` became optional and
defaulted to being unset / off (recommended)
2020-03-25 00:00:01 -07:00
9f702c72d2 Rename DigitalOcean image variable to os_image
* Rename variable `image` to `os_image` to match the naming
used for the same purpose on other supported platforms (e.g.
AWS, Azure, GCP)
2020-03-24 23:49:37 -07:00
e556bc2167 Update Prometheus from v2.17.0-rc.3 to v2.17.0
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0
2020-03-24 23:15:49 -07:00
1bf4f3b801 Fix image tag for Container Linux AWS workers
* #669 left one reference to the original SHA tagged image
before the v1.17.4 image tag was applied
2020-03-21 15:44:33 -07:00
590d941f50 Switch from upstream hyperkube image to individual images
* Kubernetes plans to stop releasing the hyperkube container image
* Upstream will continue to publish `kube-apiserver`, `kube-controller-manager`,
`kube-scheduler`, and `kube-proxy` container images to `k8s.gcr.io`
* Upstream will publish Kubelet only as a binary for distros to package,
either as a DEB/RPM on traditional distros or a container image on
container-optimized operating systems
* Typhoon will package the upstream Kubelet (checksummed) and its
dependencies as a container image for use on CoreOS Container Linux,
Flatcar Linux, and Fedora CoreOS
* Update the Typhoon container image security policy to list
`quay.io/poseidon/kubelet`as an official distributed artifact

Hyperkube: https://github.com/kubernetes/kubernetes/pull/88676
Kubelet Container Image: https://github.com/poseidon/kubelet
Kubelet Quay Repo: https://quay.io/repository/poseidon/kubelet
2020-03-21 15:43:05 -07:00
ddc1ff5348 Update Grafana from v6.6.2 to v6.7.1
* https://github.com/grafana/grafana/releases/tag/v6.7.1
2020-03-21 15:27:55 -07:00
61557e89a6 Update Prometheus from v2.16.0 to v2.17.0-rc.3
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0-rc.3
2020-03-19 22:38:05 -07:00
c3ef21dbf5 Update etcd from v3.4.4 to v3.4.5
* https://github.com/etcd-io/etcd/releases/tag/v3.4.5
2020-03-18 20:50:41 -07:00
2a5dddeb9d Promote Fedora CoreOS AWS and Google Cloud
* Promote Fedora CoreOS AWS to stable
* Promote Fedora CoreOS GCP to beta
2020-03-16 22:12:26 -07:00
75fb4e5d11 Remove Container Linux Update Operator (CLUO) addon
* Stop providing example manifests for the Container Linux
Update Operator (CLUO)
* CLUO requires patches to support Kubernetes v1.16+, but the
project and push access is rather unowned
* CLUO hasn't been in active use in our clusters and won't be
relevant beyond Container Linux. Not to say folks can't patch
it and run it on their own. Examples just aren't provided here

Related: https://github.com/coreos/container-linux-update-operator/pull/197
2020-03-16 22:05:17 -07:00
1a139ef6f1 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-03-16 21:40:52 -07:00
bc7902f40a Update Kubernetes from v1.17.3 to v1.17.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174
2020-03-13 00:06:41 -07:00
70bf39bb9a Update Calico from v3.12.0 to v3.13.1
* https://docs.projectcalico.org/v3.13/release-notes/
2020-03-12 23:00:38 -07:00
4e1b8f22df Add support for Flatcar Linux on Azure
* Accept `os_image` "flatcar-stable" and "flatcar-beta" to
use Kinvolk's Flatcar Linux images from the Azure Marketplace

Note: Flatcar Linux Azure Marketplace images require terms be
accepted before use
2020-03-12 22:52:48 -07:00
ab7913a061 Accept initial worker node labels and taints map on bare-metal
* Add `worker_node_labels` map from node name to a list of initial
node label strings
* Add `worker_node_taints` map from node name to a list of initial
node taint strings
* Unlike cloud platforms, bare-metal node labels and taints
are defined via a map from node name to list of labels/taints.
Bare-metal clusters may have heterogeneous hardware so per node
labels and taints are accepted
* Only worker node names are allowed. Workloads are not scheduled
on controller nodes so altering their labels/taints isn't suitable

```
module "mercury" {
  ...

  worker_node_labels = {
    "node2" = ["role=special"]
  }

  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"]
  }
}
```

Related: https://github.com/poseidon/typhoon/issues/429
2020-03-09 00:12:02 -07:00
7b0ea23cdc Upgrade terraform-provider-azurerm to v2.0+
* Add support for `terraform-provider-azurerm` v2.0+. Require
`terraform-provider-azurerm` v2.0+ and drop v1.x support since
the Azure provider major release is not backwards compatible
* Use Azure's new Linux VM and Linux VM Scale Set resources
* Change controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups
(aesthetic)
* If set, change `worker_priority` from `Low` to `Spot` (action required)

Related:

* https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html
2020-03-08 17:40:13 -07:00
c4683c5bad Refresh Prometheus alerts and Grafana dashboards
* Add 2 min wait before KubeNodeUnreachable to be less
noisy on premeptible clusters
* Add a BlackboxProbeFailure alert for any failing probes
for services annotated `prometheus.io/probe: true`
2020-03-02 20:08:37 -08:00
51cee6d5a4 Change Container Linux etcd-member to fetch with docker://
* Quay has historically generated ACI signatures for images to
facilitate rkt's notions of verification (it allowed authors to
actually sign images, though `--trust-keys-from-https` is in use
since etcd and most authors don't sign images). OCI standardization
didn't adopt verification ideas and checking signatures has fallen
out of favor.
* Fix an issue where Quay no longer seems to be generating ACI
signatures for new images (e.g. quay.io/coreos/etcd:v.3.4.4)
* Don't be alarmed by rkt `--insecure-options=image`. It refers
to disabling image signature checking (i.e. docker pull doesn't
check signatures either)
* System containers for Kubelet and bootstrap have transitioned
to the docker:// transport, so there is precedent and this brings
all the system containers on Container Linux controllers into
alignment
2020-03-02 19:57:45 -08:00
87f9a2fc35 Add automatic worker deletion on Fedora CoreOS clouds
* On clouds where workers can scale down or be preempted
(AWS, GCP, Azure), shutdown runs delete-node.service to
remove a node a prevent NotReady nodes from lingering
* Add the delete-node.service that wasn't carried over
from Container Linux and port it to use podman
2020-02-29 20:22:03 -08:00
6de5cf5a55 Update etcd from v3.4.3 to v3.4.4
* https://github.com/etcd-io/etcd/releases/tag/v3.4.4
2020-02-29 16:19:29 -08:00
3250994c95 Use a route table with separate (rather than inline) routes
* Allow users to extend the route table using a data reference
and adding route resources (e.g. unusual peering setups)
* Note: Internally connecting AWS clusters can reduce cross-cloud
flexibility and inhibits blue-green cluster patterns. It is not
recommended
2020-02-25 23:21:58 -08:00
f4d260645c Update node-exporter from v0.18.1 to v1.0.0-rc.0
* Update mdadm alert rule; node-exporter adds `state` label to
`node_md_disks` and removes `node_md_disks_active`
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0
2020-02-25 22:29:52 -08:00
d9219a6722 Update nginx-ingress from v0.29.0 to v0.30.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0
2020-02-25 22:11:59 -08:00
60c7eb85ee Update nginx-ingress from v0.28.0 to v0.29.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0
2020-02-22 15:57:59 -08:00
4c964b56a0 Update kube-state-metrics from v1.9.4 to v1.9.5
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5
2020-02-22 15:21:10 -08:00
1fbd6835f2 Update Grafana from v6.6.1 to v6.6.2
* https://github.com/grafana/grafana/releases/tag/v6.6.2
2020-02-22 15:19:24 -08:00
e4d977bfcd Fix worker_node_labels for initial Fedora CoreOS
* Add Terraform strip markers to consume beginning and
trailing whitespace in templated Kubelet arguments for
podman (Fedora CoreOS only)
* Fix initial `worker_node_labels` being quietly ignored
on Fedora CoreOS cloud platforms that offer the feature
* Close https://github.com/poseidon/typhoon/issues/650
2020-02-22 15:12:35 -08:00
947c2c1815 Update mkdocs-material from v4.6.2 to v4.6.3 2020-02-18 21:59:17 -08:00
4a38fb5927 Update CoreDNS from v1.6.6 to v1.6.7
* https://coredns.io/2020/01/28/coredns-1.6.7-release/
2020-02-18 21:46:19 -08:00
c4e64a9d1b Change Kubelet /var/lib/calico mount to read-only (#643)
* Kubelet only requires read access to /var/lib/calico

Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2020-02-18 21:40:58 -08:00
7ca03e5219 Update Prometheus from v1.15.2 to v1.16.0
* https://github.com/prometheus/prometheus/releases/tag/v2.16.0
2020-02-14 12:10:56 -08:00
362b3fac5c Add guide for Typhoon with Flatcar Linux on DigitalOcean
* Add docs on manually uploading a Flatcar Linux DigitalOcean
bin image as a custom image and using a data reference
* Set status of Flatcar Linux on DigitalOcean to alpha
* IPv6 is not supported for DigitalOcean custom images
2020-02-14 12:08:58 -08:00
32db59b9eb Update CHANGELOG sections and links 2020-02-14 12:05:51 -08:00
0c53ad52e4 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-02-13 14:39:48 -08:00
008817b0aa Promote Fedora CoreOS AWS/bare-metal to beta
* Remove alpha warnings from docs headers
2020-02-13 14:25:22 -08:00
49d3b9e6b3 Set docker log driver to json-file on Fedora CoreOS
* Fix the last minor issue for Fedora CoreOS clusters to pass CNCF's
Kubernetes conformance tests
* Kubelet supports a seldom used feature `kubectl logs --limit-bytes=N`
to trim a log stream to a desired length. Kubelet handles this in the
CRI driver. The Kubelet docker shim only supports the limit bytes
feature when Docker is configured with the default `json-file` logging
driver
* CNCF conformance tests started requiring limit-bytes be supported,
indirectly forcing the log driver choice until either the Kubelet or
the conformance tests are fixed
* Fedora CoreOS defaults Docker to use `journald` (desired). For now,
as a workaround to offer conformant clusters, the log driver can
be set back to `json-file`. RHEL CoreOS likely won't have noticed the
non-conformance since its using crio runtime
* https://github.com/kubernetes/kubernetes/issues/86367

Note: When upstream has a fix, the aim is to drop the docker config
override and use the journald default
2020-02-11 23:00:38 -08:00
1243f395d1 Update Kubernetes from v1.17.2 to v1.17.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173
2020-02-11 20:22:14 -08:00
846f11097f Update Fedora CoreOS kernel arguments to align with upstream
* Align bare-metal kernel arguments with upstream docs
* Add missing initrd argument which can cause issues if
not present. Fix #638
* Add tty0 and ttyS0 consoles (matches Container Linux)
* Remove unused coreos.inst=yes

Related: https://docs.fedoraproject.org/en-US/fedora-coreos/bare-metal/
2020-02-11 20:11:19 -08:00
ba84f86dc7 Add guide for Typhoon with Flatcar Linux on Google Cloud
* Add docs on manually uploading a Flatcar Linux GCE/GCP gzipped
tarball image as a Compute Engine image for use with the Typhoon
container-linux module
* Set status of Flatcar Linux on Google Cloud to alpha
2020-02-11 19:38:40 -08:00
b49a1d715d Update docs generation packages
* Update mkdocs-material from v4.6.0 to v4.6.2
2020-02-08 15:12:12 -08:00
34c3d7cc39 Update Grafana from v6.6.0 to v6.6.1
* https://github.com/grafana/grafana/releases/tag/v6.6.1
2020-02-08 14:50:33 -08:00
ca96a1335c Update Calico from v3.11.2 to v3.12.0
* https://docs.projectcalico.org/release-notes/#v3120
* Remove reverse packet filter override, since Calico no
longer relies on the setting
* https://github.com/coreos/fedora-coreos-tracker/issues/219
* https://github.com/projectcalico/felix/pull/2189
2020-02-06 00:43:33 -08:00
e339fbd2b6 Update kube-state-metrics from v1.9.3 to v1.9.4
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4
2020-02-04 21:33:34 -08:00
8cc303c9ac Add module for Fedora CoreOS on Google Cloud
* Add Typhoon Fedora CoreOS on Google Cloud as alpha
* Add docs on uploading the Fedora CoreOS GCP gzipped tarball to
Google Cloud storage to create a boot disk image
2020-02-01 15:21:40 -08:00
b19ba16afa Update nginx-ingress from v0.27.1 to v0.28.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0
2020-01-30 18:00:23 -08:00
d127a7345c Update Grafana from v6.5.3 to v6.6.0
* https://github.com/grafana/grafana/releases/tag/v6.6.0
2020-01-27 20:46:32 -08:00
02a470d2f2 Fix minor typo in announcement date 2020-01-23 08:57:01 -08:00
5643ad525f Promote Fedora CoreOS from preview to alpha in docs
* Add an announcement to the website as well
2020-01-23 08:47:18 -08:00
d5b7ce8f27 Update kube-state-metrics from v1.9.2 to v1.9.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3
2020-01-23 00:03:16 -08:00
1cda5bcd2a Update Kubernetes from v1.17.1 to v1.17.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172
2020-01-21 18:27:39 -08:00
bda73264f7 Update nginx-ingress from v0.26.1 to v0.27.1
* Change runAsUser from 33 to 101 for new alpine-based image
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1
2020-01-20 15:22:16 -08:00
dd930a2ff9 Update bare-metal Fedora CoreOS image location
* Use Fedora CoreOS production download streams (change)
* Use live PXE kernel and initramfs images
* https://getfedora.org/coreos/download/
* Update docs example to use public images (cache is still
recommended at large scale) and stable stream
2020-01-20 14:44:06 -08:00
135 changed files with 6678 additions and 3562 deletions

View File

@ -5,7 +5,7 @@
### Environment
* Platform: aws, azure, bare-metal, google-cloud, digital-ocean
* OS: container-linux, flatcar-linux
* OS: fedora-coreos, flatcar-linux
* Release: Typhoon version or Git SHA (reporting latest is **not** helpful)
* Terraform: `terraform version` (reporting latest is **not** helpful)
* Plugins: Provider plugin versions (reporting latest is **not** helpful)

View File

@ -4,6 +4,184 @@ Notable changes between versions.
## Latest
## v1.18.1
* Kubernetes [v1.18.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181)
* Choose Fedora CoreOS or Flatcar Linux (**action recommended**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module with OS set to Flatcar Linux
* Update etcd from v3.4.5 to [v3.4.7](https://github.com/etcd-io/etcd/releases/tag/v3.4.7)
* Change `kube-proxy` and `calico` or `flannel` to tolerate specific taints ([#682](https://github.com/poseidon/typhoon/pull/682))
* Tolerate master and not-ready taints, rather than tolerating all taints
* Update flannel from v0.11.0 to v0.12.0 ([#690](https://github.com/poseidon/typhoon/pull/690))
* Fix bootstrap when `networking` mode `flannel` (non-default) is chosen ([#689](https://github.com/poseidon/typhoon/pull/689))
* Regressed in v1.18.0 changes for Calico ([#675](https://github.com/poseidon/typhoon/pull/675))
* Rename Container Linux `controller_clc_snippets` to `controller_snippets` for consistency ([#688](https://github.com/poseidon/typhoon/pull/688))
* Rename Container Linux `worker_clc_snippets` to `worker_snippets` for consistency
* Rename Container Linux `clc_snippets` (bare-metal) to `snippets` for consistency
* Drop support for [gitRepo](https://kubernetes.io/docs/concepts/storage/volumes/#gitrepo) volumes
#### Azure
* Fix Azure worker UDP outbound connections ([#691](https://github.com/poseidon/typhoon/pull/691))
* Fix Azure worker clock sync timeouts
#### DigitalOcean
* Add support for Fedora CoreOS ([#699](https://github.com/poseidon/typhoon/pull/699))
#### Addons
* Refresh Prometheus rules/alerts and Grafana dashboards ([#692](https://github.com/poseidon/typhoon/pull/692))
* Update Grafana from v6.7.1 to v6.7.2
## v1.18.0
* Kubernetes [v1.18.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1180)
* Update etcd from v3.4.4 to [v3.4.5](https://github.com/etcd-io/etcd/releases/tag/v3.4.5)
* Switch from upstream hyperkube image to individual images ([#669](https://github.com/poseidon/typhoon/pull/669))
* Use upstream k8s.gcr.io `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`, and `kube-proxy` container images
* Use [poseidon/kubelet](https://github.com/poseidon/kubelet) to package the upstream Kubelet binary and dependencies as a container image (checksummed, automated build)
* Add [quay.io/poseidon/kubelet](https://quay.io/repository/poseidon/kubelet) as a Typhoon distributed artifact in the security policy
* Update base images from debian 9 to debian 10
* Background: Kubernetes will [stop releasing](https://github.com/kubernetes/kubernetes/pull/88676) the hyperkube container image and provide the Kubelet as a binary for packaging
* Choose Fedora CoreOS or Flatcar Linux (**action recommended**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module with OS set for Flatcar Linux (varies, see docs)
* CoreOS Container Linux [won't receive updates](https://coreos.com/os/eol/) after May 2020
* Add support for Fedora CoreOS snippets (`terraform-provider-ct` v0.5+) ([#686](https://github.com/poseidon/typhoon/pull/686))
* Recommend updating `terraform-provider-ct` plugin from v0.4.0 to [v0.5.0](https://github.com/poseidon/terraform-provider-ct/releases/tag/v0.5.0)
* Set Fedora CoreOS log driver back to the default `journald` ([#681](https://github.com/poseidon/typhoon/pull/681))
* Deprecate `asset_dir` variable and remove docs ([#678](https://github.com/poseidon/typhoon/pull/678))
* Deprecate support for [gitRepo](https://kubernetes.io/docs/concepts/storage/volumes/#gitrepo) volumes. A future release will drop support.
#### AWS
* Fix Fedora CoreOS AMI to filter for stable images ([#685](https://github.com/poseidon/typhoon/pull/685))
* Latest Fedora CoreOS `testing` or `bodhi-update` images could be chosen depending on the region
#### Bare-Metal
* Update Fedora CoreOS default `os_stream` from testing to stable
#### Google Cloud
* Known: Use of stale Fedora CoreOS image may require terraform re-apply during bootstrap ([#687](https://github.com/poseidon/typhoon/pull/687))
#### DigitalOcean
* Rename `image` variable to `os_image` for consistency ([#677](https://github.com/poseidon/typhoon/pull/677)) (action required)
#### Addons
* Update Prometheus from v2.16.0 to [v2.17.1](https://github.com/prometheus/prometheus/releases/tag/v2.17.1)
* Update Grafana from v6.6.2 to [v6.7.1](https://github.com/grafana/grafana/releases/tag/v6.7.1)
## v1.17.4
* Kubernetes [v1.17.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174)
* Update etcd from v3.4.3 to [v3.4.4](https://github.com/etcd-io/etcd/releases/tag/v3.4.4)
* On Container Linux, fetch using the docker transport format ([#659](https://github.com/poseidon/typhoon/pull/659))
* Update CoreDNS from v1.6.6 to v1.6.7 ([#648](https://github.com/poseidon/typhoon/pull/648))
* Update Calico from v3.12.0 to [v3.13.1](https://docs.projectcalico.org/v3.13/release-notes/)
#### AWS
* Promote Fedora CoreOS to stable ([#668](https://github.com/poseidon/typhoon/pull/668))
* Allow VPC route table extension via reference ([#654](https://github.com/poseidon/typhoon/pull/654))
* Fix `worker_node_labels` on Fedora CoreOS ([#651](https://github.com/poseidon/typhoon/pull/651))
* Fix automatic worker node delete on shutdown on Fedora CoreOS ([#657](https://github.com/poseidon/typhoon/pull/657))
#### Azure
* Upgrade to `terraform-provider-azurerm` [v2.0+](https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html) (action required)
* Change `worker_priority` from `Low` to `Spot` if used (action required)
* Switch to Azure's new Linux VM and Linux VM Scale Set resources
* Set controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups (aesthetic)
* Add support for Flatcar Container Linux ([#664](https://github.com/poseidon/typhoon/pull/664))
* Requires accepting Flatcar Linux Azure Marketplace terms
#### Bare-Metal
* Add `worker_node_labels` map variable for per-worker node labels ([#663](https://github.com/poseidon/typhoon/pull/663))
* Add `worker_node_taints` map variable for per-worker node taints ([#663](https://github.com/poseidon/typhoon/pull/663))
#### DigitalOcean
* Add support for Flatcar Container Linux ([#644](https://github.com/poseidon/typhoon/pull/644))
#### Google Cloud
* Promote Fedora CoreOS to beta ([#668](https://github.com/poseidon/typhoon/pull/668))
* Fix `worker_node_labels` on Fedora CoreOS ([#651](https://github.com/poseidon/typhoon/pull/651))
* Fix automatic worker node delete on shutdown on Fedora CoreOS ([#657](https://github.com/poseidon/typhoon/pull/657))
#### Addons
* Update nginx-ingress from v0.28.0 to [v0.30.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0)
* Update Prometheus from v2.15.2 to [v2.16.0](https://github.com/prometheus/prometheus/releases/tag/v2.16.0)
* Refresh Prometheus rules and alerts
* Add a BlackboxProbeFailure alert
* Update kube-state-metrics from v1.9.4 to v1.9.5
* Update node-exporter from v0.18.1 to [v1.0.0-rc.0](https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0)
* Update Grafana from v6.6.1 to v6.6.2
* Refresh Grafana dashboards
* Remove Container Linux Update Operator (CLUO) addon example ([#667](https://github.com/poseidon/typhoon/pull/667))
* CLUO hasn't been in active use in our clusters and won't be relevant
beyond Container Linux. Requires patches for use on Kubernetes v1.16+
## v1.17.3
* Kubernetes [v1.17.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173)
* Update Calico from v3.11.2 to v3.12.0
* Allow Fedora CoreOS clusters to pass CNCF conformance suite
* Set Docker log driver to `json-file` as a workaround
* Try Fedora CoreOS or Flatcar Linux alongside CoreOS [Container Linux](https://coreos.com/os/eol/) clusters (recommended)
#### AWS
* Promote Fedora CoreOS to beta ([#645](https://github.com/poseidon/typhoon/pull/645))
#### Bare-Metal
* Promote Fedora CoreOS to beta ([#645](https://github.com/poseidon/typhoon/pull/645))
* Add Fedora CoreOS kernel arguments initrd and console ([#640](https://github.com/poseidon/typhoon/pull/640))
#### Google Cloud
* Add Terraform module for Fedora CoreOS ([#632](https://github.com/poseidon/typhoon/pull/632))
* Add support for Flatcar Container Linux ([#639](https://github.com/poseidon/typhoon/pull/639))
#### Addons
* Update nginx-ingress from v0.27.1 to v0.28.0
* Update kube-state-metrics from v1.9.3 to v1.9.4
* Update Grafana from v6.5.3 to v6.6.1
## v1.17.2
* Kubernetes [v1.17.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172)
#### AWS
* Promote Fedora CoreOS from preview to alpha
#### Bare-Metal
* Promote Fedora CoreOS from preview to alpha
* Update Fedora CoreOS images location
* Use Fedora CoreOS production [download](https://getfedora.org/coreos/download/) streams
* Use live PXE kernel and initramfs images
#### Addons
* Update nginx-ingress from v0.26.1 to [v0.27.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1) ([#625](https://github.com/poseidon/typhoon/pull/625))
* Change runAsUser from 33 to 101 for alpine-based image
* Update kube-state-metrics from v1.9.2 to v1.9.3
## v1.17.1
* Kubernetes [v1.17.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171)
* Update CoreDNS from v1.6.5 to [v1.6.6](https://coredns.io/2019/12/11/coredns-1.6.6-release/) ([#602](https://github.com/poseidon/typhoon/pull/602))
* Update Calico from v3.10.2 to v3.11.2 ([#604](https://github.com/poseidon/typhoon/pull/604))
@ -12,6 +190,10 @@ Notable changes between versions.
* Enable kube-proxy metrics and allow Prometheus scrapes
* Allow TCP/10249 traffic with worker node sources
#### AWS
* Update Fedora CoreOS AMI filter for fedora-coreos-31 ([#620](https://github.com/poseidon/typhoon/pull/620))
#### Google
* Allow `terraform-provider-google` v3.0+ ([#617](https://github.com/poseidon/typhoon/pull/617))
@ -247,7 +429,7 @@ Notable changes between versions.
* Require `terraform-provider-azurerm` v1.27+ to support Terraform v0.12 (action required)
* Avoid unneeded rotations of Regular priority virtual machine scale sets
* Azure only allows `eviction_policy` to be set for Low priority VMs. Supporting Low priority VMs meant when Regular VMs were used, each `terraform apply` rolled workers, to set eviction_policy to null.
* Terraform v0.12 nullable variables fix the issue so plan does not produce a diff.
* Terraform v0.12 nullable variables fix the issue so plan does not produce a diff.
#### Bare-Metal
@ -302,7 +484,7 @@ Notable changes between versions.
* Update Grafana from v6.1.6 to v6.2.1
## v1.14.2
* Kubernetes [v1.14.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1142)
* Update etcd from v3.3.12 to [v3.3.13](https://github.com/etcd-io/etcd/releases/tag/v3.3.13)
* Upgrade Calico from v3.6.1 to [v3.7.2](https://docs.projectcalico.org/v3.7/release-notes/)
@ -373,7 +555,7 @@ Notable changes between versions.
* Add ability to load balance TCP/UDP applications ([#442](https://github.com/poseidon/typhoon/pull/442))
* Add worker instances to a target pool, output as `worker_target_pool`
* Health check for workers with Ingress controllers. Forward rules don't support differing internal/external ports, but some Ingress controllers support TCP/UDP proxy as a workaround
* Health check for workers with Ingress controllers. Forward rules don't support differing internal/external ports, but some Ingress controllers support TCP/UDP proxy as a workaround
* Remove Haswell minimum CPU platform requirement ([#439](https://github.com/poseidon/typhoon/pull/439))
* Google Cloud API implements `min_cpu_platform` to mean "use exactly this CPU". Revert [#405](https://github.com/poseidon/typhoon/pull/405) added in v1.13.4.
* Fix error creating clusters in new regions without Haswell (e.g. europe-west2) ([#438](https://github.com/poseidon/typhoon/issues/438))
@ -558,7 +740,7 @@ Notable changes between versions.
* Update Calico from v3.3.0 to [v3.3.1](https://docs.projectcalico.org/v3.3/releases/)
* Disable Felix usage reporting by default ([#345](https://github.com/poseidon/typhoon/pull/345))
* Improve flannel manifests
* [Rename](https://github.com/poseidon/terraform-render-bootkube/commit/d045a8e6b8eccfbb9d69bb51953b5a93d23f67f7) `kube-flannel` DaemonSet to `flannel` and `kube-flannel-cfg` ConfigMap to `flannel-config`
* [Rename](https://github.com/poseidon/terraform-render-bootkube/commit/d045a8e6b8eccfbb9d69bb51953b5a93d23f67f7) `kube-flannel` DaemonSet to `flannel` and `kube-flannel-cfg` ConfigMap to `flannel-config`
* [Drop](https://github.com/poseidon/terraform-render-bootkube/commit/39f9afb3360ec642e5b98457c8bd07eda35b6c96) unused mounts and add a CPU resource request
* Update CoreDNS from v1.2.4 to [v1.2.6](https://coredns.io/2018/11/05/coredns-1.2.6-release/)
* Enable CoreDNS `loop` and `loadbalance` plugins ([#340](https://github.com/poseidon/typhoon/pull/340))
@ -720,7 +902,7 @@ Notable changes between versions.
* Force apiserver to stop listening on `127.0.0.1:8080`
* Replace `kube-dns` with [CoreDNS](https://coredns.io/) ([#261](https://github.com/poseidon/typhoon/pull/261))
* Edit the `coredns` ConfigMap to [customize](https://coredns.io/plugins/)
* CoreDNS doesn't use a resizer. For large clusters, scaling may be required.
* CoreDNS doesn't use a resizer. For large clusters, scaling may be required.
#### AWS
@ -765,7 +947,7 @@ Notable changes between versions.
* Switch `kube-apiserver` port from 443 to 6443 ([#248](https://github.com/poseidon/typhoon/pull/248))
* Users who exposed kube-apiserver on a WAN via their router/load-balancer will need to adjust its configuration (e.g. DNAT 6443). Most apiservers are on a LAN (internal, VPN-only, etc) so if you didn't specially configure network gear for 443, no change is needed. (possible action required)
* Fix possible deadlock when provisioning clusters larger than 10 nodes ([#244](https://github.com/poseidon/typhoon/pull/244))
* Fix possible deadlock when provisioning clusters larger than 10 nodes ([#244](https://github.com/poseidon/typhoon/pull/244))
#### DigitalOcean
@ -833,7 +1015,7 @@ Notable changes between versions.
* Please change values stable, beta, or alpha to coreos-stable, coreos-beta, coreos-alpha (**action required!**)
* Replace `container_linux_version` variable with `os_version`
* Add `network_ip_autodetection_method` variable for Calico host IPv4 address detection
* Use Calico's default "first-found" to support single NIC and bonded NIC nodes
* Use Calico's default "first-found" to support single NIC and bonded NIC nodes
* Allow [alternative](https://docs.projectcalico.org/v3.1/reference/node/configuration#ip-autodetection-methods) methods for multi NIC nodes, like can-reach=IP or interface=REGEX
* Deprecate `container_linux_oem` variable
@ -866,7 +1048,7 @@ Notable changes between versions.
#### Google Cloud
* Add support for multi-controller clusters (i.e. multi-master) ([#54](https://github.com/poseidon/typhoon/issues/54), [#190](https://github.com/poseidon/typhoon/pull/190))
* Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a [bug](https://issuetracker.google.com/issues/67366622) in Google network load balancers that limited clusters to only bootstrapping one controller node.
* Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a [bug](https://issuetracker.google.com/issues/67366622) in Google network load balancers that limited clusters to only bootstrapping one controller node.
* Add TCP health check for apiserver pods on controllers. Replace kubelet check approximation.
#### Addons
@ -1097,7 +1279,7 @@ Notable changes between versions.
* Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
of 1.12)
* Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
* Add optional `cluster_domain_suffix` variable (#74)
* Use kubernetes-incubator/bootkube v0.9.1

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
@ -21,26 +21,41 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
Typhoon provides a Terraform Module for each supported operating system and platform.
Typhoon is available for [Fedora CoreOS](https://getfedora.org/coreos/).
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Container Linux / Flatcar Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | stable |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | beta |
| DigitalOcean | Fedora CoreOS | [digital-ocean/fedora-coreos/kubernetes](digital-ocean/fedora-coreos/kubernetes) | alpha |
| Google Cloud | Fedora CoreOS | [google-cloud/fedora-coreos/kubernetes](google-cloud/fedora-coreos/kubernetes) | beta |
Typhoon is available for [Flatcar Container Linux](https://www.flatcar-linux.org/releases/).
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Flatcar Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
| Azure | Flatcar Linux | [azure/container-linux/kubernetes](azure/container-linux/kubernetes) | alpha |
| Bare-Metal | Flatcar Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
| DigitalOcean | Flatcar Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | alpha |
| Google Cloud | Flatcar Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | alpha |
Typhoon is available for CoreOS Container Linux ([no updates](https://coreos.com/os/eol/) after May 2020).
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
| Azure | Container Linux | [azure/container-linux/kubernetes](azure/container-linux/kubernetes) | alpha |
| Bare-Metal | Container Linux / Flatcar Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
| Bare-Metal | Container Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
| Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
| Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | stable |
A preview of Typhoon for [Fedora CoreOS](https://getfedora.org/coreos/) is available for testing.
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | preview |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | preview |
## Documentation
* [Docs](https://typhoon.psdn.io)
* Architecture [concepts](https://typhoon.psdn.io/architecture/concepts/) and [operating systems](https://typhoon.psdn.io/architecture/operating-systems/)
* Tutorials for [AWS](docs/cl/aws.md), [Azure](docs/cl/azure.md), [Bare-Metal](docs/cl/bare-metal.md), [Digital Ocean](docs/cl/digital-ocean.md), and [Google-Cloud](docs/cl/google-cloud.md)
* Fedora CoreOS tutorials for [AWS](docs/fedora-coreos/aws.md), [Bare-Metal](docs/fedora-coreos/bare-metal.md), [DigitalOcean](docs/fedora-coreos/digitalocean.md), and [Google Cloud](docs/fedora-coreos/google-cloud.md)
* Flatcar Linux tutorials for [AWS](docs/cl/aws.md), [Azure](docs/cl/azure.md), [Bare-Metal](docs/cl/bare-metal.md), [DigitalOcean](docs/cl/digital-ocean.md), and [Google Cloud](docs/cl/google-cloud.md)
## Usage
@ -48,7 +63,7 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
```tf
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.18.1"
# Google Cloud
cluster_name = "yavin"
@ -58,7 +73,7 @@ module "yavin" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# optional
worker_count = 2
worker_preemptible = true
@ -87,9 +102,9 @@ In 4-8 minutes (varies by platform), the cluster will be ready. This Google Clou
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.17.1
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.17.1
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.17.1
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.18.1
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.18.1
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.18.1
```
List the pods.

View File

@ -1,4 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
name: reboot-coordinator

View File

@ -1,12 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: reboot-coordinator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: reboot-coordinator
subjects:
- kind: ServiceAccount
namespace: reboot-coordinator
name: default

View File

@ -1,45 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: reboot-coordinator
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- get
- update
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- watch
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- delete
- apiGroups:
- "extensions"
resources:
- daemonsets
verbs:
- get

View File

@ -1,68 +0,0 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: container-linux-update-agent
namespace: reboot-coordinator
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
name: container-linux-update-agent
template:
metadata:
labels:
name: container-linux-update-agent
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: update-agent
image: quay.io/coreos/container-linux-update-operator:v0.7.0
command:
- "/bin/update-agent"
env:
# read by update-agent as the node name to manage reboots for
- name: UPDATE_AGENT_NODE
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
cpu: 10m
memory: 20Mi
limits:
cpu: 20m
memory: 40Mi
volumeMounts:
- mountPath: /var/run/dbus
name: var-run-dbus
- mountPath: /etc/coreos
name: etc-coreos
- mountPath: /usr/share/coreos
name: usr-share-coreos
- mountPath: /etc/os-release
name: etc-os-release
volumes:
- name: var-run-dbus
hostPath:
path: /var/run/dbus
- name: etc-coreos
hostPath:
path: /etc/coreos
- name: usr-share-coreos
hostPath:
path: /usr/share/coreos
- name: etc-os-release
hostPath:
path: /etc/os-release

View File

@ -1,39 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: container-linux-update-operator
namespace: reboot-coordinator
spec:
replicas: 1
selector:
matchLabels:
name: container-linux-update-operator
template:
metadata:
labels:
name: container-linux-update-operator
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: update-operator
image: quay.io/coreos/container-linux-update-operator:v0.7.0
command:
- "/bin/update-operator"
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
cpu: 10m
memory: 20Mi
limits:
cpu: 20m
memory: 40Mi

View File

@ -21,7 +21,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -558,15 +558,15 @@ data:
},
"id": 8,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -649,15 +649,15 @@ data:
},
"id": 9,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -753,15 +753,15 @@ data:
},
"id": 10,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -857,15 +857,15 @@ data:
},
"id": 11,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -955,15 +955,15 @@ data:
},
"id": 12,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1066,17 +1066,17 @@ data:
},
"id": 13,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"hideEmpty": "true",
"hideZero": "true",
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1159,17 +1159,17 @@ data:
},
"id": 14,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"hideEmpty": "true",
"hideZero": "true",
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1265,17 +1265,17 @@ data:
},
"id": 15,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"hideEmpty": "true",
"hideZero": "true",
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1371,15 +1371,15 @@ data:
},
"id": 16,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1462,15 +1462,15 @@ data:
},
"id": 17,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1567,15 +1567,15 @@ data:
},
"id": 18,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1658,15 +1658,15 @@ data:
},
"id": 19,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1762,15 +1762,15 @@ data:
},
"id": 20,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1991,15 +1991,15 @@ data:
},
"id": 22,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -2373,8 +2373,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -2427,7 +2427,7 @@ data:
"options": [
],
"query": "label_values(kubelet_runtime_operations{cluster=\"$cluster\", job=\"kubelet\"}, instance)",
"query": "label_values(kubelet_runtime_operations_total{cluster=\"$cluster\", job=\"kubelet\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 1,
@ -2496,7 +2496,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -2691,15 +2691,15 @@ data:
},
"id": 4,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -2886,15 +2886,15 @@ data:
},
"id": 6,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -3206,15 +3206,15 @@ data:
},
"id": 9,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -3588,8 +3588,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,

View File

@ -59,7 +59,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[1m]))",
"expr": "1 - avg(rate(node_cpu_seconds_total{mode=\"idle\", cluster=\"$cluster\"}[$__interval]))",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
@ -1561,7 +1561,7 @@ data:
],
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -1570,7 +1570,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -1579,7 +1579,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -1588,7 +1588,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -1597,7 +1597,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -1606,7 +1606,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -1706,7 +1706,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -1804,7 +1804,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -1902,7 +1902,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -2000,7 +2000,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -2098,7 +2098,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -2196,7 +2196,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -2294,7 +2294,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -2392,7 +2392,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$interval])) by (namespace)",
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\".+\"}[$__interval])) by (namespace)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{namespace}}",
@ -2458,8 +2458,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -2501,37 +2501,29 @@ data:
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
"text": "",
"value": ""
},
"datasource": "prometheus",
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"name": "cluster",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"query": "label_values(node_cpu_seconds_total, cluster)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"type": "query",
"useTags": false
}
]
@ -2586,6 +2578,354 @@ data:
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "100px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) / sum(kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation (from requests)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate{cluster=\"$cluster\", namespace=\"$namespace\"}) / sum(kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilisation (from limits)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) / sum(kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilization (from requests)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"format": "percentunit",
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\",container!=\"\"}) / sum(kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\"})",
"format": "time_series",
"instant": true,
"intervalFactor": 2,
"refId": "A"
}
],
"thresholds": "70,80",
"timeFrom": null,
"timeShift": null,
"title": "Memory Utilisation (from limits)",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "singlestat",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Headlines",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
@ -2599,7 +2939,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
@ -2620,7 +2960,26 @@ data:
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
@ -2634,6 +2993,22 @@ data:
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
@ -2697,7 +3072,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"id": 6,
"legend": {
"avg": false,
"current": false,
@ -2964,7 +3339,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"id": 7,
"legend": {
"avg": false,
"current": false,
@ -2985,7 +3360,26 @@ data:
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
@ -2999,6 +3393,22 @@ data:
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
@ -3062,7 +3472,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"id": 8,
"legend": {
"avg": false,
"current": false,
@ -3410,7 +3820,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"id": 9,
"legend": {
"avg": false,
"current": false,
@ -3588,7 +3998,7 @@ data:
],
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -3597,7 +4007,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -3606,7 +4016,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -3615,7 +4025,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -3624,7 +4034,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -3633,7 +4043,7 @@ data:
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -3698,398 +4108,6 @@ data:
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
@ -4125,7 +4143,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -4138,7 +4156,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
@ -4223,7 +4241,399 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 14,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 15,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -4289,8 +4699,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -4306,13 +4716,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"label": null,
"multi": false,
"name": "cluster",
"options": [
@ -4321,7 +4731,7 @@ data:
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -4333,13 +4743,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"label": null,
"multi": false,
"name": "namespace",
"options": [
@ -4348,48 +4758,13 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"type": "query",
"useTags": false
}
]
@ -5265,8 +5640,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -5282,13 +5657,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"label": null,
"multi": false,
"name": "cluster",
"options": [
@ -5297,7 +5672,7 @@ data:
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5309,13 +5684,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "node",
"label": null,
"multi": false,
"name": "node",
"options": [
@ -5324,7 +5699,7 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, node)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [

View File

@ -50,7 +50,24 @@ data:
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "requests",
"color": "#F2495C",
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
},
{
"alias": "limits",
"color": "#FF9830",
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
@ -64,6 +81,22 @@ data:
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "requests",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits_cpu_cores{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
@ -126,8 +159,113 @@ data:
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fill": 10,
"id": 2,
"legend": {
"avg": false,
"current": true,
"max": true,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}[5m])) by (container) /sum(increase(container_cpu_cfs_periods_total{namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", cluster=\"$cluster\"}[5m])) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
{
"colorMode": "critical",
"fill": true,
"line": true,
"op": "gt",
"value": 1,
"yaxis": "left"
}
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Throttling",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Throttling",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"legend": {
"avg": false,
"current": false,
@ -394,7 +532,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"id": 4,
"legend": {
"avg": false,
"current": false,
@ -415,7 +553,26 @@ data:
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
@ -423,26 +580,26 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_rss{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}} (RSS)",
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum(container_memory_cache{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"expr": "sum(\n kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}} (Cache)",
"legendFormat": "requests",
"legendLink": null,
"step": 10
},
{
"expr": "sum(container_memory_swap{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\"}) by (container)",
"expr": "sum(\n kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}} (Swap)",
"legendFormat": "limits",
"legendLink": null,
"step": 10
}
@ -508,7 +665,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"id": 5,
"legend": {
"avg": false,
"current": false,
@ -850,104 +1007,6 @@ data:
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
@ -983,7 +1042,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -996,7 +1055,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
@ -1081,7 +1140,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_transmit_bytes_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -1094,7 +1153,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
@ -1179,7 +1238,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -1192,7 +1251,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 0,
@ -1277,7 +1336,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_transmit_packets_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -1290,7 +1349,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 0,
@ -1375,7 +1434,105 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$interval])) by (pod)",
"expr": "sum(irate(container_network_receive_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{namespace=~\"$namespace\", pod=~\"$pod\"}[$__interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -1441,8 +1598,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -1458,13 +1615,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"label": null,
"multi": false,
"name": "cluster",
"options": [
@ -1473,7 +1630,7 @@ data:
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -1485,13 +1642,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"label": null,
"multi": false,
"name": "namespace",
"options": [
@ -1500,75 +1657,40 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
@ -2544,7 +2666,7 @@ data:
],
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -2553,7 +2675,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -2562,7 +2684,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -2571,7 +2693,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -2580,7 +2702,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -2589,7 +2711,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -2689,7 +2811,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -2787,7 +2909,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -2885,7 +3007,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -2983,7 +3105,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -3081,7 +3203,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -3179,7 +3301,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -3277,7 +3399,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -3375,7 +3497,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
@ -3441,8 +3563,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -3458,13 +3580,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"label": null,
"multi": false,
"name": "cluster",
"options": [
@ -3473,7 +3595,7 @@ data:
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3485,13 +3607,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"label": null,
"multi": false,
"name": "namespace",
"options": [
@ -3500,7 +3622,7 @@ data:
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3512,13 +3634,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "workload",
"label": null,
"multi": false,
"name": "workload",
"options": [
@ -3527,7 +3649,7 @@ data:
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\"}, workload)",
"refresh": 1,
"regex": "",
"sort": 2,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -3539,13 +3661,13 @@ data:
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "type",
"label": null,
"multi": false,
"name": "type",
"options": [
@ -3554,48 +3676,13 @@ data:
"query": "label_values(mixin_pod_workload{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\"}, workload_type)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"type": "query",
"useTags": false
}
]
@ -3684,7 +3771,26 @@ data:
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
@ -3698,6 +3804,22 @@ data:
"legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
@ -4094,7 +4216,26 @@ data:
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": false,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
@ -4108,6 +4249,22 @@ data:
"legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
@ -4679,7 +4836,7 @@ data:
],
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4688,7 +4845,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4697,7 +4854,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4706,7 +4863,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4715,7 +4872,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4724,7 +4881,7 @@ data:
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
@ -4824,7 +4981,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -4922,7 +5079,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5020,7 +5177,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5118,7 +5275,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5216,7 +5373,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5314,7 +5471,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5412,7 +5569,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5510,7 +5667,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\", namespace=~\"$namespace\"}[$__interval])\n* on (namespace,pod) \ngroup_left(workload,workload_type) mixin_pod_workload{cluster=\"$cluster\", namespace=~\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
@ -5576,8 +5733,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -5590,95 +5747,6 @@ data:
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "prometheus",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
@ -5706,6 +5774,60 @@ data:
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",

View File

@ -21,7 +21,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -88,7 +88,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"apiserver\"})",
"expr": "sum(up{job=\"apiserver\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -155,28 +155,28 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"2..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"3..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"4..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"5..\", cluster=\"$cluster\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
@ -237,15 +237,15 @@ data:
},
"id": 4,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -267,7 +267,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (verb, le))",
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\", verb!=\"WATCH\", cluster=\"$cluster\"}[5m])) by (verb, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}}",
@ -371,7 +371,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(workqueue_adds_total{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name)",
"expr": "sum(rate(workqueue_adds_total{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
@ -462,7 +462,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(workqueue_depth{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name)",
"expr": "sum(rate(workqueue_depth{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
@ -523,15 +523,15 @@ data:
},
"id": 7,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -553,7 +553,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name, le))",
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, name, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
@ -657,7 +657,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "etcd_helper_cache_entry_total{job=\"apiserver\", instance=~\"$instance\"}",
"expr": "etcd_helper_cache_entry_total{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -748,14 +748,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_helper_cache_hit_total{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (intance)",
"expr": "sum(rate(etcd_helper_cache_hit_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} hit",
"refId": "A"
},
{
"expr": "sum(rate(etcd_helper_cache_miss_total{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(etcd_helper_cache_miss_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
@ -846,14 +846,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_get_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_get_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} get",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_add_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_add_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
@ -957,7 +957,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"apiserver\",instance=~\"$instance\"}",
"expr": "process_resident_memory_bytes{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1048,7 +1048,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"apiserver\",instance=~\"$instance\"}[5m])",
"expr": "rate(process_cpu_seconds_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1139,7 +1139,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"apiserver\",instance=~\"$instance\"}",
"expr": "go_goroutines{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1205,8 +1205,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -1219,6 +1219,33 @@ data:
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(apiserver_request_total, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
@ -1233,7 +1260,7 @@ data:
"options": [
],
"query": "label_values(apiserver_request_total{job=\"apiserver\"}, instance)",
"query": "label_values(apiserver_request_total{job=\"apiserver\", cluster=\"$cluster\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 1,
@ -1302,7 +1329,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -1406,15 +1433,15 @@ data:
},
"id": 3,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1510,15 +1537,15 @@ data:
},
"id": 4,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1614,15 +1641,15 @@ data:
},
"id": 5,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1934,15 +1961,15 @@ data:
},
"id": 8,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -2316,8 +2343,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -2413,7 +2440,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -2815,8 +2842,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -2943,664 +2970,6 @@ data:
"uid": "919b92a8e8041bd567af9edab12c840c",
"version": 0
}
pods.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "$datasource",
"enable": true,
"expr": "time() == BOOL timestamp(rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[2m]) > 0)",
"hide": false,
"iconColor": "rgba(215, 44, 44, 1)",
"name": "Restarts",
"showIn": 0,
"tags": [
"restart"
],
"type": "rows"
}
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(container) (container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
},
{
"expr": "sum by(container) (container_memory_cache{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Cache: {{ container }}",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (container) (irate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", image!=\"\", pod=\"$pod\", container=~\"$container\", container!=\"POD\"}[4m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum by (pod) (irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[4m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RX: {{ pod }}",
"refId": "A"
},
{
"expr": "sort_desc(sum by (pod) (irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[4m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "TX: {{ pod }}",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 5,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by (container) (kube_pod_container_status_restarts_total{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Restarts: {{ container }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Total Restarts Per Container",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=~\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "Container",
"multi": false,
"name": "container",
"options": [
],
"query": "label_values(kube_pod_container_info{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}, container)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Pods",
"uid": "ab4f13a9892a76a4d21ce8c2445bf4ea",
"version": 0
}
scheduler.json: |-
{
"__inputs": [
@ -3622,7 +2991,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -3726,15 +3095,15 @@ data:
},
"id": 3,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -3838,15 +3207,15 @@ data:
},
"id": 4,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -4179,15 +3548,15 @@ data:
},
"id": 7,
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -4561,8 +3930,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
@ -4637,910 +4006,6 @@ data:
"uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
"version": 0
}
statefulset.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 2,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "cores",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "CPU",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 3,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "GB",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}) / 1024^3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Memory",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "Bps",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\",pod=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Network",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"height": "100px",
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Desired Replicas",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 6,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Replicas of current version",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_status_observed_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Observed Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 8,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Metadata Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas specified",
"refId": "A"
},
{
"expr": "max(kube_statefulset_status_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas created",
"refId": "B"
},
{
"expr": "min(kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "ready",
"refId": "C"
},
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas of current version",
"refId": "D"
},
{
"expr": "min(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "updated",
"refId": "E"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Replicas",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Name",
"multi": false,
"name": "statefulset",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\"}, statefulset)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / StatefulSets",
"uid": "a31c1f46e6f727cb37c0d731a7245005",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s

View File

@ -2,6 +2,12 @@ apiVersion: v1
data:
prometheus-remote-write.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
@ -11,14 +17,15 @@ data:
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "10s",
"refresh": "",
"rows": [
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -29,12 +36,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -44,11 +56,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -58,12 +71,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} - ignoring(queue) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}",
"expr": "(\n prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} \n- \n ignoring(queue) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -89,11 +101,11 @@ data:
},
"yaxes": [
{
"format": "s",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -102,7 +114,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -115,12 +127,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -130,11 +147,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -144,12 +162,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) - ignoring (queue) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "(\n rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n ignoring (queue) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -179,7 +196,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -188,7 +205,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
}
@ -198,11 +215,12 @@ data:
"repeatRowId": null,
"showTitle": true,
"title": "Timestamps",
"titleSize": "h6"
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -213,12 +231,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -228,11 +251,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -242,12 +266,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])- ignoring(queue) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) - rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "rate(\n prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n ignoring(queue) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -277,7 +300,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -286,7 +309,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
}
@ -296,11 +319,12 @@ data:
"repeatRowId": null,
"showTitle": true,
"title": "Samples",
"titleSize": "h6"
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -311,12 +335,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"gridPos": {
},
"id": 5,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -326,16 +355,18 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"minSpan": 6,
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
@ -344,8 +375,7 @@ data:
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -353,7 +383,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Num. Shards",
"title": "Current Shards",
"tooltip": {
"shared": true,
"sort": 0,
@ -375,7 +405,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -384,7 +414,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -397,12 +427,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -412,11 +447,298 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_max{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Max Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_min{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Min Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_desired{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Desired Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Shards",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -430,8 +752,7 @@ data:
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -439,7 +760,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Capacity",
"title": "Shard Capacity",
"tooltip": {
"shared": true,
"sort": 0,
@ -461,7 +782,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -470,7 +791,98 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_pending_samples{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pending Samples",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
@ -479,12 +891,13 @@ data:
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Shards",
"titleSize": "h6"
"title": "Shard Details",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -495,12 +908,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 6,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -510,11 +928,207 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_tsdb_wal_segment_current{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "TSDB Current Segment",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_wal_watcher_current_segment{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Remote Write Current Segment",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Segments",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -528,8 +1142,7 @@ data:
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -559,7 +1172,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -568,7 +1181,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -581,12 +1194,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 7,
"gridPos": {
},
"id": 14,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -596,11 +1214,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -614,8 +1233,7 @@ data:
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -645,7 +1263,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -654,7 +1272,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -667,12 +1285,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"gridPos": {
},
"id": 15,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -682,11 +1305,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -700,8 +1324,7 @@ data:
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -731,7 +1354,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -740,7 +1363,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -753,12 +1376,17 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 9,
"gridPos": {
},
"id": 16,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
@ -768,11 +1396,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -786,8 +1415,7 @@ data:
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"refId": "A"
}
],
"thresholds": [
@ -817,7 +1445,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -826,7 +1454,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
}
@ -835,8 +1463,9 @@ data:
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Misc Rates.",
"titleSize": "h6"
"title": "Misc. Rates",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
@ -847,10 +1476,6 @@ data:
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
@ -865,23 +1490,30 @@ data:
{
"allValue": null,
"current": {
"selected": true,
"text": "All",
"value": "$__all"
"text": {
"selected": true,
"text": "All",
"value": "$__all"
},
"value": {
"selected": true,
"text": "All",
"value": "$__all"
}
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "instance",
"multi": true,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(prometheus_build_info, instance)",
"refresh": 1,
"refresh": 2,
"regex": "",
"sort": 2,
"sort": 0,
"tagValuesQuery": "",
"tags": [
@ -893,23 +1525,56 @@ data:
{
"allValue": null,
"current": {
"selected": true,
"text": "All",
"value": "$__all"
"text": {
"selected": true,
"text": "All",
"value": "$__all"
},
"value": {
"selected": true,
"text": "All",
"value": "$__all"
}
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_container_info{image=~\".*prometheus.*\"}, cluster)",
"refresh": 1,
"refresh": 2,
"regex": "",
"sort": 2,
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "queue",
"options": [
],
"query": "label_values(prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}, queue)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
@ -921,7 +1586,7 @@ data:
]
},
"time": {
"from": "now-1h",
"from": "now-6h",
"to": "now"
},
"timepicker": {
@ -949,9 +1614,8 @@ data:
"30d"
]
},
"timezone": "utc",
"timezone": "browser",
"title": "Prometheus Remote Write",
"uid": "",
"version": 0
}
prometheus.json: |-
@ -2048,8 +2712,8 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,

View File

@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: grafana
image: docker.io/grafana/grafana:6.5.3
image: docker.io/grafana/grafana:6.7.2
env:
- name: GF_PATHS_CONFIG
value: "/etc/grafana/custom.ini"

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -76,6 +76,6 @@ spec:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -76,6 +76,6 @@ spec:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -73,7 +73,7 @@ spec:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -76,6 +76,6 @@ spec:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 300

View File

@ -22,7 +22,7 @@ spec:
spec:
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.1
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.30.0
args:
- /nginx-ingress-controller
- --ingress-class=public
@ -76,6 +76,6 @@ spec:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 300

View File

@ -20,7 +20,7 @@ spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: quay.io/prometheus/prometheus:v2.15.2
image: quay.io/prometheus/prometheus:v2.17.1
args:
- --web.listen-address=0.0.0.0:9090
- --config.file=/etc/prometheus/prometheus.yaml

View File

@ -24,7 +24,7 @@ spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.9.2
image: quay.io/coreos/kube-state-metrics:v1.9.5
ports:
- name: metrics
containerPort: 8080

View File

@ -28,7 +28,7 @@ spec:
hostPID: true
containers:
- name: node-exporter
image: quay.io/prometheus/node-exporter:v0.18.1
image: quay.io/prometheus/node-exporter:v1.0.0-rc.0
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys

View File

@ -42,10 +42,10 @@ data:
{
"alert": "etcdHighNumberOfLeaderChanges",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": instance {{ $labels.instance }} has seen {{ $value }} leader changes within the last 30 minutes."
"message": "etcd cluster \"{{ $labels.job }}\": {{ $value }} leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated."
},
"expr": "rate(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}[15m]) > 3\n",
"for": "15m",
"expr": "increase((max by (job) (etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}) or 0*absent(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}))[15m:1m]) >= 3\n",
"for": "5m",
"labels": {
"severity": "warning"
}
@ -145,25 +145,132 @@ data:
kube.yaml: |-
{
"groups": [
{
"name": "kube-apiserver-error",
"rules": [
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[5m]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate5m"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[30m]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate30m"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[1h]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate1h"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[2h]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate2h"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[6h]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate6h"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[1d]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate1d"
},
{
"expr": "sum by (status_class) (\n label_replace(\n rate(apiserver_request_total{job=\"apiserver\"}[3d]\n ), \"status_class\", \"${1}xx\", \"code\", \"([0-9])..\")\n)\n",
"labels": {
"job": "apiserver"
},
"record": "status_class:apiserver_request_total:rate3d"
},
{
"expr": "sum(status_class:apiserver_request_total:rate5m{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate5m{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate5m"
},
{
"expr": "sum(status_class:apiserver_request_total:rate30m{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate30m{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate30m"
},
{
"expr": "sum(status_class:apiserver_request_total:rate1h{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate1h{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate1h"
},
{
"expr": "sum(status_class:apiserver_request_total:rate2h{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate2h{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate2h"
},
{
"expr": "sum(status_class:apiserver_request_total:rate6h{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate6h{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate6h"
},
{
"expr": "sum(status_class:apiserver_request_total:rate1d{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate1d{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate1d"
},
{
"expr": "sum(status_class:apiserver_request_total:rate3d{job=\"apiserver\",status_class=\"5xx\"})\n/\nsum(status_class:apiserver_request_total:rate3d{job=\"apiserver\"})\n",
"labels": {
"job": "apiserver"
},
"record": "status_class_5xx:apiserver_request_total:ratio_rate3d"
}
]
},
{
"name": "kube-apiserver.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"expr": "sum(rate(apiserver_request_duration_seconds_sum{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n/\nsum(rate(apiserver_request_duration_seconds_count{subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod)\n",
"record": "cluster:apiserver_request_duration_seconds:mean5m"
},
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.99"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.9"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.5"
},
@ -179,23 +286,23 @@ data:
"record": "namespace:container_cpu_usage_seconds_total:sum_rate"
},
{
"expr": "sum by (namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"expr": "sum by (cluster, namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (\n 1, max by(cluster, namespace, pod, node) (kube_pod_info)\n)\n",
"record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
},
{
"expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n",
"record": "node_namespace_pod_container:container_memory_working_set_bytes"
},
{
"expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n",
"record": "node_namespace_pod_container:container_memory_rss"
},
{
"expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n",
"record": "node_namespace_pod_container:container_memory_cache"
},
{
"expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)\n",
"expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info)\n)\n",
"record": "node_namespace_pod_container:container_memory_swap"
},
{
@ -203,29 +310,29 @@ data:
"record": "namespace:container_memory_usage_bytes:sum"
},
{
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"expr": "sum by (namespace) (\n sum by (namespace, pod) (\n max by (namespace, pod, container) (\n kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"}\n ) * on(namespace, pod) group_left() max by (namespace, pod) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace:kube_pod_container_resource_requests_memory_bytes:sum"
},
{
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"expr": "sum by (namespace) (\n sum by (namespace, pod) (\n max by (namespace, pod, container) (\n kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n ) * on(namespace, pod) group_left() max by (namespace, pod) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace:kube_pod_container_resource_requests_cpu_cores:sum"
},
{
"expr": "sum(\n label_replace(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"ReplicaSet\"},\n \"replicaset\", \"$1\", \"owner_name\", \"(.*)\"\n ) * on(replicaset, namespace) group_left(owner_name) kube_replicaset_owner{job=\"kube-state-metrics\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n",
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"ReplicaSet\"},\n \"replicaset\", \"$1\", \"owner_name\", \"(.*)\"\n ) * on(replicaset, namespace) group_left(owner_name) topk by(replicaset, namespace) (\n 1, max by (replicaset, namespace, owner_name) (\n kube_replicaset_owner{job=\"kube-state-metrics\"}\n )\n ),\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "deployment"
},
"record": "mixin_pod_workload"
},
{
"expr": "sum(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n",
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "daemonset"
},
"record": "mixin_pod_workload"
},
{
"expr": "sum(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n",
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "statefulset"
},
@ -305,23 +412,49 @@ data:
"name": "node.rules",
"rules": [
{
"expr": "sum(min(kube_pod_info) by (node))",
"expr": "sum(min(kube_pod_info) by (cluster, node))\n",
"record": ":kube_pod_info_node_count:"
},
{
"expr": "max(label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")) by (node, namespace, pod)\n",
"expr": "topk by(namespace, pod) (1,\n max by (node, namespace, pod) (\n label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")\n))\n",
"record": "node_namespace_pod:kube_pod_info:"
},
{
"expr": "count by (node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n",
"expr": "count by (cluster, node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n",
"record": "node:node_num_cpu:sum"
},
{
"expr": "sum(\n node_memory_MemAvailable_bytes{job=\"node-exporter\"} or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"} +\n node_memory_Cached_bytes{job=\"node-exporter\"} +\n node_memory_MemFree_bytes{job=\"node-exporter\"} +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n)\n",
"expr": "sum(\n node_memory_MemAvailable_bytes{job=\"node-exporter\"} or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"} +\n node_memory_Cached_bytes{job=\"node-exporter\"} +\n node_memory_MemFree_bytes{job=\"node-exporter\"} +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n) by (cluster)\n",
"record": ":node_memory_MemAvailable_bytes:sum"
}
]
},
{
"name": "kubelet.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"quantile": "0.99"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.9, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"quantile": "0.9"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.5, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (instance, le) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"quantile": "0.5"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
}
]
},
{
"name": "kubernetes-apps",
"rules": [
@ -343,7 +476,7 @@ data:
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"
},
"expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Failed|Pending|Unknown\"} * on(namespace, pod) group_left(owner_kind) kube_pod_owner{owner_kind!=\"Job\"}) > 0\n",
"expr": "sum by (namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}) * on(namespace, pod) group_left(owner_kind) max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!=\"Job\"})) > 0\n",
"for": "15m",
"labels": {
"severity": "critical"
@ -367,7 +500,7 @@ data:
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"
},
"expr": "kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\nkube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n",
"expr": "(\n kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\n kube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n) and (\n changes(kube_deployment_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
@ -379,7 +512,7 @@ data:
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch"
},
"expr": "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n",
"expr": "(\n kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
@ -528,7 +661,7 @@ data:
"message": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit"
},
"expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum)\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n",
"expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum{})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n",
"for": "5m",
"labels": {
"severity": "warning"
@ -540,7 +673,7 @@ data:
"message": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit"
},
"expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum)\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n",
"expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum{})\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n",
"for": "5m",
"labels": {
"severity": "warning"
@ -618,7 +751,7 @@ data:
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays"
},
"expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"for": "5m",
"for": "1h",
"labels": {
"severity": "critical"
}
@ -666,17 +799,46 @@ data:
}
]
},
{
"name": "kube-apiserver-error-alerts",
"rules": [
{
"alert": "ErrorBudgetBurn",
"annotations": {
"message": "High requests error budget burn for job=apiserver (current value: {{ $value }})",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn"
},
"expr": "(\n status_class_5xx:apiserver_request_total:ratio_rate1h{job=\"apiserver\"} > (14.4*0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate5m{job=\"apiserver\"} > (14.4*0.010000)\n)\nor\n(\n status_class_5xx:apiserver_request_total:ratio_rate6h{job=\"apiserver\"} > (6*0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate30m{job=\"apiserver\"} > (6*0.010000)\n)\n",
"labels": {
"job": "apiserver",
"severity": "critical"
}
},
{
"alert": "ErrorBudgetBurn",
"annotations": {
"message": "High requests error budget burn for job=apiserver (current value: {{ $value }})",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-errorbudgetburn"
},
"expr": "(\n status_class_5xx:apiserver_request_total:ratio_rate1d{job=\"apiserver\"} > (3*0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate2h{job=\"apiserver\"} > (3*0.010000)\n)\nor\n(\n status_class_5xx:apiserver_request_total:ratio_rate3d{job=\"apiserver\"} > (0.010000)\n and\n status_class_5xx:apiserver_request_total:ratio_rate6h{job=\"apiserver\"} > (0.010000)\n)\n",
"labels": {
"job": "apiserver",
"severity": "warning"
}
}
]
},
{
"name": "kubernetes-system-apiserver",
"rules": [
{
"alert": "KubeAPILatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"message": "The API server has an abnormal latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|PROXY|CONNECT\"} > 1\n",
"for": "10m",
"expr": "(\n cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"}\n >\n on (verb) group_left()\n (\n avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"} >= 0)\n +\n 2*stddev by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"} >= 0)\n )\n) > on (verb) group_left()\n1.2 * avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job=\"apiserver\"} >= 0)\nand on (verb,resource)\ncluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\"}\n>\n1\n",
"for": "5m",
"labels": {
"severity": "warning"
}
@ -687,36 +849,12 @@ data:
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"LIST|WATCH|WATCHLIST|PROXY|CONNECT\"} > 4\n",
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\"} > 4\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) > 0.03\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
"message": "API server is returning errors for {{ $value | humanizePercentage }} of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"5..\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) > 0.01\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPIErrorsHigh",
"annotations": {
@ -747,7 +885,7 @@ data:
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 7.0 days.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
},
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n",
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n",
"labels": {
"severity": "warning"
}
@ -758,11 +896,34 @@ data:
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
},
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n",
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "AggregatedAPIErrors",
"annotations": {
"message": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has reported errors. The number of errors have increased for it in the past five minutes. High values indicate that the availability of the service changes too often.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapierrors"
},
"expr": "sum by(name, namespace)(increase(aggregator_unavailable_apiservice_count[5m])) > 2\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "AggregatedAPIDown",
"annotations": {
"message": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} is down. It has not been available at least for the past five minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-aggregatedapidown"
},
"expr": "sum by(name, namespace)(sum_over_time(aggregator_unavailable_apiservice[5m])) > 0\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPIDown",
"annotations": {
@ -799,6 +960,7 @@ data:
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable"
},
"expr": "kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} == 1\n",
"for": "2m",
"labels": {
"severity": "warning"
}
@ -809,7 +971,43 @@ data:
"message": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods"
},
"expr": "max(max(kubelet_running_pod_count{job=\"kubelet\"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"}) by(node) / max(kube_node_status_capacity_pods{job=\"kube-state-metrics\"}) by(node) > 0.95\n",
"expr": "max(max(kubelet_running_pod_count{job=\"kubelet\"}) by(instance) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"}) by(node) / max(kube_node_status_capacity_pods{job=\"kube-state-metrics\"} != 1) by(node) > 0.95\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeNodeReadinessFlapping",
"annotations": {
"message": "The readiness status of node {{ $labels.node }} has changed {{ $value }} times in the last 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping"
},
"expr": "sum(changes(kube_node_status_condition{status=\"true\",condition=\"Ready\"}[15m])) by (node) > 2\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletPlegDurationHigh",
"annotations": {
"message": "The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh"
},
"expr": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile=\"0.99\"} >= 10\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletPodStartUpLatencyHigh",
"annotations": {
"message": "Kubelet Pod startup 99th percentile latency is {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh"
},
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\"}[5m])) by (instance, le)) * on(instance) group_left(node) kubelet_node_name > 60\n",
"for": "15m",
"labels": {
"severity": "warning"
@ -865,9 +1063,167 @@ data:
}
]
}
loki.yaml: |-
{
"groups": [
{
"name": "loki_rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job))",
"record": "job:loki_request_duration_seconds:99quantile"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job))",
"record": "job:loki_request_duration_seconds:50quantile"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job) / sum(rate(loki_request_duration_seconds_count[1m])) by (job)",
"record": "job:loki_request_duration_seconds:avg"
},
{
"expr": "sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job)",
"record": "job:loki_request_duration_seconds_bucket:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job)",
"record": "job:loki_request_duration_seconds_sum:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_count[1m])) by (job)",
"record": "job:loki_request_duration_seconds_count:sum_rate"
},
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route))",
"record": "job_route:loki_request_duration_seconds:99quantile"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route))",
"record": "job_route:loki_request_duration_seconds:50quantile"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (job, route)",
"record": "job_route:loki_request_duration_seconds:avg"
},
{
"expr": "sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)",
"record": "job_route:loki_request_duration_seconds_bucket:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (job, route)",
"record": "job_route:loki_request_duration_seconds_sum:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_count[1m])) by (job, route)",
"record": "job_route:loki_request_duration_seconds_count:sum_rate"
},
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, namespace, job, route))",
"record": "namespace_job_route:loki_request_duration_seconds:99quantile"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, namespace, job, route))",
"record": "namespace_job_route:loki_request_duration_seconds:50quantile"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds:avg"
},
{
"expr": "sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds_bucket:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_sum[1m])) by (namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds_sum:sum_rate"
},
{
"expr": "sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)",
"record": "namespace_job_route:loki_request_duration_seconds_count:sum_rate"
}
]
},
{
"name": "loki_alerts",
"rules": [
{
"alert": "LokiRequestErrors",
"annotations": {
"message": "{{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}% errors.\n"
},
"expr": "100 * sum(rate(loki_request_duration_seconds_count{status_code=~\"5..\"}[1m])) by (namespace, job, route)\n /\nsum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route)\n > 10\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "LokiRequestLatency",
"annotations": {
"message": "{{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency.\n"
},
"expr": "namespace_job_route:loki_request_duration_seconds:99quantile{route!~\"(?i).*tail.*\"} > 1\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
}
]
}
node-exporter.yaml: |-
{
"groups": [
{
"name": "node-exporter.rules",
"rules": [
{
"expr": "count without (cpu) (\n count without (mode) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n )\n)\n",
"record": "instance:node_num_cpu:sum"
},
{
"expr": "1 - avg without (cpu, mode) (\n rate(node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\"}[1m])\n)\n",
"record": "instance:node_cpu_utilisation:rate1m"
},
{
"expr": "(\n node_load1{job=\"node-exporter\"}\n/\n instance:node_num_cpu:sum{job=\"node-exporter\"}\n)\n",
"record": "instance:node_load1_per_cpu:ratio"
},
{
"expr": "1 - (\n node_memory_MemAvailable_bytes{job=\"node-exporter\"}\n/\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n)\n",
"record": "instance:node_memory_utilisation:ratio"
},
{
"expr": "rate(node_vmstat_pgmajfault{job=\"node-exporter\"}[1m])\n",
"record": "instance:node_vmstat_pgmajfault:rate1m"
},
{
"expr": "rate(node_disk_io_time_seconds_total{job=\"node-exporter\", device!~\"dm.*\"}[1m])\n",
"record": "instance_device:node_disk_io_time_seconds:rate1m"
},
{
"expr": "rate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\", device!~\"dm.*\"}[1m])\n",
"record": "instance_device:node_disk_io_time_weighted_seconds:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_receive_bytes_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_receive_bytes_excluding_lo:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_transmit_bytes_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_transmit_bytes_excluding_lo:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_receive_drop_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_receive_drop_excluding_lo:rate1m"
},
{
"expr": "sum without (device) (\n rate(node_network_transmit_drop_total{job=\"node-exporter\", device!=\"lo\"}[1m])\n)\n",
"record": "instance:node_network_transmit_drop_excluding_lo:rate1m"
}
]
},
{
"name": "node-exporter",
"rules": [
@ -990,6 +1346,41 @@ data:
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeHighNumberConntrackEntriesUsed",
"annotations": {
"description": "{{ $value | humanizePercentage }} of conntrack entries are used",
"summary": "Number of conntrack are getting close to the limit"
},
"expr": "(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeClockSkewDetected",
"annotations": {
"message": "Clock on {{ $labels.instance }} is out of sync by more than 300s. Ensure NTP is configured correctly on this host.",
"summary": "Clock skew detected."
},
"expr": "(\n node_timex_offset_seconds > 0.05\nand\n deriv(node_timex_offset_seconds[5m]) >= 0\n)\nor\n(\n node_timex_offset_seconds < -0.05\nand\n deriv(node_timex_offset_seconds[5m]) <= 0\n)\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeClockNotSynchronising",
"annotations": {
"message": "Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.",
"summary": "Clock not synchronising."
},
"expr": "min_over_time(node_timex_sync_status[5m]) == 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
}
@ -1124,7 +1515,7 @@ data:
{
"alert": "PrometheusRemoteStorageFailures",
"annotations": {
"description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to queue {{$labels.queue}}.",
"description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to {{ if $labels.queue }}{{ $labels.queue }}{{ else }}{{ $labels.url }}{{ end }}.",
"summary": "Prometheus fails to send samples to remote storage."
},
"expr": "(\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n/\n (\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n +\n rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m])\n )\n)\n* 100\n> 1\n",
@ -1136,7 +1527,7 @@ data:
{
"alert": "PrometheusRemoteWriteBehind",
"annotations": {
"description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for queue {{$labels.queue}}.",
"description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for {{ if $labels.queue }}{{ $labels.queue }}{{ else }}{{ $labels.url }}{{ end }}.",
"summary": "Prometheus remote write is behind."
},
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- on(job, instance) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n",
@ -1148,10 +1539,10 @@ data:
{
"alert": "PrometheusRemoteWriteDesiredShards",
"annotations": {
"description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ printf $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.",
"description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.",
"summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards."
},
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n> on(job, instance) group_right\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n",
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n>\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n",
"for": "15m",
"labels": {
"severity": "warning"
@ -1201,6 +1592,17 @@ data:
"labels": {
"severity": "warning"
}
},
{
"alert": "BlackboxProbeFailure",
"annotations": {
"message": "Blackbox probe {{$labels.instance}} failed"
},
"expr": "probe_success == 0",
"for": "2m",
"labels": {
"severity": "critical"
}
}
]
},
@ -1212,7 +1614,7 @@ data:
"annotations": {
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
},
"expr": "node_md_disks - node_md_disks_active > 0",
"expr": "node_md_disks{state=\"failed\"} > 0",
"for": "10m",
"labels": {
"severity": "warning"

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=de85f1da7df0b13dfb7488350c20a510f3090cdf"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -7,7 +7,9 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_IMAGE_TAG=v3.4.7"
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd"
Environment="RKT_RUN_ARGS=--insecure-options=image"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -79,7 +81,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -89,8 +91,7 @@ systemd:
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -124,7 +125,6 @@ systemd:
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
@ -134,7 +134,7 @@ systemd:
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
--exec=/apply
@ -169,8 +169,10 @@ storage:
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
sudo mkdir -p /opt/bootstrap/assets/manifests/crds
sudo mv manifests-networking/{crd,cluster}*.yaml /opt/bootstrap/assets/manifests/crds 2>/dev/null || true
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
@ -182,6 +184,10 @@ storage:
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests/crds -R; do
echo "Retry Custom Resource Definition manifests"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5

View File

@ -52,7 +52,7 @@ data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
pretty_print = false
snippets = var.controller_clc_snippets
snippets = var.controller_snippets
}
# Controller Container Linux configs

View File

@ -25,21 +25,23 @@ resource "aws_internet_gateway" "gateway" {
resource "aws_route_table" "default" {
vpc_id = aws_vpc.network.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
route {
ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
tags = {
"Name" = var.cluster_name
}
}
resource "aws_route" "egress-ipv4" {
route_table_id = aws_route_table.default.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
resource "aws_route" "egress-ipv6" {
route_table_id = aws_route_table.default.id
destination_ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
# Subnets (one per availability zone)
resource "aws_subnet" "public" {

View File

@ -77,13 +77,13 @@ variable "worker_target_groups" {
default = []
}
variable "controller_clc_snippets" {
variable "controller_snippets" {
type = list(string)
description = "Controller Container Linux Config snippets"
default = []
}
variable "worker_clc_snippets" {
variable "worker_snippets" {
type = list(string)
description = "Worker Container Linux Config snippets"
default = []

View File

@ -18,7 +18,7 @@ module "workers" {
ssh_authorized_key = var.ssh_authorized_key
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
clc_snippets = var.worker_clc_snippets
snippets = var.worker_snippets
node_labels = var.worker_node_labels
}

View File

@ -54,7 +54,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -64,8 +64,7 @@ systemd:
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -80,9 +79,9 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{ endfor ~}
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -128,11 +127,10 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
-- \
kubectl --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--exec=/usr/local/bin/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
passwd:
users:
- name: core

View File

@ -70,7 +70,7 @@ variable "target_groups" {
default = []
}
variable "clc_snippets" {
variable "snippets" {
type = list(string)
description = "Container Linux Config snippets"
default = []

View File

@ -73,7 +73,7 @@ resource "aws_launch_configuration" "worker" {
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
pretty_print = false
snippets = var.clc_snippets
snippets = var.snippets
}
# Worker Container Linux config

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -18,6 +18,11 @@ data "aws_ami" "fedora-coreos" {
values = ["fedora-coreos-31.*.*.*-hvm"]
}
filter {
name = "description"
values = ["Fedora CoreOS stable*"]
}
# try to filter out dev images (AWS filters can't)
name_regex = "^fedora-coreos-31.[0-9]*.[0-9]*.[0-9]*-hvm*"
}

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=de85f1da7df0b13dfb7488350c20a510f3090cdf"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -28,7 +28,7 @@ systemd:
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.3
quay.io/coreos/etcd:v3.4.7
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
@ -73,13 +73,13 @@ systemd:
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.17.1 kubelet \
quay.io/poseidon/kubelet:v1.18.1 \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -116,14 +116,13 @@ systemd:
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \
k8s.gcr.io/hyperkube:v1.17.1
quay.io/poseidon/kubelet:v1.18.1
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
@ -155,8 +154,10 @@ storage:
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
sudo mkdir -p /opt/bootstrap/assets/manifests/crds
sudo mv manifests-networking/{crd,cluster}*.yaml /opt/bootstrap/assets/manifests/crds 2>/dev/null || true
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
mode: 0544
contents:
@ -167,14 +168,14 @@ storage:
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests/crds -R; do
echo "Retry Custom Resource Definition manifests"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |

View File

@ -25,21 +25,23 @@ resource "aws_internet_gateway" "gateway" {
resource "aws_route_table" "default" {
vpc_id = aws_vpc.network.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
route {
ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
tags = {
"Name" = var.cluster_name
}
}
resource "aws_route" "egress-ipv4" {
route_table_id = aws_route_table.default.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.gateway.id
}
resource "aws_route" "egress-ipv6" {
route_table_id = aws_route_table.default.id
destination_ipv6_cidr_block = "::/0"
gateway_id = aws_internet_gateway.gateway.id
}
# Subnets (one per availability zone)
resource "aws_subnet" "public" {

View File

@ -18,6 +18,11 @@ data "aws_ami" "fedora-coreos" {
values = ["fedora-coreos-31.*.*.*-hvm"]
}
filter {
name = "description"
values = ["Fedora CoreOS stable*"]
}
# try to filter out dev images (AWS filters can't)
name_regex = "^fedora-coreos-31.[0-9]*.[0-9]*.[0-9]*-hvm*"
}

View File

@ -43,13 +43,13 @@ systemd:
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.17.1 kubelet \
quay.io/poseidon/kubelet:v1.18.1 \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -66,9 +66,9 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{ endfor ~}
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -78,6 +78,18 @@ systemd:
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enabled: true
contents: |
[Unit]
Description=Delete Kubernetes node on shutdown
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/podman run --volume /etc/kubernetes:/etc/kubernetes:ro,z --entrypoint /usr/local/bin/kubectl quay.io/poseidon/kubelet:v1.18.1 --kubeconfig=/etc/kubernetes/kubeconfig delete node $HOSTNAME'
[Install]
WantedBy=multi-user.target
storage:
directories:
- path: /etc/kubernetes
@ -87,10 +99,6 @@ storage:
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/cl/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=de85f1da7df0b13dfb7488350c20a510f3090cdf"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -7,7 +7,9 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_IMAGE_TAG=v3.4.7"
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd"
Environment="RKT_RUN_ARGS=--insecure-options=image"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -78,7 +80,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -88,8 +90,7 @@ systemd:
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -122,7 +123,6 @@ systemd:
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
@ -132,7 +132,7 @@ systemd:
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
--exec=/apply
@ -167,8 +167,10 @@ storage:
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
sudo mkdir -p /opt/bootstrap/assets/manifests/crds
sudo mv manifests-networking/{crd,cluster}*.yaml /opt/bootstrap/assets/manifests/crds 2>/dev/null || true
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
@ -180,6 +182,10 @@ storage:
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests/crds -R; do
echo "Retry Custom Resource Definition manifests"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5

View File

@ -15,8 +15,10 @@ resource "azurerm_dns_a_record" "etcds" {
}
locals {
# Channel for a Container Linux derivative
# Container Linux derivative
# coreos-stable -> Container Linux Stable
# flatcar-stable -> Flatcar Linux Stable
flavor = split("-", var.os_image)[0]
channel = split("-", var.os_image)[1]
}
@ -32,92 +34,63 @@ resource "azurerm_availability_set" "controllers" {
}
# Controller instances
resource "azurerm_virtual_machine" "controllers" {
resource "azurerm_linux_virtual_machine" "controllers" {
count = var.controller_count
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controller-${count.index}"
location = var.region
availability_set_id = azurerm_availability_set.controllers.id
vm_size = var.controller_type
# boot
storage_image_reference {
publisher = "CoreOS"
offer = "CoreOS"
size = var.controller_type
custom_data = base64encode(data.ct_config.controller-ignitions.*.rendered[count.index])
# storage
os_disk {
name = "${var.cluster_name}-controller-${count.index}"
caching = "None"
disk_size_gb = var.disk_size
storage_account_type = "Premium_LRS"
}
source_image_reference {
publisher = local.flavor == "flatcar" ? "Kinvolk" : "CoreOS"
offer = local.flavor == "flatcar" ? "flatcar-container-linux" : "CoreOS"
sku = local.channel
version = "latest"
}
# storage
storage_os_disk {
name = "${var.cluster_name}-controller-${count.index}"
create_option = "FromImage"
caching = "ReadWrite"
disk_size_gb = var.disk_size
os_type = "Linux"
managed_disk_type = "Premium_LRS"
}
# Gross hack just for Flatcar Linux
dynamic "plan" {
for_each = local.flavor == "flatcar" ? [1] : []
# network
network_interface_ids = [azurerm_network_interface.controllers.*.id[count.index]]
os_profile {
computer_name = "${var.cluster_name}-controller-${count.index}"
admin_username = "core"
custom_data = data.ct_config.controller-ignitions.*.rendered[count.index]
}
# Azure mandates setting an ssh_key, even though Ignition custom_data handles it too
os_profile_linux_config {
disable_password_authentication = true
ssh_keys {
path = "/home/core/.ssh/authorized_keys"
key_data = var.ssh_authorized_key
content {
name = local.channel
publisher = "kinvolk"
product = "flatcar-container-linux"
}
}
# lifecycle
delete_os_disk_on_termination = true
delete_data_disks_on_termination = true
# network
network_interface_ids = [
azurerm_network_interface.controllers.*.id[count.index]
]
# Azure requires setting admin_ssh_key, though Ignition custom_data handles it too
admin_username = "core"
admin_ssh_key {
username = "core"
public_key = var.ssh_authorized_key
}
lifecycle {
ignore_changes = [
storage_os_disk,
os_profile,
os_disk,
custom_data,
]
}
}
# Controller NICs with public and private IPv4
resource "azurerm_network_interface" "controllers" {
count = var.controller_count
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controller-${count.index}"
location = azurerm_resource_group.cluster.location
network_security_group_id = azurerm_network_security_group.controller.id
ip_configuration {
name = "ip0"
subnet_id = azurerm_subnet.controller.id
private_ip_address_allocation = "dynamic"
# public IPv4
public_ip_address_id = azurerm_public_ip.controllers.*.id[count.index]
}
}
# Add controller NICs to the controller backend address pool
resource "azurerm_network_interface_backend_address_pool_association" "controllers" {
count = var.controller_count
network_interface_id = azurerm_network_interface.controllers[count.index].id
ip_configuration_name = "ip0"
backend_address_pool_id = azurerm_lb_backend_address_pool.controller.id
}
# Controller public IPv4 addresses
resource "azurerm_public_ip" "controllers" {
count = var.controller_count
@ -129,12 +102,46 @@ resource "azurerm_public_ip" "controllers" {
allocation_method = "Static"
}
# Controller NICs with public and private IPv4
resource "azurerm_network_interface" "controllers" {
count = var.controller_count
resource_group_name = azurerm_resource_group.cluster.name
name = "${var.cluster_name}-controller-${count.index}"
location = azurerm_resource_group.cluster.location
ip_configuration {
name = "ip0"
subnet_id = azurerm_subnet.controller.id
private_ip_address_allocation = "Dynamic"
# instance public IPv4
public_ip_address_id = azurerm_public_ip.controllers.*.id[count.index]
}
}
# Associate controller network interface with controller security group
resource "azurerm_network_interface_security_group_association" "controllers" {
count = var.controller_count
network_interface_id = azurerm_network_interface.controllers[count.index].id
network_security_group_id = azurerm_network_security_group.controller.id
}
# Associate controller network interface with controller backend address pool
resource "azurerm_network_interface_backend_address_pool_association" "controllers" {
count = var.controller_count
network_interface_id = azurerm_network_interface.controllers[count.index].id
ip_configuration_name = "ip0"
backend_address_pool_id = azurerm_lb_backend_address_pool.controller.id
}
# Controller Ignition configs
data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
pretty_print = false
snippets = var.controller_clc_snippets
snippets = var.controller_snippets
}
# Controller Container Linux configs

View File

@ -72,6 +72,7 @@ resource "azurerm_lb_rule" "ingress-http" {
name = "ingress-http"
loadbalancer_id = azurerm_lb.cluster.id
frontend_ip_configuration_name = "ingress"
disable_outbound_snat = true
protocol = "Tcp"
frontend_port = 80
@ -86,6 +87,7 @@ resource "azurerm_lb_rule" "ingress-https" {
name = "ingress-https"
loadbalancer_id = azurerm_lb.cluster.id
frontend_ip_configuration_name = "ingress"
disable_outbound_snat = true
protocol = "Tcp"
frontend_port = 443
@ -94,6 +96,20 @@ resource "azurerm_lb_rule" "ingress-https" {
probe_id = azurerm_lb_probe.ingress.id
}
# Worker outbound TCP/UDP SNAT
resource "azurerm_lb_outbound_rule" "worker-outbound" {
resource_group_name = azurerm_resource_group.cluster.name
name = "worker"
loadbalancer_id = azurerm_lb.cluster.id
frontend_ip_configuration {
name = "ingress"
}
protocol = "All"
backend_address_pool_id = azurerm_lb_backend_address_pool.worker.id
}
# Address pool of controllers
resource "azurerm_lb_backend_address_pool" "controller" {
resource_group_name = azurerm_resource_group.cluster.name

View File

@ -24,6 +24,11 @@ resource "azurerm_subnet" "controller" {
address_prefix = cidrsubnet(var.host_cidr, 1, 0)
}
resource "azurerm_subnet_network_security_group_association" "controller" {
subnet_id = azurerm_subnet.controller.id
network_security_group_id = azurerm_network_security_group.controller.id
}
resource "azurerm_subnet" "worker" {
resource_group_name = azurerm_resource_group.cluster.name
@ -32,3 +37,8 @@ resource "azurerm_subnet" "worker" {
address_prefix = cidrsubnet(var.host_cidr, 1, 1)
}
resource "azurerm_subnet_network_security_group_association" "worker" {
subnet_id = azurerm_subnet.worker.id
network_security_group_id = azurerm_network_security_group.worker.id
}

View File

@ -13,7 +13,7 @@ resource "null_resource" "copy-controller-secrets" {
depends_on = [
module.bootstrap,
azurerm_virtual_machine.controllers
azurerm_linux_virtual_machine.controllers
]
connection {

View File

@ -49,7 +49,7 @@ variable "worker_type" {
variable "os_image" {
type = string
default = "coreos-stable"
description = "Channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha)"
description = "Channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta)"
}
variable "disk_size" {
@ -64,13 +64,13 @@ variable "worker_priority" {
default = "Regular"
}
variable "controller_clc_snippets" {
variable "controller_snippets" {
type = list(string)
description = "Controller Container Linux Config snippets"
default = []
}
variable "worker_clc_snippets" {
variable "worker_snippets" {
type = list(string)
description = "Worker Container Linux Config snippets"
default = []

View File

@ -3,7 +3,7 @@
terraform {
required_version = "~> 0.12.6"
required_providers {
azurerm = "~> 1.27"
azurerm = "~> 2.0"
ct = "~> 0.3"
template = "~> 2.1"
null = "~> 2.1"

View File

@ -19,7 +19,6 @@ module "workers" {
ssh_authorized_key = var.ssh_authorized_key
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
clc_snippets = var.worker_clc_snippets
snippets = var.worker_snippets
node_labels = var.worker_node_labels
}

View File

@ -53,7 +53,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -63,8 +63,7 @@ systemd:
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -78,9 +77,9 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
%{~ for label in split(",", node_labels) ~}
--node-labels=${label} \
%{ endfor ~}
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -126,11 +125,10 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
-- \
kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
--exec=/usr/local/bin/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
passwd:
users:
- name: core

View File

@ -56,7 +56,7 @@ variable "priority" {
default = "Regular"
}
variable "clc_snippets" {
variable "snippets" {
type = list(string)
description = "Container Linux Config snippets"
default = []

View File

@ -1,57 +1,56 @@
locals {
# Channel for a Container Linux derivative
# coreos-stable -> Container Linux Stable
# flatcar-stable -> Flatcar Linux Stable
flavor = split("-", var.os_image)[0]
channel = split("-", var.os_image)[1]
}
# Workers scale set
resource "azurerm_virtual_machine_scale_set" "workers" {
resource "azurerm_linux_virtual_machine_scale_set" "workers" {
resource_group_name = var.resource_group_name
name = "${var.name}-workers"
location = var.region
name = "${var.name}-worker"
location = var.region
sku = var.vm_type
instances = var.worker_count
# instance name prefix for instances in the set
computer_name_prefix = "${var.name}-worker"
single_placement_group = false
custom_data = base64encode(data.ct_config.worker-ignition.rendered)
sku {
name = var.vm_type
tier = "standard"
capacity = var.worker_count
# storage
os_disk {
storage_account_type = "Standard_LRS"
caching = "ReadWrite"
}
# boot
storage_profile_image_reference {
publisher = "CoreOS"
offer = "CoreOS"
source_image_reference {
publisher = local.flavor == "flatcar" ? "Kinvolk" : "CoreOS"
offer = local.flavor == "flatcar" ? "flatcar-container-linux" : "CoreOS"
sku = local.channel
version = "latest"
}
# storage
storage_profile_os_disk {
create_option = "FromImage"
caching = "ReadWrite"
os_type = "linux"
managed_disk_type = "Standard_LRS"
}
# Gross hack just for Flatcar Linux
dynamic "plan" {
for_each = local.flavor == "flatcar" ? [1] : []
os_profile {
computer_name_prefix = "${var.name}-worker-"
admin_username = "core"
custom_data = data.ct_config.worker-ignition.rendered
}
# Azure mandates setting an ssh_key, even though Ignition custom_data handles it too
os_profile_linux_config {
disable_password_authentication = true
ssh_keys {
path = "/home/core/.ssh/authorized_keys"
key_data = var.ssh_authorized_key
content {
name = local.channel
publisher = "kinvolk"
product = "flatcar-container-linux"
}
}
# Azure requires setting admin_ssh_key, though Ignition custom_data handles it too
admin_username = "core"
admin_ssh_key {
username = "core"
public_key = var.ssh_authorized_key
}
# network
network_profile {
network_interface {
name = "nic0"
primary = true
network_security_group_id = var.security_group_id
@ -67,10 +66,10 @@ resource "azurerm_virtual_machine_scale_set" "workers" {
}
# lifecycle
upgrade_policy_mode = "Manual"
# eviction policy may only be set when priority is Low
upgrade_mode = "Manual"
# eviction policy may only be set when priority is Spot
priority = var.priority
eviction_policy = var.priority == "Low" ? "Delete" : null
eviction_policy = var.priority == "Spot" ? "Delete" : null
}
# Scale up or down to maintain desired number, tolerating deallocations.
@ -82,7 +81,7 @@ resource "azurerm_monitor_autoscale_setting" "workers" {
# autoscale
enabled = true
target_resource_id = azurerm_virtual_machine_scale_set.workers.id
target_resource_id = azurerm_linux_virtual_machine_scale_set.workers.id
profile {
name = "default"
@ -99,7 +98,7 @@ resource "azurerm_monitor_autoscale_setting" "workers" {
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
pretty_print = false
snippets = var.clc_snippets
snippets = var.snippets
}
# Worker Container Linux configs

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=de85f1da7df0b13dfb7488350c20a510f3090cdf"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [var.k8s_domain_name]

View File

@ -7,7 +7,9 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_IMAGE_TAG=v3.4.7"
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd"
Environment="RKT_RUN_ARGS=--insecure-options=image"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
@ -87,7 +89,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -101,8 +103,7 @@ systemd:
--mount volume=etc-iscsi,target=/etc/iscsi \
--volume usr-sbin-iscsiadm,kind=host,source=/usr/sbin/iscsiadm \
--mount volume=usr-sbin-iscsiadm,target=/sbin/iscsiadm \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -137,7 +138,6 @@ systemd:
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
@ -147,7 +147,7 @@ systemd:
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
--exec=/apply
@ -185,8 +185,10 @@ storage:
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
sudo mkdir -p /opt/bootstrap/assets/manifests/crds
sudo mv manifests-networking/{crd,cluster}*.yaml /opt/bootstrap/assets/manifests/crds 2>/dev/null || true
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
@ -198,6 +200,10 @@ storage:
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests/crds -R; do
echo "Retry Custom Resource Definition manifests"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5

View File

@ -62,7 +62,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -76,8 +76,7 @@ systemd:
--mount volume=etc-iscsi,target=/etc/iscsi \
--volume usr-sbin-iscsiadm,kind=host,source=/usr/sbin/iscsiadm \
--mount volume=usr-sbin-iscsiadm,target=/sbin/iscsiadm \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -93,6 +92,12 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{~ for label in compact(split(",", node_labels)) ~}
--node-labels=${label} \
%{~ endfor ~}
%{~ for taint in compact(split(",", node_taints)) ~}
--register-with-taints=${taint} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins

View File

@ -144,7 +144,7 @@ data "ct_config" "controller-ignitions" {
count = length(var.controllers)
content = data.template_file.controller-configs.*.rendered[count.index]
pretty_print = false
snippets = local.clc_map[var.controllers.*.name[count.index]]
snippets = lookup(var.snippets, var.controllers.*.name[count.index], [])
}
data "template_file" "controller-configs" {
@ -174,7 +174,7 @@ data "ct_config" "worker-ignitions" {
count = length(var.workers)
content = data.template_file.worker-configs.*.rendered[count.index]
pretty_print = false
snippets = local.clc_map[var.workers.*.name[count.index]]
snippets = lookup(var.snippets, var.workers.*.name[count.index], [])
}
data "template_file" "worker-configs" {
@ -188,26 +188,7 @@ data "template_file" "worker-configs" {
cluster_dns_service_ip = module.bootstrap.cluster_dns_service_ip
cluster_domain_suffix = var.cluster_domain_suffix
ssh_authorized_key = var.ssh_authorized_key
node_labels = join(",", lookup(var.worker_node_labels, var.workers.*.name[count.index], []))
node_taints = join(",", lookup(var.worker_node_taints, var.workers.*.name[count.index], []))
}
}
locals {
# Hack to workaround https://github.com/hashicorp/terraform/issues/17251
# Still an issue in Terraform v0.12 https://github.com/hashicorp/terraform/issues/20572
# Default Container Linux config snippets map every node names to list("\n") so
# all lookups succeed
clc_defaults = zipmap(
concat(var.controllers.*.name, var.workers.*.name),
chunklist(data.template_file.clc-default-snippets.*.rendered, 1),
)
# Union of the default and user specific snippets, later overrides prior.
clc_map = merge(local.clc_defaults, var.clc_snippets)
}
// Horrible hack to generate a Terraform list of node count length
data "template_file" "clc-default-snippets" {
count = length(var.controllers) + length(var.workers)
template = "\n"
}

View File

@ -49,12 +49,24 @@ List of worker machine details (unique name, identifying MAC address, FQDN)
EOD
}
variable "clc_snippets" {
variable "snippets" {
type = map(list(string))
description = "Map from machine names to lists of Container Linux Config snippets"
default = {}
}
variable "worker_node_labels" {
type = map(list(string))
description = "Map from worker names to lists of initial node labels"
default = {}
}
variable "worker_node_taints" {
type = map(list(string))
description = "Map from worker names to lists of initial node taints"
default = {}
}
# configuration
variable "k8s_domain_name" {

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=de85f1da7df0b13dfb7488350c20a510f3090cdf"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [var.k8s_domain_name]

View File

@ -28,7 +28,7 @@ systemd:
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.3
quay.io/coreos/etcd:v3.4.7
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
@ -72,7 +72,7 @@ systemd:
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
@ -80,7 +80,7 @@ systemd:
--volume /opt/cni/bin:/opt/cni/bin:z \
--volume /etc/iscsi:/etc/iscsi \
--volume /sbin/iscsiadm:/sbin/iscsiadm \
k8s.gcr.io/hyperkube:v1.17.1 kubelet \
quay.io/poseidon/kubelet:v1.18.1 \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -127,14 +127,13 @@ systemd:
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \
k8s.gcr.io/hyperkube:v1.17.1
quay.io/poseidon/kubelet:v1.18.1
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
@ -166,8 +165,10 @@ storage:
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
sudo mkdir -p /opt/bootstrap/assets/manifests/crds
sudo mv manifests-networking/{crd,cluster}*.yaml /opt/bootstrap/assets/manifests/crds 2>/dev/null || true
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
mode: 0544
contents:
@ -178,14 +179,14 @@ storage:
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests/crds -R; do
echo "Retry Custom Resource Definition manifests"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |

View File

@ -42,7 +42,7 @@ systemd:
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
@ -50,7 +50,7 @@ systemd:
--volume /opt/cni/bin:/opt/cni/bin:z \
--volume /etc/iscsi:/etc/iscsi \
--volume /sbin/iscsiadm:/sbin/iscsiadm \
k8s.gcr.io/hyperkube:v1.17.1 kubelet \
quay.io/poseidon/kubelet:v1.18.1 \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -68,6 +68,12 @@ systemd:
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{~ for label in compact(split(",", node_labels)) ~}
--node-labels=${label} \
%{~ endfor ~}
%{~ for taint in compact(split(",", node_taints)) ~}
--register-with-taints=${taint} \
%{~ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -95,10 +101,6 @@ storage:
contents:
inline:
${domain_name}
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |

View File

@ -1,24 +1,28 @@
locals {
remote_kernel = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-installer-kernel-x86_64"
remote_initrd = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-installer-initramfs.x86_64.img"
remote_kernel = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-live-kernel-x86_64"
remote_initrd = "https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-live-initramfs.x86_64.img"
remote_args = [
"ip=dhcp",
"rd.neednet=1",
"coreos.inst=yes",
"initrd=fedora-coreos-${var.os_version}-live-initramfs.x86_64.img",
"coreos.inst.image_url=https://builds.coreos.fedoraproject.org/prod/streams/${var.os_stream}/builds/${var.os_version}/x86_64/fedora-coreos-${var.os_version}-metal.x86_64.raw.xz",
"coreos.inst.ignition_url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.inst.install_dev=${var.install_disk}"
"coreos.inst.install_dev=${var.install_disk}",
"console=tty0",
"console=ttyS0",
]
cached_kernel = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-installer-kernel-x86_64"
cached_initrd = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-installer-initramfs.x86_64.img"
cached_kernel = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-live-kernel-x86_64"
cached_initrd = "/assets/fedora-coreos/fedora-coreos-${var.os_version}-live-initramfs.x86_64.img"
cached_args = [
"ip=dhcp",
"rd.neednet=1",
"coreos.inst=yes",
"initrd=fedora-coreos-${var.os_version}-live-initramfs.x86_64.img",
"coreos.inst.image_url=${var.matchbox_http_endpoint}/assets/fedora-coreos/fedora-coreos-${var.os_version}-metal.x86_64.raw.xz",
"coreos.inst.ignition_url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.inst.install_dev=${var.install_disk}"
"coreos.inst.install_dev=${var.install_disk}",
"console=tty0",
"console=ttyS0",
]
kernel = var.cached_install ? local.cached_kernel : local.remote_kernel
@ -44,8 +48,9 @@ resource "matchbox_profile" "controllers" {
data "ct_config" "controller-ignitions" {
count = length(var.controllers)
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
snippets = lookup(var.snippets, var.controllers.*.name[count.index], [])
}
data "template_file" "controller-configs" {
@ -79,8 +84,9 @@ resource "matchbox_profile" "workers" {
data "ct_config" "worker-ignitions" {
count = length(var.workers)
content = data.template_file.worker-configs.*.rendered[count.index]
strict = true
content = data.template_file.worker-configs.*.rendered[count.index]
strict = true
snippets = lookup(var.snippets, var.workers.*.name[count.index], [])
}
data "template_file" "worker-configs" {
@ -92,6 +98,8 @@ data "template_file" "worker-configs" {
cluster_dns_service_ip = module.bootstrap.cluster_dns_service_ip
cluster_domain_suffix = var.cluster_domain_suffix
ssh_authorized_key = var.ssh_authorized_key
node_labels = join(",", lookup(var.worker_node_labels, var.workers.*.name[count.index], []))
node_taints = join(",", lookup(var.worker_node_taints, var.workers.*.name[count.index], []))
}
}

View File

@ -13,12 +13,12 @@ variable "matchbox_http_endpoint" {
variable "os_stream" {
type = string
description = "Fedora CoreOS release stream (e.g. testing, stable)"
default = "testing"
default = "stable"
}
variable "os_version" {
type = string
description = "Fedora CoreOS version to PXE and install (e.g. 30.20190712.0)"
description = "Fedora CoreOS version to PXE and install (e.g. 31.20200310.3.0)"
}
# machines
@ -56,6 +56,18 @@ variable "snippets" {
default = {}
}
variable "worker_node_labels" {
type = map(list(string))
description = "Map from worker names to lists of initial node labels"
default = {}
}
variable "worker_node_taints" {
type = map(list(string))
description = "Map from worker names to lists of initial node taints"
default = {}
}
# configuration
variable "k8s_domain_name" {

View File

@ -11,7 +11,7 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.1 (upstream)
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization

View File

@ -1,6 +1,6 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=de85f1da7df0b13dfb7488350c20a510f3090cdf"
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]

View File

@ -7,7 +7,9 @@ systemd:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.3"
Environment="ETCD_IMAGE_TAG=v3.4.7"
Environment="ETCD_IMAGE_URL=docker://quay.io/coreos/etcd"
Environment="RKT_RUN_ARGS=--insecure-options=image"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -89,7 +91,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -99,8 +101,7 @@ systemd:
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -134,7 +135,6 @@ systemd:
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes/bootstrap-secrets \
@ -144,7 +144,7 @@ systemd:
--volume script,kind=host,source=/opt/bootstrap/apply \
--mount volume=script,target=/apply \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
--exec=/apply
@ -176,8 +176,10 @@ storage:
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
sudo mkdir -p /opt/bootstrap/assets/manifests/crds
sudo mv manifests-networking/{crd,cluster}*.yaml /opt/bootstrap/assets/manifests/crds 2>/dev/null || true
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
filesystem: root
mode: 0544
@ -189,6 +191,10 @@ storage:
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests/crds -R; do
echo "Retry Custom Resource Definition manifests"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5

View File

@ -64,7 +64,7 @@ systemd:
--mount volume=run,target=/run \
--volume usr-share-certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
--mount volume=usr-share-certs,target=/usr/share/ca-certificates \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--volume var-lib-calico,kind=host,source=/var/lib/calico,readOnly=true \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume var-lib-docker,kind=host,source=/var/lib/docker \
--mount volume=var-lib-docker,target=/var/lib/docker \
@ -74,8 +74,7 @@ systemd:
--mount volume=var-log,target=/var/log \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
--exec=/usr/local/bin/kubelet -- \
docker://quay.io/poseidon/kubelet:v1.18.1 -- \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
@ -132,8 +131,7 @@ storage:
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.17.1 \
docker://quay.io/poseidon/kubelet:v1.18.1 \
--net=host \
--dns=host \
-- \
kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--exec=/usr/local/bin/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

View File

@ -1,3 +1,8 @@
locals {
official_images = ["coreos-stable", "coreos-beta", "coreos-alpha"]
is_official_image = contains(local.official_images, var.os_image)
}
# Controller Instance DNS records
resource "digitalocean_record" "controllers" {
count = var.controller_count
@ -37,11 +42,12 @@ resource "digitalocean_droplet" "controllers" {
name = "${var.cluster_name}-controller-${count.index}"
region = var.region
image = var.image
image = var.os_image
size = var.controller_type
# network
ipv6 = true
# only official DigitalOcean images support IPv6
ipv6 = local.is_official_image
private_networking = true
user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
@ -66,7 +72,7 @@ data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
pretty_print = false
snippets = var.controller_clc_snippets
snippets = var.controller_snippets
}
# Controller Container Linux configs

View File

@ -41,19 +41,19 @@ variable "worker_type" {
default = "s-1vcpu-2gb"
}
variable "image" {
variable "os_image" {
type = string
description = "Container Linux image for instances (e.g. coreos-stable)"
default = "coreos-stable"
}
variable "controller_clc_snippets" {
variable "controller_snippets" {
type = list(string)
description = "Controller Container Linux Config snippets"
default = []
}
variable "worker_clc_snippets" {
variable "worker_snippets" {
type = list(string)
description = "Worker Container Linux Config snippets"
default = []

View File

@ -12,7 +12,8 @@ resource "digitalocean_record" "workers-record-a" {
}
resource "digitalocean_record" "workers-record-aaaa" {
count = var.worker_count
# only official DigitalOcean images support IPv6
count = local.is_official_image ? var.worker_count : 0
# DNS zone where record should be created
domain = var.dns_zone
@ -30,11 +31,12 @@ resource "digitalocean_droplet" "workers" {
name = "${var.cluster_name}-worker-${count.index}"
region = var.region
image = var.image
image = var.os_image
size = var.worker_type
# network
ipv6 = true
# only official DigitalOcean images support IPv6
ipv6 = local.is_official_image
private_networking = true
user_data = data.ct_config.worker-ignition.rendered
@ -58,7 +60,7 @@ resource "digitalocean_tag" "workers" {
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
pretty_print = false
snippets = var.worker_clc_snippets
snippets = var.worker_snippets
}
# Worker Container Linux config

View File

@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2020 Typhoon Authors
Copyright (c) 2020 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,23 @@
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
* Minimal, stable base Kubernetes distribution
* Declarative infrastructure and configuration
* Free (freedom and cost) and privacy-respecting
* Practical for labs, datacenters, and clouds
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.18.1 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/) customization
* Ready for Ingress, Prometheus, Grafana, CSI, and other [addons](https://typhoon.psdn.io/addons/overview/)
## Docs
Please see the [official docs](https://typhoon.psdn.io) and the Digital Ocean [tutorial](https://typhoon.psdn.io/fedora-coreos/digitalocean/).

View File

@ -0,0 +1,25 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=1ad53d3b1c1ad75a4ed27f124f772fc5dc025245"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = digitalocean_record.etcds.*.fqdn
asset_dir = var.asset_dir
networking = var.networking
# only effective with Calico networking
network_encapsulation = "vxlan"
network_mtu = "1450"
pod_cidr = var.pod_cidr
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
enable_reporting = var.enable_reporting
enable_aggregation = var.enable_aggregation
# Fedora CoreOS
trusted_certs_dir = "/etc/pki/tls/certs"
}

View File

@ -0,0 +1,100 @@
# Controller Instance DNS records
resource "digitalocean_record" "controllers" {
count = var.controller_count
# DNS zone where record should be created
domain = var.dns_zone
# DNS record (will be prepended to domain)
name = var.cluster_name
type = "A"
ttl = 300
# IPv4 addresses of controllers
value = digitalocean_droplet.controllers.*.ipv4_address[count.index]
}
# Discrete DNS records for each controller's private IPv4 for etcd usage
resource "digitalocean_record" "etcds" {
count = var.controller_count
# DNS zone where record should be created
domain = var.dns_zone
# DNS record (will be prepended to domain)
name = "${var.cluster_name}-etcd${count.index}"
type = "A"
ttl = 300
# private IPv4 address for etcd
value = digitalocean_droplet.controllers.*.ipv4_address_private[count.index]
}
# Controller droplet instances
resource "digitalocean_droplet" "controllers" {
count = var.controller_count
name = "${var.cluster_name}-controller-${count.index}"
region = var.region
image = var.os_image
size = var.controller_type
# network
# TODO: Only official DigitalOcean images support IPv6
ipv6 = false
private_networking = true
user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
ssh_keys = var.ssh_fingerprints
tags = [
digitalocean_tag.controllers.id,
]
lifecycle {
ignore_changes = [user_data]
}
}
# Tag to label controllers
resource "digitalocean_tag" "controllers" {
name = "${var.cluster_name}-controller"
}
# Controller Ignition configs
data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
snippets = var.controller_snippets
}
# Controller Fedora CoreOS configs
data "template_file" "controller-configs" {
count = var.controller_count
template = file("${path.module}/fcc/controller.yaml")
vars = {
# Cannot use cyclic dependencies on controllers or their DNS records
etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
}
}
data "template_file" "etcds" {
count = var.controller_count
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars = {
index = count.index
cluster_name = var.cluster_name
dns_zone = var.dns_zone
}
}

View File

@ -0,0 +1,214 @@
---
variant: fcos
version: 1.0.0
systemd:
units:
- name: etcd-member.service
enabled: true
contents: |
[Unit]
Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd
Wants=network-online.target network.target
After=network-online.target
[Service]
# https://github.com/opencontainers/runc/pull/1807
# Type=notify
# NotifyAccess=exec
Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.7
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
- name: docker.service
enabled: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
contents: |
[Unit]
Description=Kubelet via Hyperkube (System Container)
Requires=afterburn.service
After=afterburn.service
Wants=rpc-statd.service
[Service]
EnvironmentFile=/run/metadata/afterburn
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/podman rm kubelet
ExecStart=/usr/bin/podman run --name kubelet \
--privileged \
--pid host \
--network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \
--volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
quay.io/poseidon/kubelet:v1.18.1 \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=systemd \
--cgroups-per-qos=true \
--enforce-node-allocatable=pods \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--healthz-port=0 \
--hostname-override=$${AFTERBURN_DIGITALOCEAN_IPV4_PRIVATE_0} \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/master \
--node-labels=node.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: kubelet.path
enabled: true
contents: |
[Unit]
Description=Watch for kubeconfig
[Path]
PathExists=/etc/kubernetes/kubeconfig
[Install]
WantedBy=multi-user.target
- name: bootstrap.service
contents: |
[Unit]
Description=Kubernetes control plane
ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/podman rm bootstrap
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \
quay.io/poseidon/kubelet:v1.18.1
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
directories:
- path: /etc/kubernetes
- path: /opt/bootstrap
files:
- path: /opt/bootstrap/layout
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking/* /opt/bootstrap/assets/manifests/
rm -rf assets auth static-manifests tls manifests-networking
- path: /opt/bootstrap/apply
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/systemd/system.conf.d/accounting.conf
contents:
inline: |
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
- path: /etc/etcd/etcd.env
mode: 0644
contents:
inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true

View File

@ -0,0 +1,117 @@
---
variant: fcos
version: 1.0.0
systemd:
units:
- name: docker.service
enabled: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet via Hyperkube (System Container)
Requires=afterburn.service
After=afterburn.service
Wants=rpc-statd.service
[Service]
EnvironmentFile=/run/metadata/afterburn
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/podman rm kubelet
ExecStart=/usr/bin/podman run --name kubelet \
--privileged \
--pid host \
--network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \
--volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
quay.io/poseidon/kubelet:v1.18.1 \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=systemd \
--cgroups-per-qos=true \
--enforce-node-allocatable=pods \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--healthz-port=0 \
--hostname-override=$${AFTERBURN_DIGITALOCEAN_IPV4_PRIVATE_0} \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: kubelet.path
enabled: true
contents: |
[Unit]
Description=Watch for kubeconfig
[Path]
PathExists=/etc/kubernetes/kubeconfig
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enabled: true
contents: |
[Unit]
Description=Delete Kubernetes node on shutdown
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c '/usr/bin/podman run --volume /etc/kubernetes:/etc/kubernetes:ro,z --entrypoint /usr/local/bin/kubectl quay.io/poseidon/kubelet:v1.18.1 --kubeconfig=/etc/kubernetes/kubeconfig delete node $HOSTNAME'
[Install]
WantedBy=multi-user.target
storage:
directories:
- path: /etc/kubernetes
files:
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/systemd/system.conf.d/accounting.conf
contents:
inline: |
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes

View File

@ -0,0 +1,117 @@
resource "digitalocean_firewall" "rules" {
name = var.cluster_name
tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
# allow ssh, internal flannel, internal node-exporter, internal kubelet
inbound_rule {
protocol = "tcp"
port_range = "22"
source_addresses = ["0.0.0.0/0", "::/0"]
}
inbound_rule {
protocol = "udp"
port_range = "4789"
source_tags = [digitalocean_tag.controllers.name, digitalocean_tag.workers.name]
}
# Allow Prometheus to scrape node-exporter
inbound_rule {
protocol = "tcp"
port_range = "9100"
source_tags = [digitalocean_tag.workers.name]
}
# Allow Prometheus to scrape kube-proxy
inbound_rule {
protocol = "tcp"
port_range = "10249"
source_tags = [digitalocean_tag.workers.name]
}
inbound_rule {
protocol = "tcp"
port_range = "10250"
source_tags = [digitalocean_tag.controllers.name, digitalocean_tag.workers.name]
}
# allow all outbound traffic
outbound_rule {
protocol = "tcp"
port_range = "1-65535"
destination_addresses = ["0.0.0.0/0", "::/0"]
}
outbound_rule {
protocol = "udp"
port_range = "1-65535"
destination_addresses = ["0.0.0.0/0", "::/0"]
}
outbound_rule {
protocol = "icmp"
port_range = "1-65535"
destination_addresses = ["0.0.0.0/0", "::/0"]
}
}
resource "digitalocean_firewall" "controllers" {
name = "${var.cluster_name}-controllers"
tags = ["${var.cluster_name}-controller"]
# etcd
inbound_rule {
protocol = "tcp"
port_range = "2379-2380"
source_tags = [digitalocean_tag.controllers.name]
}
# etcd metrics
inbound_rule {
protocol = "tcp"
port_range = "2381"
source_tags = [digitalocean_tag.workers.name]
}
# kube-apiserver
inbound_rule {
protocol = "tcp"
port_range = "6443"
source_addresses = ["0.0.0.0/0", "::/0"]
}
# kube-scheduler metrics, kube-controller-manager metrics
inbound_rule {
protocol = "tcp"
port_range = "10251-10252"
source_tags = [digitalocean_tag.workers.name]
}
}
resource "digitalocean_firewall" "workers" {
name = "${var.cluster_name}-workers"
tags = ["${var.cluster_name}-worker"]
# allow HTTP/HTTPS ingress
inbound_rule {
protocol = "tcp"
port_range = "80"
source_addresses = ["0.0.0.0/0", "::/0"]
}
inbound_rule {
protocol = "tcp"
port_range = "443"
source_addresses = ["0.0.0.0/0", "::/0"]
}
inbound_rule {
protocol = "tcp"
port_range = "10254"
source_addresses = ["0.0.0.0/0"]
}
}

View File

@ -0,0 +1,47 @@
output "kubeconfig-admin" {
value = module.bootstrap.kubeconfig-admin
}
output "controllers_dns" {
value = digitalocean_record.controllers[0].fqdn
}
output "workers_dns" {
# Multiple A and AAAA records with the same FQDN
value = digitalocean_record.workers-record-a[0].fqdn
}
output "controllers_ipv4" {
value = digitalocean_droplet.controllers.*.ipv4_address
}
output "controllers_ipv6" {
value = digitalocean_droplet.controllers.*.ipv6_address
}
output "workers_ipv4" {
value = digitalocean_droplet.workers.*.ipv4_address
}
output "workers_ipv6" {
value = digitalocean_droplet.workers.*.ipv6_address
}
# Outputs for worker pools
output "kubeconfig" {
value = module.bootstrap.kubeconfig-kubelet
}
# Outputs for custom firewalls
output "controller_tag" {
description = "Tag applied to controller droplets"
value = digitalocean_tag.controllers.name
}
output "worker_tag" {
description = "Tag applied to worker droplets"
value = digitalocean_tag.workers.name
}

View File

@ -0,0 +1,87 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist :
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers. Activates kubelet.service
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
digitalocean_firewall.rules
]
connection {
type = "ssh"
host = digitalocean_droplet.controllers.*.ipv4_address[count.index]
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootstrap.kubeconfig-kubelet
destination = "$HOME/kubeconfig"
}
provisioner "file" {
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
"sudo /opt/bootstrap/layout",
]
}
}
# Secure copy kubeconfig to all workers. Activates kubelet.service.
resource "null_resource" "copy-worker-secrets" {
count = var.worker_count
connection {
type = "ssh"
host = digitalocean_droplet.workers.*.ipv4_address[count.index]
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootstrap.kubeconfig-kubelet
destination = "$HOME/kubeconfig"
}
provisioner "remote-exec" {
inline = [
"sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
]
}
}
# Connect to a controller to perform one-time cluster bootstrap.
resource "null_resource" "bootstrap" {
depends_on = [
null_resource.copy-controller-secrets,
null_resource.copy-worker-secrets,
]
connection {
type = "ssh"
host = digitalocean_droplet.controllers[0].ipv4_address
user = "core"
timeout = "15m"
}
provisioner "remote-exec" {
inline = [
"sudo systemctl start bootstrap",
]
}
}

View File

@ -0,0 +1,114 @@
variable "cluster_name" {
type = string
description = "Unique cluster name (prepended to dns_zone)"
}
# Digital Ocean
variable "region" {
type = string
description = "Digital Ocean region (e.g. nyc1, sfo2, fra1, tor1)"
}
variable "dns_zone" {
type = string
description = "Digital Ocean domain (i.e. DNS zone) (e.g. do.example.com)"
}
# instances
variable "controller_count" {
type = number
description = "Number of controllers (i.e. masters)"
default = 1
}
variable "worker_count" {
type = number
description = "Number of workers"
default = 1
}
variable "controller_type" {
type = string
description = "Droplet type for controllers (e.g. s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb)."
default = "s-2vcpu-2gb"
}
variable "worker_type" {
type = string
description = "Droplet type for workers (e.g. s-1vcpu-2gb, s-2vcpu-2gb)"
default = "s-1vcpu-2gb"
}
variable "os_image" {
type = string
description = "Fedora CoreOS image for instances"
}
variable "controller_snippets" {
type = list(string)
description = "Controller Fedora CoreOS Config snippets"
default = []
}
variable "worker_snippets" {
type = list(string)
description = "Worker Fedora CoreOS Config snippets"
default = []
}
# configuration
variable "ssh_fingerprints" {
type = list(string)
description = "SSH public key fingerprints. (e.g. see `ssh-add -l -E md5`)"
}
variable "networking" {
type = string
description = "Choice of networking provider (flannel or calico)"
default = "calico"
}
variable "pod_cidr" {
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
default = "10.3.0.0/16"
}
variable "enable_reporting" {
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
default = false
}
# unofficial, undocumented, unsupported
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local"
}

View File

@ -0,0 +1,12 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.6"
required_providers {
digitalocean = "~> 1.3"
ct = "~> 0.3"
template = "~> 2.1"
null = "~> 2.1"
}
}

View File

@ -0,0 +1,77 @@
# Worker DNS records
resource "digitalocean_record" "workers-record-a" {
count = var.worker_count
# DNS zone where record should be created
domain = var.dns_zone
name = "${var.cluster_name}-workers"
type = "A"
ttl = 300
value = digitalocean_droplet.workers.*.ipv4_address[count.index]
}
/*
# TODO: Only official DigitalOcean images support IPv6
resource "digitalocean_record" "workers-record-aaaa" {
count = var.worker_count
# DNS zone where record should be created
domain = var.dns_zone
name = "${var.cluster_name}-workers"
type = "AAAA"
ttl = 300
value = digitalocean_droplet.workers.*.ipv6_address[count.index]
}
*/
# Worker droplet instances
resource "digitalocean_droplet" "workers" {
count = var.worker_count
name = "${var.cluster_name}-worker-${count.index}"
region = var.region
image = var.os_image
size = var.worker_type
# network
# TODO: Only official DigitalOcean images support IPv6
ipv6 = false
private_networking = true
user_data = data.ct_config.worker-ignition.rendered
ssh_keys = var.ssh_fingerprints
tags = [
digitalocean_tag.workers.id,
]
lifecycle {
create_before_destroy = true
}
}
# Tag to label workers
resource "digitalocean_tag" "workers" {
name = "${var.cluster_name}-worker"
}
# Worker Ignition config
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
strict = true
snippets = var.worker_snippets
}
# Worker Fedora CoreOS config
data "template_file" "worker-config" {
template = file("${path.module}/fcc/worker.yaml")
vars = {
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
}
}

View File

@ -1,29 +0,0 @@
# Container Linux Update Operator
The [Container Linux Update Operator](https://github.com/coreos/container-linux-update-operator) (i.e. CLUO) coordinates reboots of auto-updating Container Linux nodes so that one node reboots at a time and nodes are drained before reboot. CLUO enables the auto-update behavior Container Linux clusters are known for, but does so in a Kubernetes native way.
## Create
Create the `update-operator` deployment and `update-agent` DaemonSet.
```sh
kubectl apply -f addons/cluo -R
```
## Usage
`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indicates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
```
$ kubectl get nodes --show-labels
...
container-linux-update.v1.coreos.com/group=stable
container-linux-update.v1.coreos.com/version=1632.3.0
```
`update-operator` ensures one node reboots at a time and that pods are drained prior to reboot.
!!! note ""
CLUO replaces `locksmithd` reboot coordination. The `update_engine` systemd unit on hosts still performs the Container Linux update check, download, and install to the inactive partition.

View File

@ -2,7 +2,6 @@
Every Typhoon cluster is verified to work well with several post-install addons.
* [CLUO](cluo.md) (Container Linux only)
* Nginx [Ingress Controller](ingress.md)
* [Prometheus](prometheus.md)
* [Grafana](grafana.md)

View File

@ -8,24 +8,88 @@ Typhoon modules accept Terraform input variables for customizing clusters in mer
## Addons
Clusters are kept to a minimal Kubernetes control plane by offering components like Nginx Ingress Controller, Prometheus, Grafana, and Heapster as optional post-install [addons](https://github.com/poseidon/typhoon/tree/master/addons). Customize addons by modifying a copy of our addon manifests.
Clusters are kept to a minimal Kubernetes control plane by offering components like Nginx Ingress Controller, Prometheus, and Grafana as optional post-install [addons](https://github.com/poseidon/typhoon/tree/master/addons). Customize addons by modifying a copy of our addon manifests.
## Hosts
### Container Linux
Typhoon uses the [Ignition](https://github.com/coreos/ignition) system of Fedora CoreOS and Flatcar Linux to immutably declare a system via first-boot disk provisioning. Fedora CoreOS uses a [Fedora CoreOS Config](https://docs.fedoraproject.org/en-US/fedora-coreos/fcct-config/) (FCC) and Flatcar Linux uses a [Container Linux Config](https://github.com/coreos/container-linux-config-transpiler/blob/master/doc/examples.md) (CLC). These define disk partitions, filesystems, systemd units, dropins, config files, mount units, raid arrays, and users.
Controller and worker instances form a minimal and secure Kubernetes cluster on each platform. Typhoon provides the **snippets** feature to accept Fedora CoreOS Configs or Container Linux Configs to validate and additively merge into instance declarations. This allows advanced host customization and experimentation.
!!! note
Snippets cannot be used to modify an already existing instance, the antithesis of immutable provisioning. Ignition fully declares a system on first boot only.
!!! danger
Container Linux Configs provide powerful host customization abilities. You are responsible for the additional configs defined for hosts.
Snippets provide the powerful host customization abilities of Ignition. You are responsible for additional units, configs, files, and conflicts.
Container Linux Configs (CLCs) declare how a Container Linux instance's disk should be provisioned on first boot from disk. CLCs define disk partitions, filesystems, files, systemd units, dropins, networkd configs, mount units, raid arrays, and users. Typhoon creates controller and worker instances with base Container Linux Configs to create a minimal, secure Kubernetes cluster on each platform.
!!! danger
Edits to snippets for controller instances can (correctly) cause Terraform to observe a diff (if not otherwise suppressed) and propose destroying and recreating controller(s). Recognize that this is destructive since controllers run etcd and are stateful. See [blue/green](/topics/maintenance/#upgrades) clusters.
Typhoon AWS, Azure, bare-metal, DigitalOcean, and Google Cloud support CLC *snippets* - valid Container Linux Configs that are validated and additively merged into the Typhoon base config during `terraform plan`. This allows advanced host customizations and experimentation.
### Fedora CoreOS
#### Examples
!!! note
Fedora CoreOS snippets require `terraform-provider-ct` v0.5+
Container Linux [docs](https://coreos.com/os/docs/latest/clc-examples.html) show many simple config examples. Ensure a file `/opt/hello` is created with permissions 0644.
Define a Fedora CoreOS Config (FCC) ([docs](https://docs.fedoraproject.org/en-US/fedora-coreos/fcct-config/), [config](https://github.com/coreos/fcct/blob/master/docs/configuration-v1_0.md), [examples](https://github.com/coreos/fcct/blob/master/docs/examples.md)) in version control near your Terraform workspace directory (e.g. perhaps in a `snippets` subdirectory). You may organize snippets into multiple files, if desired.
For example, ensure an `/opt/hello` file is created with permissions 0644.
```yaml
# custom-files
variant: fcos
version: 1.0.0
storage:
files:
- path: /opt/hello
contents:
inline: |
Hello World
mode: 0644
```
Reference the FCC contents by location (e.g. `file("./custom-units.yaml")`). On [AWS](/fedora-coreos/aws/#cluster) or [Google Cloud](/fedora-coreos/google-cloud/#cluster) extend the `controller_snippets` or `worker_snippets` list variables.
```tf
module "nemo" {
...
controller_count = 1
worker_count = 2
controller_snippets = [
file("./custom-files"),
file("./custom-units"),
]
worker_snippets = [
file("./custom-files"),
file("./custom-units")",
]
...
}
```
On [Bare-Metal](/fedora-coreos/bare-metal/#cluster), different FCCs may be used for each node (since hardware may be heterogeneous). Extend the `snippets` map variable by mapping a controller or worker name key to a list of snippets.
```tf
module "mercury" {
...
snippets = {
"node2" = [file("./units/hello.yaml")]
"node3" = [
file("./units/world.yaml"),
file("./units/hello.yaml"),
]
}
...
}
```
### Container Linux
Define a Container Linux Config (CLC) ([config](https://github.com/coreos/container-linux-config-transpiler/blob/master/doc/configuration.md), [examples](https://github.com/coreos/container-linux-config-transpiler/blob/master/doc/examples.md)) in version control near your Terraform workspace directory (e.g. perhaps in a `snippets` subdirectory). You may organize snippets into multiple files, if desired.
For example, ensure an `/opt/hello` file is created with permissions 0644.
```yaml
# custom-files
storage:
files:
@ -37,9 +101,9 @@ storage:
mode: 0644
```
Ensure a systemd unit `hello.service` is created and a dropin `50-etcd-cluster.conf` is added for `etcd-member.service`.
Or ensure a systemd unit `hello.service` is created and a dropin `50-etcd-cluster.conf` is added for `etcd-member.service`.
```
```yaml
# custom-units
systemd:
units:
@ -61,27 +125,19 @@ systemd:
Environment="ETCD_LOG_PACKAGE_LEVELS=etcdserver=WARNING,security=DEBUG"
```
#### Specification
Reference the CLC contents by location (e.g. `file("./custom-units.yaml")`). On [AWS](/cl/aws/#cluster), [Azure](/cl/azure/#cluster), [DigitalOcean](/cl/digital-ocean/#cluster), or [Google Cloud](/cl/google-cloud/#cluster) extend the `controller_snippets` or `worker_snippets` list variables.
View the Container Linux Config [format](https://coreos.com/os/docs/1576.4.0/configuration.html) to read about each field.
#### Usage
Write Container Linux Configs *snippets* as files in the repository where you keep Terraform configs for clusters (perhaps in a `clc` or `snippets` subdirectory). You may organize snippets in multiple files as desired, provided they are each valid.
[AWS](/cl/aws/#cluster), [Azure](/cl/azure/#cluster), [DigitalOcean](/cl/digital-ocean/#cluster), and [Google Cloud](/cl/google-cloud/#cluster) clusters allow populating a list of `controller_clc_snippets` or `worker_clc_snippets`.
```
```tf
module "nemo" {
...
controller_count = 1
worker_count = 2
controller_clc_snippets = [
controller_snippets = [
file("./custom-files"),
file("./custom-units"),
]
worker_clc_snippets = [
worker_snippets = [
file("./custom-files"),
file("./custom-units")",
]
@ -89,17 +145,12 @@ module "nemo" {
}
```
[Bare-Metal](/cl/bare-metal/#cluster) clusters allow different Container Linux snippets to be used for each node (since hardware may be heterogeneous). Populate the optional `clc_snippets` map variable with any controller or worker name keys and lists of snippets.
On [Bare-Metal](/cl/bare-metal/#cluster), different CLCs may be used for each node (since hardware may be heterogeneous). Extend the `snippets` map variable by mapping a controller or worker name key to a list of snippets.
```
```tf
module "mercury" {
...
controller_names = ["node1"]
worker_names = [
"node2",
"node3",
]
clc_snippets = {
snippets = {
"node2" = [file("./units/hello.yaml")]
"node3" = [
file("./units/world.yaml"),
@ -110,32 +161,6 @@ module "mercury" {
}
```
Plan the resources to be created.
```
$ terraform plan
Plan: 54 to add, 0 to change, 0 to destroy.
```
Most syntax errors in CLCs can be caught during planning. For example, mangle the indentation in one of the CLC files:
```
$ terraform plan
...
error parsing Container Linux Config: error: yaml: line 3: did not find expected '-' indicator
```
Undo the mangle. Apply the changes to create the cluster per the tutorial.
```
$ terraform apply
```
Container Linux Configs (and the CoreOS Ignition system) create immutable infrastructure. Disk provisioning is performed only on first boot from disk. That means if you change a snippet used by an instance, Terraform will (correctly) try to destroy and recreate that instance. Be careful!
!!! danger
Destroying and recreating controller instances is destructive! etcd runs on controller instances and stores data there. Do not modify controller snippets. See [blue/green](/topics/maintenance/#upgrades) clusters.
## Architecture
Typhoon chooses variables to expose with purpose. If you must customize clusters in ways that aren't supported by input variables, fork Typhoon and maintain a repository with customizations. Reference the repository by changing the username.

View File

@ -67,7 +67,7 @@ The AWS internal `workers` module supports a number of [variables](https://githu
| disk_type | Type of the EBS volume | "gp2" | standard, gp2, io1 |
| disk_iops | IOPS of the EBS volume | 0 (i.e. auto) | 400 |
| spot_price | Spot price in USD for worker instances or 0 to use on-demand instances | 0 | 0.10 |
| clc_snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
| node_labels | List of initial node labels | [] | ["worker-pool=foo"] |
@ -79,7 +79,7 @@ Create a cluster following the Azure [tutorial](../cl/azure.md#cluster). Define
```tf
module "ramius-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.18.1"
# Azure
region = module.ramius.region
@ -133,7 +133,7 @@ The Azure internal `workers` module supports a number of [variables](https://git
| vm_type | Machine type for instances | "Standard_DS1_v2" | See below |
| os_image | Channel for a Container Linux derivative | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
| priority | Set priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time | "Regular" | "Low" |
| clc_snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| node_labels | List of initial node labels | [] | ["worker-pool=foo"] |
@ -145,7 +145,7 @@ Create a cluster following the Google Cloud [tutorial](../cl/google-cloud.md#clu
```tf
module "yavin-worker-pool" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.18.1"
# Google Cloud
region = "europe-west2"
@ -176,11 +176,11 @@ Verify a managed instance group of workers joins the cluster within a few minute
```
$ kubectl get nodes
NAME STATUS AGE VERSION
yavin-controller-0.c.example-com.internal Ready 6m v1.17.1
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.17.1
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.17.1
yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.17.1
yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.17.1
yavin-controller-0.c.example-com.internal Ready 6m v1.18.1
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.18.1
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.18.1
yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.18.1
yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.18.1
```
### Variables
@ -209,7 +209,7 @@ Check the list of regions [docs](https://cloud.google.com/compute/docs/regions-z
| os_image | Container Linux image for compute instances | "coreos-stable" | "coreos-alpha", "coreos-beta" |
| disk_size | Size of the disk in GB | 40 | 100 |
| preemptible | If true, Compute Engine will terminate instances randomly within 24 hours | false | true |
| clc_snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| snippets | Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
| node_labels | List of initial node labels | [] | ["worker-pool=foo"] |

View File

@ -1,5 +1,15 @@
# Announce <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo-small.png">
## Jan 23, 2020
Typhoon for Fedora CoreOS promoted to alpha!
Last summer, Typhoon released the first preview of Kubernetes on Fedora CoreOS for bare-metal and AWS, developing many ideas and patterns from Typhoon for Container Linux and Fedora Atomic. Since then, Typhoon for Fedora CoreOS has evolved and gained features alongside Typhoon, while Fedora CoreOS itself has evolved and improved too.
Fedora recently [announced](https://fedoramagazine.org/fedora-coreos-out-of-preview/) that Fedora CoreOS is available for general use. To align with that change and to better indicate the maturing status, Typhoon for Fedora CoreOS has been promoted to alpha. Many thanks to folks who have worked to make this possbile!
About: For newcomers, Typhoon is a minimal and free (cost and freedom) Kubernetes distribution providing upstream Kubernetes, declarative configuration via Terraform, and support for AWS, Azure, Google Cloud, DigitalOcean, and bare-metal. It is run by former CoreOS engineer [@dghubble](https://twitter.com/dghubble) to power his clusters, with freedom [motivations](https://typhoon.psdn.io/#motivation).
## Jul 18, 2019
Introducing a preview of Typhoon Kubernetes clusters with Fedora CoreOS!
@ -8,8 +18,6 @@ Fedora recently [announced](https://lists.fedoraproject.org/archives/list/coreos
While Typhoon uses Container Linux (or Flatcar Linux) for stable modules, the project hasn't been a stranger to Fedora ideas, once developing a [Fedora Atomic](https://typhoon.psdn.io/announce/#april-26-2018) variant in 2018. That makes the Fedora CoreOS fushion both exciting and familiar. Typhoon with Fedora CoreOS uses Ignition v3 for provisioning, uses rpm-ostree for layering and updates, tries swapping system containers for podman, and brings SELinux enforcement ([table](https://typhoon.psdn.io/architecture/operating-systems/)). This is an early preview (don't go to prod), but do try it out and help identify and solve issues (getting started links above).
About: For newcomers, Typhoon is a minimal and free (cost and freedom) Kubernetes distribution providing upstream Kubernetes, declarative configuration via Terraform, and support for AWS, Azure, Google Cloud, DigitalOcean, and bare-metal. It is run by former CoreOS engineer [@dghubble](https://twitter.com/dghubble) to power his clusters with freedom [motivations](https://typhoon.psdn.io/#motivation).
## March 27, 2019
Last April, Typhoon [introduced](#april-26-2018) alpha support for creating Kubernetes clusters with Fedora Atomic on AWS, Google Cloud, DigitalOcean, and bare-metal. Fedora Atomic shared many of Container Linux's aims for a container-optimized operating system, introduced novel ideas, and provided technical diversification for an uncertain future. However, Project Atomic efforts were merged into Fedora CoreOS and future Fedora Atomic releases are [not expected](http://www.projectatomic.io/blog/2018/06/welcome-to-fedora-coreos/). *Typhoon modules for Fedora Atomic will not be updated much beyond Kubernetes v1.13*. They may later be removed.
@ -46,7 +54,7 @@ Typhoon for Fedora Atomic reflects many of the same principles that created Typh
Meanwhile, Fedora Atomic adds some promising new low-level technologies:
* [ostree](https://github.com/ostreedev/ostree) & [rpm-ostree](https://github.com/projectatomic/rpm-ostree) - a hybrid, layered, image and package system that lets you perform atomic updates and rollbacks, layer on packages, "rebase" your system, or manage a remote tree repo. See Dusty Mabe's great [intro](https://dustymabe.com/2017/09/01/atomic-host-101-lab-part-3-rebase-upgrade-rollback/).
* [ostree](https://github.com/ostreedev/ostree) & [rpm-ostree](https://github.com/projectatomic/rpm-ostree) - a hybrid, layered, image and package system that lets you perform atomic updates and rollbacks, layer on packages, "rebase" your system, or manage a remote tree repo. See Dusty Mabe's great [intro](https://dustymabe.com/2017/09/01/atomic-host-101-lab-part-3-rebase-upgrade-rollback/).
* [system containers](http://www.projectatomic.io/blog/2016/09/intro-to-system-containers/) - OCI container images that embed systemd and runc metadata for starting low-level host services before container runtimes are ready. Typhoon uses system containers under runc for `etcd`, `kubelet`, and `bootkube` on Fedora Atomic (instead of rkt-fly).

View File

@ -79,6 +79,23 @@ resource "aws_security_group_rule" "some-app" {
}
```
## Routes
Add a custom [route](https://www.terraform.io/docs/providers/aws/r/route.html) to the VPC route table.
```tf
data "aws_route_table" "default" {
vpc_id = module.temptest.vpc_id
subnet_id = module.tempest.subnet_ids[0]
}
resource "aws_route" "peering" {
route_table_id = data.aws_route_table.default.id
destination_cidr_block = "192.168.4.0/24"
...
}
```
## IPv6
AWS Network Load Balancers do not support `dualstack`.

View File

@ -1,6 +1,6 @@
# Operating Systems
Typhoon supports [Container Linux](https://coreos.com/why/), [Flatcar Linux](https://www.flatcar-linux.org/) and [Fedora CoreOS](https://getfedora.org/coreos/) (preview). These operating systems were chosen because they offer:
Typhoon supports [Fedora CoreOS](https://getfedora.org/coreos/), [Flatcar Linux](https://www.flatcar-linux.org/) and Container Linux (EOL in May 2020). These operating systems were chosen because they offer:
* Minimalism and focus on clustered operation
* Automated and atomic operating system upgrades
@ -9,16 +9,18 @@ Typhoon supports [Container Linux](https://coreos.com/why/), [Flatcar Linux](htt
Together, they diversify Typhoon to support a range of container technologies.
* Container Linux: Gentoo core, rkt-fly, docker
* Fedora CoreOS: rpm-ostree, podman, moby
* Flatcar Linux: Gentoo core, rkt-fly, docker
## Host Properties
| Property | Container Linux / Flatcar Linux | Fedora CoreOS |
| Property | Flatcar Linux | Fedora CoreOS |
|-------------------|---------------------------------|---------------|
| Kernel | ~4.19.x | ~5.5.x |
| systemd | 241 | 243 |
| Ignition system | Ignition v2.x spec | Ignition v3.x spec |
| Container Engine | docker | docker |
| storage driver | overlay2 | overlay2 |
| Container Engine | docker 18.06.3-ce | docker 18.09.8 |
| storage driver | overlay2 (extfs) | overlay2 (xfs) |
| logging driver | json-file | journald |
| cgroup driver | cgroupfs (except Flatcar edge) | systemd |
| Networking | systemd-networkd | NetworkManager |
@ -26,13 +28,13 @@ Together, they diversify Typhoon to support a range of container technologies.
## Kubernetes Properties
| Property | Container Linux | Fedora CoreOS |
| Property | Flatcar Linux | Fedora CoreOS |
|-------------------|-----------------|---------------|
| single-master | all platforms | all platforms |
| multi-master | all platforms | all platforms |
| control plane | static pods | static pods |
| kubelet image | upstream hyperkube | upstream hyperkube |
| control plane images | upstream hyperkube | upstream hyperkube |
| kubelet image | kubelet [image](https://github.com/poseidon/kubelet) with upstream binary | kubelet [image](https://github.com/poseidon/kubelet) with upstream binary |
| control plane images | upstream images | upstream images |
| on-host etcd | rkt-fly | podman |
| on-host kubelet | rkt-fly | podman |
| CNI plugins | calico or flannel | calico or flannel |

View File

@ -1,6 +1,6 @@
# AWS
In this tutorial, we'll create a Kubernetes v1.17.1 cluster on AWS with Container Linux.
In this tutorial, we'll create a Kubernetes v1.18.1 cluster on AWS with CoreOS Container Linux or Flatcar Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.
@ -18,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -49,13 +49,13 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
```tf
provider "aws" {
version = "2.41.0"
version = "2.53.0"
region = "eu-central-1"
shared_credentials_file = "/home/user/.config/aws/credentials"
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -70,7 +70,7 @@ Define a Kubernetes cluster using the module `aws/container-linux/kubernetes`.
```tf
module "tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.18.1"
# AWS
cluster_name = "tempest"
@ -143,36 +143,33 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/tempest-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-3-155 Ready <none> 10m v1.17.1
ip-10-0-26-65 Ready <none> 10m v1.17.1
ip-10-0-41-21 Ready <none> 10m v1.17.1
ip-10-0-3-155 Ready <none> 10m v1.18.1
ip-10-0-26-65 Ready <none> 10m v1.18.1
ip-10-0-41-21 Ready <none> 10m v1.18.1
```
List the pods.
```
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-1m5bf 2/2 Running 0 34m
kube-system calico-node-7jmr1 2/2 Running 0 34m
kube-system calico-node-bknc8 2/2 Running 0 34m
kube-system coredns-1187388186-wx1lg 1/1 Running 0 34m
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-1m5bf 2/2 Running 0 34m
kube-system calico-node-7jmr1 2/2 Running 0 34m
kube-system calico-node-bknc8 2/2 Running 0 34m
kube-system coredns-1187388186-wx1lg 1/1 Running 0 34m
kube-system coredns-1187388186-qjnvp 1/1 Running 0 34m
kube-system kube-apiserver-ip-10-0-3-155 1/1 Running 0 34m
kube-system kube-controller-manager-ip-10-0-3-155 1/1 Running 0 34m
kube-system kube-proxy-14wxv 1/1 Running 0 34m
kube-system kube-proxy-9vxh2 1/1 Running 0 34m
kube-system kube-proxy-sbbsh 1/1 Running 0 34m
kube-system kube-scheduler-ip-10-0-3-155 1/1 Running 1 34m
kube-system kube-apiserver-ip-10-0-3-155 1/1 Running 0 34m
kube-system kube-controller-manager-ip-10-0-3-155 1/1 Running 0 34m
kube-system kube-proxy-14wxv 1/1 Running 0 34m
kube-system kube-proxy-9vxh2 1/1 Running 0 34m
kube-system kube-proxy-sbbsh 1/1 Running 0 34m
kube-system kube-scheduler-ip-10-0-3-155 1/1 Running 1 34m
```
## Going Further
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
!!! note
On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/aws/container-linux/kubernetes/variables.tf) source.
@ -207,7 +204,6 @@ Reference the DNS zone id with `aws_route53_zone.zone-for-clusters.zone_id`.
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/tempest" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | EC2 instance type for controllers | "t3.small" | See below |
@ -218,8 +214,8 @@ Reference the DNS zone id with `aws_route53_zone.zone-for-clusters.zone_id`.
| disk_iops | IOPS of the EBS volume | 0 (i.e. auto) | 400 |
| worker_target_groups | Target group ARNs to which worker instances should be added | [] | [aws_lb_target_group.app.id] |
| worker_price | Spot price in USD for worker instances or 0 to use on-demand instances | 0/null | 0.10 |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| controller_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| network_mtu | CNI interface MTU (calico only) | 1480 | 8981 |
| host_cidr | CIDR IPv4 range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |

View File

@ -3,7 +3,7 @@
!!! danger
Typhoon for Azure is alpha. For production, use AWS, Google Cloud, or bare-metal. As Azure matures, check [errata](https://github.com/poseidon/typhoon/wiki/Errata) for known shortcomings.
In this tutorial, we'll create a Kubernetes v1.17.1 cluster on Azure with Container Linux.
In this tutorial, we'll create a Kubernetes v1.18.1 cluster on Azure with CoreOS Container Linux or Flatcar Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a resource group, virtual network, subnets, security groups, controller availability set, worker scale set, load balancer, and TLS assets.
@ -21,15 +21,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -50,11 +50,11 @@ Configure the Azure provider in a `providers.tf` file.
```tf
provider "azurerm" {
version = "1.38.0"
version = "2.1.0"
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -66,7 +66,7 @@ Define a Kubernetes cluster using the module `azure/container-linux/kubernetes`.
```tf
module "ramius" {
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.18.1"
# Azure
cluster_name = "ramius"
@ -85,6 +85,15 @@ module "ramius" {
Reference the [variables docs](#variables) or the [variables.tf](https://github.com/poseidon/typhoon/blob/master/azure/container-linux/kubernetes/variables.tf) source.
### Flatcar Linux Only
Flatcar Linux publishes images to the Azure Marketplace and requires accepting their legal terms.
```
az vm image terms show --publish kinvolk --offer flatcar-container-linux --plan stable
az vm image terms accept --publish kinvolk --offer flatcar-container-linux --plan stable
```
## ssh-agent
Initial bootstrapping requires `bootstrap.service` be started on one controller node. Terraform uses `ssh-agent` to automate this step. Add your SSH private key to `ssh-agent`.
@ -140,9 +149,9 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/ramius-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ramius-controller-0 Ready <none> 24m v1.17.1
ramius-worker-000001 Ready <none> 25m v1.17.1
ramius-worker-000002 Ready <none> 24m v1.17.1
ramius-controller-0 Ready <none> 24m v1.18.1
ramius-worker-000001 Ready <none> 25m v1.18.1
ramius-worker-000002 Ready <none> 24m v1.18.1
```
List the pods.
@ -152,9 +161,9 @@ $ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7c6fbb4f4b-b6qzx 1/1 Running 0 26m
kube-system coredns-7c6fbb4f4b-j2k3d 1/1 Running 0 26m
kube-system calico-node-1m5bf 2/2 Running 0 26m
kube-system calico-node-7jmr1 2/2 Running 0 26m
kube-system calico-node-bknc8 2/2 Running 0 26m
kube-system calico-node-1m5bf 2/2 Running 0 26m
kube-system calico-node-7jmr1 2/2 Running 0 26m
kube-system calico-node-bknc8 2/2 Running 0 26m
kube-system kube-apiserver-ramius-controller-0 1/1 Running 0 26m
kube-system kube-controller-manager-ramius-controller-0 1/1 Running 0 26m
kube-system kube-proxy-j4vpq 1/1 Running 0 26m
@ -167,9 +176,6 @@ kube-system kube-scheduler-ramius-controller-0 1/1 Running 0
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
!!! note
On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/azure/container-linux/kubernetes/variables.tf) source.
@ -218,16 +224,15 @@ Reference the DNS zone with `azurerm_dns_zone.clusters.name` and its resource gr
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/ramius" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Machine type for controllers | "Standard_B2s" | See below |
| worker_type | Machine type for workers | "Standard_DS1_v2" | See below |
| os_image | Channel for a Container Linux derivative | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
| os_image | Channel for a Container Linux derivative | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta |
| disk_size | Size of the disk in GB | 40 | 100 |
| worker_priority | Set priority to Low to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time | Regular | Low |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| worker_priority | Set priority to Spot to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time | Regular | Spot |
| controller_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| worker_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/#usage) |
| networking | Choice of networking provider | "calico" | "flannel" or "calico" |
| host_cidr | CIDR IPv4 range to assign to instances | "10.0.0.0/16" | "10.0.0.0/20" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
@ -242,6 +247,6 @@ Check the list of valid [machine types](https://azure.microsoft.com/en-us/pricin
!!! warning
Do not choose a `controller_type` smaller than `Standard_B2s`. Smaller instances are not sufficient for running a controller.
#### Low Priority
#### Spot Priority
Add `worker_priority=Low` to use [Low Priority](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-use-low-priority) workers that run on Azure's surplus capacity at lower cost, but with the tradeoff that they can be deallocated at random. Low priority VMs are Azure's analog to AWS spot instances or GCP premptible instances.
Add `worker_priority=Spot` to use [Spot Priority](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/spot-vms) workers that run on Azure's surplus capacity at lower cost, but with the tradeoff that they can be deallocated at random. Spot priority VMs are Azure's analog to AWS spot instances or GCP premptible instances.

View File

@ -1,6 +1,6 @@
# Bare-Metal
In this tutorial, we'll network boot and provision a Kubernetes v1.17.1 cluster on bare-metal with Container Linux.
In this tutorial, we'll network boot and provision a Kubernetes v1.18.1 cluster on bare-metal with CoreOS Container Linux or Flatcar Linux.
First, we'll deploy a [Matchbox](https://github.com/poseidon/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.
@ -27,7 +27,7 @@ Configure each machine to boot from the disk through IPMI or the BIOS menu.
```
ipmitool -H node1 -U USER -P PASS chassis bootdev disk options=persistent
```
During provisioning, you'll explicitly set the boot device to `pxe` for the next boot only. Machines will install (overwrite) the operating system to disk on PXE boot and reboot into the disk install.
!!! tip ""
@ -111,7 +111,7 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -125,9 +125,9 @@ mv terraform-provider-matchbox-v0.3.0-linux-amd64/terraform-provider-matchbox ~/
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -150,7 +150,7 @@ provider "matchbox" {
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -160,8 +160,8 @@ Define a Kubernetes cluster using the module `bare-metal/container-linux/kuberne
```tf
module "mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.18.1"
# bare-metal
cluster_name = "mercury"
matchbox_http_endpoint = "http://matchbox.example.com"
@ -299,9 +299,9 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/mercury-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1.example.com Ready <none> 10m v1.17.1
node2.example.com Ready <none> 10m v1.17.1
node3.example.com Ready <none> 10m v1.17.1
node1.example.com Ready <none> 10m v1.18.1
node2.example.com Ready <none> 10m v1.18.1
node3.example.com Ready <none> 10m v1.18.1
```
List the pods.
@ -326,9 +326,6 @@ kube-system kube-scheduler-node1.example.com 1/1 Running 0
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
!!! note
On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-metal/container-linux/kubernetes/variables.tf) source.
@ -350,15 +347,16 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-me
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/mercury" |
| download_protocol | Protocol iPXE uses to download the kernel and initrd. iPXE must be compiled with [crypto](https://ipxe.org/crypto) support for https. Unused if cached_install is true | "https" | "http" |
| cached_install | PXE boot and install from the Matchbox `/assets` cache. Admin MUST have downloaded Container Linux or Flatcar images into the cache | false | true |
| install_disk | Disk device where Container Linux should be installed | "/dev/sda" | "/dev/sdb" |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
| clc_snippets | Map from machine names to lists of Container Linux Config snippets | {} | [example](/advanced/customization/#usage) |
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
| snippets | Map from machine names to lists of Container Linux Config snippets | {} | [examples](/advanced/customization/) |
| network_ip_autodetection_method | Method to detect host IPv4 address (calico-only) | "first-found" | "can-reach=10.0.0.1" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| kernel_args | Additional kernel args to provide at PXE boot | [] | ["kvm-intel.nested=1"] |
| worker_node_labels | Map from worker name to list of initial node labels | {} | {"node2" = ["role=special"]} |
| worker_node_taints | Map from worker name to list of initial node taints | {} | {"node2" = ["role=special:NoSchedule"]} |

View File

@ -1,6 +1,6 @@
# Digital Ocean
In this tutorial, we'll create a Kubernetes v1.17.1 cluster on DigitalOcean with Container Linux.
In this tutorial, we'll create a Kubernetes v1.18.1 cluster on DigitalOcean with CoreOS Container Linux or Flatcar Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets.
@ -18,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -50,12 +50,12 @@ Configure the DigitalOcean provider to use your token in a `providers.tf` file.
```tf
provider "digitalocean" {
version = "1.11.0"
version = "1.15.1"
token = "${chomp(file("~/.config/digital-ocean/token"))}"
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -65,16 +65,17 @@ Define a Kubernetes cluster using the module `digital-ocean/container-linux/kube
```tf
module "nemo" {
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.18.1"
# Digital Ocean
cluster_name = "nemo"
region = "nyc3"
dns_zone = "digital-ocean.example.com"
os_image = "coreos-stable"
# configuration
ssh_fingerprints = ["d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7"]
# optional
worker_count = 2
}
@ -82,6 +83,28 @@ module "nemo" {
Reference the [variables docs](#variables) or the [variables.tf](https://github.com/poseidon/typhoon/blob/master/digital-ocean/container-linux/kubernetes/variables.tf) source.
### Flatcar Linux Only
!!! warning
Typhoon for Flatcar Linux on DigitalOcean is alpha. Also IPv6 is unsupported with DigitalOcean custom images.
Flatcar Linux publishes DigitalOcean images, but does not upload them. DigitalOcean allows [custom boot images](https://blog.digitalocean.com/custom-images/) by file or URL.
[Download](https://www.flatcar-linux.org/releases/) the Flatcar Linux DigitalOcean bin image (or copy the URL) and [upload](https://cloud.digitalocean.com/images/custom_images) it as a custom image. Rename the image with the channel and version to refer to these images over time.
```tf
module "nemo" {
...
os_image = data.digitalocean_image.flatcar-stable.id
}
data "digitalocean_image" "flatcar-stable" {
name = "flatcar-stable-2303.4.0.bin.bz2"
}
```
Set the [os_image](#variables) to the custom image id.
## ssh-agent
Initial bootstrapping requires `bootstrap.service` be started on one controller node. Terraform uses `ssh-agent` to automate this step. Add your SSH private key to `ssh-agent`.
@ -138,9 +161,9 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/nemo-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
10.132.110.130 Ready <none> 10m v1.17.1
10.132.115.81 Ready <none> 10m v1.17.1
10.132.124.107 Ready <none> 10m v1.17.1
10.132.110.130 Ready <none> 10m v1.18.1
10.132.115.81 Ready <none> 10m v1.18.1
10.132.124.107 Ready <none> 10m v1.18.1
```
List the pods.
@ -149,9 +172,9 @@ List the pods.
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-1187388186-ld1j7 1/1 Running 0 11m
kube-system coredns-1187388186-rdhf7 1/1 Running 0 11m
kube-system calico-node-1m5bf 2/2 Running 0 11m
kube-system calico-node-7jmr1 2/2 Running 0 11m
kube-system calico-node-bknc8 2/2 Running 0 11m
kube-system calico-node-1m5bf 2/2 Running 0 11m
kube-system calico-node-7jmr1 2/2 Running 0 11m
kube-system calico-node-bknc8 2/2 Running 0 11m
kube-system kube-apiserver-ip-10.132.115.81 1/1 Running 0 11m
kube-system kube-controller-manager-ip-10.132.115.81 1/1 Running 0 11m
kube-system kube-proxy-6kxjf 1/1 Running 0 11m
@ -164,9 +187,6 @@ kube-system kube-scheduler-ip-10.132.115.81 1/1 Running 0
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
!!! note
On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/digital-ocean/container-linux/kubernetes/variables.tf) source.
@ -184,7 +204,7 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/digital
Clusters create DNS A records `${cluster_name}.${dns_zone}` to resolve to controller droplets (round robin). This FQDN is used by workers and `kubectl` to access the apiserver(s). In this example, the cluster's apiserver would be accessible at `nemo.do.example.com`.
You'll need a registered domain name or delegated subdomain in Digital Ocean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.
You'll need a registered domain name or delegated subdomain in DigitalOcean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.
```tf
# Declare a DigitalOcean record to also create a zone file
@ -219,14 +239,13 @@ Digital Ocean requires the SSH public key be uploaded to your account, so you ma
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/nemo" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Droplet type for controllers | "s-2vcpu-2gb" | s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... |
| worker_type | Droplet type for workers | "s-1vcpu-2gb" | s-1vcpu-2gb, s-2vcpu-2gb, ... |
| image | Container Linux image for instances | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| os_image | Container Linux image for instances | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha, "custom-image-id" |
| controller_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| networking | Choice of networking provider | "calico" | "flannel" or "calico" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |

View File

@ -1,6 +1,6 @@
# Google Cloud
In this tutorial, we'll create a Kubernetes v1.17.1 cluster on Google Compute Engine with Container Linux.
In this tutorial, we'll create a Kubernetes v1.18.1 cluster on Google Compute Engine with CoreOS Container Linux or Flatcar Linux.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.
@ -18,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -49,14 +49,14 @@ Configure the Google Cloud provider to use your service account key, project-id,
```tf
provider "google" {
version = "3.4.0"
version = "3.12.0"
project = "project-id"
region = "us-central1"
credentials = file("~/.config/google-cloud/terraform.json")
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -71,7 +71,7 @@ Define a Kubernetes cluster using the module `google-cloud/container-linux/kuber
```tf
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.18.1"
# Google Cloud
cluster_name = "yavin"
@ -81,7 +81,7 @@ module "yavin" {
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# optional
worker_count = 2
}
@ -89,6 +89,28 @@ module "yavin" {
Reference the [variables docs](#variables) or the [variables.tf](https://github.com/poseidon/typhoon/blob/master/google-cloud/container-linux/kubernetes/variables.tf) source.
### Flatcar Linux Only
!!! warning
Typhoon for Flatcar Linux on Google Cloud is alpha.
Flatcar Linux publishes Google Cloud images, but does not upload them. Google Cloud allows [custom boot images](https://cloud.google.com/compute/docs/images/import-existing-image) to be uploaded to a bucket and imported into a project.
[Download](https://www.flatcar-linux.org/releases/) the Flatcar Linux GCE gzipped tarball and upload it to a Google Cloud storage bucket.
```
gsutil list
gsutil cp flatcar_production_gce.tar.gz gs://BUCKET
```
Create a Compute Engine image from the file.
```
gcloud compute images create flatcar-linux-2303-4-0 --source-uri gs://BUCKET_NAME/flatcar_production_gce.tar.gz
```
Set the [os_image](#variables) to the image name (e.g. `flatcar-linux-2303-4-0`)
## ssh-agent
Initial bootstrapping requires `bootstrap.service` be started on one controller node. Terraform uses `ssh-agent` to automate this step. Add your SSH private key to `ssh-agent`.
@ -145,9 +167,9 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.17.1
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.17.1
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.17.1
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.18.1
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.18.1
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.18.1
```
List the pods.
@ -172,9 +194,6 @@ kube-system kube-scheduler-controller-0 1/1 Running 0
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
!!! note
On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/google-cloud/container-linux/kubernetes/variables.tf) source.
@ -212,16 +231,15 @@ resource "google_dns_managed_zone" "zone-for-clusters" {
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/yavin" |
| controller_count | Number of controllers (i.e. masters) | 1 | 3 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Machine type for controllers | "n1-standard-1" | See below |
| worker_type | Machine type for workers | "n1-standard-1" | See below |
| os_image | Container Linux image for compute instances | "coreos-stable" | "coreos-stable-1632-3-0-v20180215" |
| os_image | Container Linux image for compute instances | "coreos-stable" | "flatcar-linux-2303-4-0" |
| disk_size | Size of the disk in GB | 40 | 100 |
| worker_preemptible | If enabled, Compute Engine will terminate workers randomly within 24 hours | false | true |
| controller_clc_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_clc_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| controller_snippets | Controller Container Linux Config snippets | [] | [example](/advanced/customization/) |
| worker_snippets | Worker Container Linux Config snippets | [] | [example](/advanced/customization/) |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |

View File

@ -1,9 +1,6 @@
# AWS
!!! danger
Typhoon for Fedora CoreOS is an early preview! Fedora CoreOS itself is a preview! Expect bugs and design shifts. Please help both projects solve problems. Report Fedora CoreOS bugs to [Fedora](https://github.com/coreos/fedora-coreos-tracker/issues). Report Typhoon issues to Typhoon.
In this tutorial, we'll create a Kubernetes v1.17.1 cluster on AWS with Fedora CoreOS.
In this tutorial, we'll create a Kubernetes v1.18.1 cluster on AWS with Fedora CoreOS.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.
@ -21,15 +18,15 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -52,13 +49,13 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
```tf
provider "aws" {
version = "2.41.0"
version = "2.53.0"
region = "eu-central-1"
shared_credentials_file = "/home/user/.config/aws/credentials"
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -73,7 +70,7 @@ Define a Kubernetes cluster using the module `aws/fedora-coreos/kubernetes`.
```tf
module "tempest" {
source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.18.1"
# AWS
cluster_name = "tempest"
@ -146,9 +143,9 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/tempest-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-3-155 Ready <none> 10m v1.17.1
ip-10-0-26-65 Ready <none> 10m v1.17.1
ip-10-0-41-21 Ready <none> 10m v1.17.1
ip-10-0-3-155 Ready <none> 10m v1.18.1
ip-10-0-26-65 Ready <none> 10m v1.18.1
ip-10-0-41-21 Ready <none> 10m v1.18.1
```
List the pods.
@ -207,7 +204,6 @@ Reference the DNS zone id with `aws_route53_zone.zone-for-clusters.zone_id`.
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/tempest" |
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | EC2 instance type for controllers | "t3.small" | See below |
@ -218,8 +214,8 @@ Reference the DNS zone id with `aws_route53_zone.zone-for-clusters.zone_id`.
| disk_iops | IOPS of the EBS volume | 0 (i.e. auto) | 400 |
| worker_target_groups | Target group ARNs to which worker instances should be added | [] | [aws_lb_target_group.app.id] |
| worker_price | Spot price in USD for worker instances or 0 to use on-demand instances | 0 | 0.10 |
| controller_snippets | Controller Fedora CoreOS Config snippets | [] | UNSUPPORTED |
| worker_clc_snippets | Worker Fedora CoreOS Config snippets | [] | UNSUPPORTED |
| controller_snippets | Controller Fedora CoreOS Config snippets | [] | [examples](/advanced/customization/) |
| worker_snippets | Worker Fedora CoreOS Config snippets | [] | [examples](/advanced/customization/) |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| network_mtu | CNI interface MTU (calico only) | 1480 | 8981 |
| host_cidr | CIDR IPv4 range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |

View File

@ -1,9 +1,6 @@
# Bare-Metal
!!! danger
Typhoon for Fedora CoreOS is an early preview! Fedora CoreOS itself is a preview! Expect bugs and design shifts. Please help both projects solve problems. Report Fedora CoreOS bugs to [Fedora](https://github.com/coreos/fedora-coreos-tracker/issues). Report Typhoon issues to Typhoon.
In this tutorial, we'll network boot and provision a Kubernetes v1.17.1 cluster on bare-metal with Fedora CoreOS.
In this tutorial, we'll network boot and provision a Kubernetes v1.18.1 cluster on bare-metal with Fedora CoreOS.
First, we'll deploy a [Matchbox](https://github.com/poseidon/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Fedora CoreOS to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.
@ -30,7 +27,7 @@ Configure each machine to boot from the disk through IPMI or the BIOS menu.
```
ipmitool -H node1 -U USER -P PASS chassis bootdev disk options=persistent
```
During provisioning, you'll explicitly set the boot device to `pxe` for the next boot only. Machines will install (overwrite) the operating system to disk on PXE boot and reboot into the disk install.
!!! tip ""
@ -106,7 +103,7 @@ Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup
TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.
!!! warning
Compile iPXE from [source](https://github.com/ipxe/ipxe) with support for [HTTPS downloads](https://ipxe.org/crypto). iPXE's pre-built firmware binaries do not enable this. Fedora does not provide images over HTTP.
Compile iPXE from [source](https://github.com/ipxe/ipxe) with support for [HTTPS downloads](https://ipxe.org/crypto). iPXE's pre-built firmware binaries do not enable this. Fedora CoreOS downloads are HTTPS-only.
## Terraform Setup
@ -114,7 +111,7 @@ Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your sy
```sh
$ terraform version
Terraform v0.12.16
Terraform v0.12.21
```
Add the [terraform-provider-matchbox](https://github.com/poseidon/terraform-provider-matchbox) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
@ -128,9 +125,9 @@ mv terraform-provider-matchbox-v0.3.0-linux-amd64/terraform-provider-matchbox ~/
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
@ -153,7 +150,7 @@ provider "matchbox" {
}
provider "ct" {
version = "0.4.0"
version = "0.5.0"
}
```
@ -163,14 +160,13 @@ Define a Kubernetes cluster using the module `bare-metal/fedora-coreos/kubernete
```tf
module "mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/fedora-coreos/kubernetes?ref=v1.17.1"
source = "git::https://github.com/poseidon/typhoon//bare-metal/fedora-coreos/kubernetes?ref=v1.18.1"
# bare-metal
cluster_name = "mercury"
matchbox_http_endpoint = "http://matchbox.example.com"
os_stream = "testing"
os_version = "30.20191002.0"
cached_install = true
os_stream = "stable"
os_version = "31.20200113.3.1"
# configuration
k8s_domain_name = "node1.example.com"
@ -293,9 +289,9 @@ List nodes in the cluster.
$ export KUBECONFIG=/home/user/.kube/configs/mercury-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1.example.com Ready <none> 10m v1.17.1
node2.example.com Ready <none> 10m v1.17.1
node3.example.com Ready <none> 10m v1.17.1
node1.example.com Ready <none> 10m v1.18.1
node2.example.com Ready <none> 10m v1.18.1
node3.example.com Ready <none> 10m v1.18.1
```
List the pods.
@ -330,8 +326,8 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-me
|:-----|:------------|:--------|
| cluster_name | Unique cluster name | "mercury" |
| matchbox_http_endpoint | Matchbox HTTP read-only endpoint | "http://matchbox.example.com:port" |
| os_stream | Fedora CoreOS release stream | "testing" |
| os_version | Fedora CoreOS version to PXE and install | "30.20190716.1" |
| os_stream | Fedora CoreOS release stream | "stable" |
| os_version | Fedora CoreOS version to PXE and install | "31.20200113.3.1" |
| k8s_domain_name | FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint | "myk8s.example.com" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3Nz..." |
| controllers | List of controller machine detail objects (unique name, identifying MAC address, FQDN) | `[{name="node1", mac="52:54:00:a1:9c:ae", domain="node1.example.com"}]` |
@ -341,14 +337,15 @@ Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/bare-me
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/mercury" |
| cached_install | PXE boot and install from the Matchbox `/assets` cache. Admin MUST have downloaded Fedora CoreOS images into the cache | false | true |
| install_disk | Disk device where Fedora CoreOS should be installed | "sda" (not "/dev/sda" like Container Linux) | "sdb" |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
| snippets | Map from machine names to lists of Fedora CoreOS Config snippets | {} | UNSUPPORTED |
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
| snippets | Map from machine names to lists of Fedora CoreOS Config snippets | {} | [examples](/advanced/customization/) |
| network_ip_autodetection_method | Method to detect host IPv4 address (calico-only) | "first-found" | "can-reach=10.0.0.1" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| kernel_args | Additional kernel args to provide at PXE boot | [] | ["kvm-intel.nested=1"] |
| worker_node_labels | Map from worker name to list of initial node labels | {} | {"node2" = ["role=special"]} |
| worker_node_taints | Map from worker name to list of initial node taints | {} | {"node2" = ["role=special:NoSchedule"]} |

View File

@ -0,0 +1,247 @@
# Digital Ocean
In this tutorial, we'll create a Kubernetes v1.18.1 cluster on DigitalOcean with Fedora CoreOS.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets.
Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` service. Worker hosts run a `kubelet` service. Controller nodes run `kube-apiserver`, `kube-scheduler`, `kube-controller-manager`, and `coredns`, while `kube-proxy` and `calico` (or `flannel`) run on every node. A generated `kubeconfig` provides `kubectl` access to the cluster.
## Requirements
* Digital Ocean Account and Token
* Digital Ocean Domain (registered Domain Name or delegated subdomain)
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.21
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.5.0/terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.5.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.5.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.5.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
```
cd infra/clusters
```
## Provider
Login to [DigitalOcean](https://cloud.digitalocean.com) or create an [account](https://cloud.digitalocean.com/registrations/new), if you don't have one.
Generate a Personal Access Token with read/write scope from the [API tab](https://cloud.digitalocean.com/settings/api/tokens). Write the token to a file that can be referenced in configs.
```sh
mkdir -p ~/.config/digital-ocean
echo "TOKEN" > ~/.config/digital-ocean/token
```
Configure the DigitalOcean provider to use your token in a `providers.tf` file.
```tf
provider "digitalocean" {
version = "1.15.1"
token = "${chomp(file("~/.config/digital-ocean/token"))}"
}
provider "ct" {
version = "0.5.0"
}
```
## Fedora CoreOS Images
Fedora CoreOS publishes images for DigitalOcean, but does not yet upload them. DigitalOcean allows [custom images](https://blog.digitalocean.com/custom-images/) to be uploaded via URL or file.
Import a [Fedora CoreOS](https://getfedora.org/en/coreos/download?tab=cloud_operators&stream=stable) image via URL to desired a region(s). Reference the DigitalOcean image and set the `os_image` in the next step.
```tf
data "digitalocean_image" "fedora-coreos-31-20200323-3-2" {
name = "fedora-coreos-31.20200323.3.2-digitalocean.x86_64.qcow2.gz"
}
```
## Cluster
Define a Kubernetes cluster using the module `digital-ocean/fedora-coreos/kubernetes`.
```tf
module "nemo" {
source = "git::https://github.com/poseidon/typhoon//digital-ocean/fedora-coreos/kubernetes?ref=v1.18.1"
# Digital Ocean
cluster_name = "nemo"
region = "nyc3"
dns_zone = "digital-ocean.example.com"
os_image = data.digitalocean_image.fedora-coreos-31-20200323-3-2.id
# configuration
ssh_fingerprints = ["d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7"]
# optional
worker_count = 2
}
```
Reference the [variables docs](#variables) or the [variables.tf](https://github.com/poseidon/typhoon/blob/master/digital-ocean/fedora-coreos/kubernetes/variables.tf) source.
## ssh-agent
Initial bootstrapping requires `bootstrap.service` be started on one controller node. Terraform uses `ssh-agent` to automate this step. Add your SSH private key to `ssh-agent`.
```sh
ssh-add ~/.ssh/id_rsa
ssh-add -L
```
## Apply
Initialize the config directory if this is the first use with Terraform.
```sh
terraform init
```
Plan the resources to be created.
```sh
$ terraform plan
Plan: 54 to add, 0 to change, 0 to destroy.
```
Apply the changes to create the cluster.
```sh
$ terraform apply
module.nemo.null_resource.bootstrap: Still creating... (30s elapsed)
module.nemo.null_resource.bootstrap: Provisioning with 'remote-exec'...
...
module.nemo.null_resource.bootstrap: Still creating... (6m20s elapsed)
module.nemo.null_resource.bootstrap: Creation complete (ID: 7599298447329218468)
Apply complete! Resources: 42 added, 0 changed, 0 destroyed.
```
In 3-6 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
resource "local_file" "kubeconfig-nemo" {
content = module.nemo.kubeconfig-admin
filename = "/home/user/.kube/configs/nemo-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/nemo-config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
10.132.110.130 Ready <none> 10m v1.18.1
10.132.115.81 Ready <none> 10m v1.18.1
10.132.124.107 Ready <none> 10m v1.18.1
```
List the pods.
```
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-1187388186-ld1j7 1/1 Running 0 11m
kube-system coredns-1187388186-rdhf7 1/1 Running 0 11m
kube-system calico-node-1m5bf 2/2 Running 0 11m
kube-system calico-node-7jmr1 2/2 Running 0 11m
kube-system calico-node-bknc8 2/2 Running 0 11m
kube-system kube-apiserver-ip-10.132.115.81 1/1 Running 0 11m
kube-system kube-controller-manager-ip-10.132.115.81 1/1 Running 0 11m
kube-system kube-proxy-6kxjf 1/1 Running 0 11m
kube-system kube-proxy-fh3td 1/1 Running 0 11m
kube-system kube-proxy-k35rc 1/1 Running 0 11m
kube-system kube-scheduler-ip-10.132.115.81 1/1 Running 0 11m
```
## Going Further
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/digital-ocean/fedora-coreos/kubernetes/variables.tf) source.
### Required
| Name | Description | Example |
|:-----|:------------|:--------|
| cluster_name | Unique cluster name (prepended to dns_zone) | "nemo" |
| region | Digital Ocean region | "nyc1", "sfo2", "fra1", tor1" |
| dns_zone | Digital Ocean domain (i.e. DNS zone) | "do.example.com" |
| os_image | Fedora CoreOS image for instances | "custom-image-id" |
| ssh_fingerprints | SSH public key fingerprints | ["d7:9d..."] |
#### DNS Zone
Clusters create DNS A records `${cluster_name}.${dns_zone}` to resolve to controller droplets (round robin). This FQDN is used by workers and `kubectl` to access the apiserver(s). In this example, the cluster's apiserver would be accessible at `nemo.do.example.com`.
You'll need a registered domain name or delegated subdomain in DigitalOcean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.
```tf
# Declare a DigitalOcean record to also create a zone file
resource "digitalocean_domain" "zone-for-clusters" {
name = "do.example.com"
ip_address = "8.8.8.8"
}
```
!!! tip ""
If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on DigitalOcean (e.g. do.mydomain.com) and [update nameservers](https://www.digitalocean.com/community/tutorials/how-to-set-up-a-host-name-with-digitalocean).
#### SSH Fingerprints
DigitalOcean droplets are created with your SSH public key "fingerprint" (i.e. MD5 hash) to allow access. If your SSH public key is at `~/.ssh/id_rsa`, find the fingerprint with,
```bash
ssh-keygen -E md5 -lf ~/.ssh/id_rsa.pub | awk '{print $2}'
MD5:d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7
```
If you use `ssh-agent` (e.g. Yubikey for SSH), find the fingerprint with,
```
ssh-add -l -E md5
2048 MD5:d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7 cardno:000603633110 (RSA)
```
Digital Ocean requires the SSH public key be uploaded to your account, so you may also find the fingerprint under Settings -> Security. Finally, if you don't have an SSH key, [create one now](https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/).
### Optional
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Droplet type for controllers | "s-2vcpu-2gb" | s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... |
| worker_type | Droplet type for workers | "s-1vcpu-2gb" | s-1vcpu-2gb, s-2vcpu-2gb, ... |
| controller_snippets | Controller Fedora CoreOS Config snippets | [] | [example](/advanced/customization/) |
| worker_snippets | Worker Fedora CoreOS Config snippets | [] | [example](/advanced/customization/) |
| networking | Choice of networking provider | "calico" | "flannel" or "calico" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
Check the list of valid [droplet types](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) or use `doctl compute size list`.
!!! warning
Do not choose a `controller_type` smaller than 2GB. Smaller droplets are not sufficient for running a controller and bootstrapping will fail.

Some files were not shown because too many files have changed in this diff Show More