Compare commits

1010 Commits

Author SHA1 Message Date
daa5fc4171 Merge remote-tracking branch 'upstream/main' 2024-12-02 11:05:29 +01:00
17060445f7 Bump mkdocs-material from 9.5.45 to v9.5.46 2024-11-29 08:54:47 -08:00
10dd385c38 Bump registry.k8s.io/coredns/coredns image from v1.11.4 to v1.12.0 2024-11-29 08:54:38 -08:00
bc59d5153e Update Kubernetes from v1.31.2 to v1.31.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1313
* Update CoreDNS from v1.11.3 to v1.11.4
* Update Cilium from v1.16.3 to v1.16.4
* Plan to drop support for the Calico CNI; recommend everyone use the Cilium default
2024-11-24 08:43:54 -08:00
cec2a097d4 Bump quay.io/cilium/cilium image from v1.16.3 to v1.16.4 2024-11-24 08:36:50 -08:00
afbb55b79e Bump quay.io/cilium/operator-generic image from v1.16.3 to v1.16.4 2024-11-24 08:36:46 -08:00
5cb48f01bd Bump mkdocs-material from 9.5.44 to v9.5.45 2024-11-24 08:36:42 -08:00
dfb307b1a7 Use consistent resource naming between Azure Flatcar/FCOS
* Fix Azure Public IP name in the Flatcar Linux configuration
2024-11-23 21:20:00 -08:00
a908d30821 Bump registry.k8s.io/coredns/coredns image from v1.11.3 to v1.11.4 2024-11-14 13:31:17 -08:00
2b99ccaa39 nginx/bare-metal: fix selector 2024-11-11 10:00:35 -08:00
93c6c2fed3 nginx: Add endpointslices.discovery.k8s.io to all rbac documents 2024-11-11 10:00:35 -08:00
93c52df929 Bump mkdocs-material from 9.5.42 to v9.5.44 2024-11-11 09:53:16 -08:00
ef740832c9 Bump docker.io/flannel/flannel image from v0.26.0 to v0.26.1 2024-11-11 09:41:02 -08:00
9b28867ea8 Bump pymdown-extensions from 10.11.2 to v10.12 2024-10-30 20:02:18 -07:00
61ffc0bc19 Update Kubernetes from v1.31.1 to v1.31.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1312
* Update Cilium from v1.16.1 to v1.16.3
* Update flannel from v0.25.6 to v0.26.0
2024-10-26 08:33:43 -07:00
e143061bcf Bump mkdocs-material from 9.5.39 to v9.5.42 2024-10-26 08:21:10 -07:00
c3cb5a3f1b Bump quay.io/cilium/cilium image from v1.16.2 to v1.16.3 2024-10-26 08:20:58 -07:00
81265483c6 Bump quay.io/cilium/operator-generic image from v1.16.2 to v1.16.3 2024-10-26 08:19:17 -07:00
a4e0ade8d9 Bump docker.io/flannel/flannel image from v0.25.7 to v0.26.0 2024-10-26 08:18:52 -07:00
3d4905bb3a Bump pymdown-extensions from 10.9 to v10.11.2 2024-10-08 21:33:42 -07:00
5932b651e3 doc: set file_permission 0600 for kubeconfig file
It's only documentation, but the kubeconfig file contains sensitive info, so it's better to secure it a little
2024-10-08 21:33:31 -07:00
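
As a minimal sketch of the kubeconfig permission change above, a Terraform `local_file` resource can write the file so that only its owner can read it. The module name `cluster`, the output name `kubeconfig-admin`, and the file path are assumptions for illustration and are not taken from this commit.

```hcl
# Hypothetical: write a cluster's admin kubeconfig, readable only by its owner.
resource "local_file" "kubeconfig" {
  content         = module.cluster.kubeconfig-admin           # assumed module and output names
  filename        = "/home/user/.kube/configs/cluster-config" # assumed path
  file_permission = "0600"
}
```
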
6a5b808b17 Add region to gcp instance template resource
* Configure the regional worker instance templates with the
region of the cluster. This defaults to the provider's region
which isn't always what you want and if left off causes an error
* Close #1512
2024-10-08 21:28:29 -07:00
e6989514a5 Bump mkdocs-material from 9.5.36 to v9.5.39 2024-10-08 21:07:25 -07:00
edd9328554 Bump quay.io/cilium/cilium image from v1.16.1 to v1.16.2 2024-10-08 21:07:18 -07:00
8656a2d75b Bump quay.io/cilium/operator-generic image from v1.16.1 to v1.16.2 2024-10-08 21:07:13 -07:00
16c26f4384 Bump docker.io/flannel/flannel image from v0.25.6 to v0.25.7 2024-10-08 21:07:05 -07:00
c87c21c7e2 Bump mkdocs-material from 9.5.35 to v9.5.36 2024-09-21 19:31:03 -07:00
598f707cbd Update Kubernetes from v1.31.0 to v1.31.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1311
2024-09-20 14:43:39 -07:00
3f844e3c57 google: Add controller_disk_type and worker_disk_type variables (#1513)
* Add controller_disk_type and worker_disk_type variables
* Properly pass disk_type to worker nodes
2024-09-20 14:31:17 -07:00
b2fad7771f Bump mkdocs from 1.6.0 to v1.6.1 2024-09-20 14:20:43 -07:00
3ae8794c6c Bump mkdocs-material from 9.5.34 to v9.5.35 2024-09-20 13:06:40 -07:00
6878fa9fe6 Bump mkdocs-material from 9.5.33 to v9.5.34 2024-09-09 19:55:42 -07:00
c72e99834c Bump docker.io/flannel/flannel image from v0.25.5 to v0.25.6 2024-08-28 19:45:28 -07:00
7d2d8e16e5 google: Use regional instance templates for workers
* Use regional instance templates for the worker node regional
managed instance groups. Regional instance templates are kept in
the associated region, whereas the older "global" instance templates
were kept in a particular region (regardless of the MIG's region),
so outages in a region X could affect clusters in a region Y, which
is undesired
2024-08-27 21:35:02 -07:00
be9ba51269 Bump mkdocs-material from 9.5.32 to v9.5.33 2024-08-23 21:51:36 -07:00
9a2448f711 Remove upper bound on azurerm provider version
* Allow folks to start upgrading to azurerm provider v4.0.0,
don't set an upper bound on versions going forward
2024-08-23 21:51:29 -07:00
3412060c3c Use Cilium kube-proxy replacement when Cilium CNI is used
* When using the Cilium component, disable bootstrapping the
kube-proxy DaemonSet. Instead, configure Cilium to provide its
kube-proxy replacement with BPF
* Update the self-managed Cilium component to use kube-proxy
replacement as well
2024-08-23 12:33:32 -07:00
808b8a948f aws: Switch EC2 instances to use resource-based hostnames
* Use EC2 resource-based hostnames instead of IP-based hostnames. The Amazon
DNS server can resolve A and AAAA queries to IPv4 and IPv6 node addresses
* For example, nodes used to be named like `ip-10-11-12-13.us-east-1.compute.internal`
but going forward use the instance id `i-0123456789abcdef.us-east-1.compute.internal`
* Tag controller node EBS volumes with a name based on the controller node name
2024-08-22 20:02:53 -07:00
effa13c141 Fix flannel-cni container image
* Close #1496
2024-08-22 19:26:19 -07:00
b8645f3ec2 Bump mkdocs-material from 9.5.31 to v9.5.32 2024-08-22 10:36:50 -07:00
10be34daa2 Update Kubernetes from v1.30.4 to v1.31.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1310
2024-08-17 08:32:35 -07:00
1cb49e1267 Bump quay.io/cilium/cilium image from v1.16.0 to v1.16.1 2024-08-16 08:31:11 -07:00
d79f94f4f5 Bump quay.io/cilium/operator-generic image from v1.16.0 to v1.16.1 2024-08-16 08:31:01 -07:00
320d76c934 Update Kubernetes from v1.30.3 to v1.30.4
* Update Cilium from v1.16.0 to v1.16.1
2024-08-16 08:27:07 -07:00
2daa23be50 Update default Cilium and CoreDNS components
* Update the CoreDNS and Cilium versions used by default when
folks aren't managing the components themselves
2024-08-05 08:47:06 -07:00
6e2daded02 Remove some seldom used variables and set reasonable
* Set reasonable values and remove some variable clutter
* enable_reporting is only used with Calico and we can just default
to false, I doubt anyone uses Calico and cares much about reporting
metrics to upstream Calico
2024-08-02 20:45:37 -07:00
83f1bd2373 Update ARM64 cluster and hybrid cluster docs
* Typhoon now supports arbitrary combinations of controller, worker,
and worker pool architectures so we can drop the specific details of
full-cluster vs hybrid cluster. Just pick the architecture for each
group of nodes accordingly.
* However, if a custom node taint is set, continue to configure the
cluster's daemonsets accordingly with `daemonset_tolerations`
2024-08-02 20:34:23 -07:00
67e5ecf6f2 Bump mkdocs-material from 9.5.30 to v9.5.31 2024-08-02 16:46:36 -07:00
0120b9f38d Remove the cluster_domain_suffix variable
* Drop support for `cluster_domain_suffix` customization and
always use `cluster.local`. Many components in the Kubernetes
ecosystem assume this default suffix and it's very rare to be
setting a special value here these days
* Clean up a few variables that are seldom used
2024-08-02 15:05:25 -07:00
af27661432 Configure controller and worker node architecture separately
* On platforms that support ARM64 instances, configure controller
and worker node host architectures separately
* For example, you can run arm64 controllers and amd64 workers
* Add `controller_arch` and `worker_arch` variables
* Remove `arch` variable
2024-08-02 15:04:57 -07:00
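
The architecture split described in the commit above might look like the following fragment. The variable names come from the commit message; the module name `cluster` and the exact value strings are assumptions.

```hcl
module "cluster" {
  # source and other required variables omitted

  # Run arm64 controllers alongside amd64 workers (assumed value format).
  controller_arch = "arm64"
  worker_arch     = "amd64"
}
```
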
516786d7bb google: Configure controller and worker disk sizes
* Add `controller_disk_size` and `worker_disk_size` variables
* Remove `disk_size` variable
2024-08-02 13:07:41 -07:00
1104b4bf28 AWS: Add CPU pricing mode and controller/worker disk variables
* Add `controller_disk_type`, `controller_disk_size`, and `controller_disk_iops`
variables
* Add `worker_disk_type`, `worker_disk_size`, and `worker_disk_iops` variables
and fix propagation to worker nodes
* Remove `disk_type`, `disk_size`, and `disk_iops` variables
* Add `controller_cpu_credits` and `worker_cpu_credits` variables to set CPU
pricing mode for burstable instance types
2024-07-31 15:02:28 -07:00
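
A rough sketch of how the new AWS variables from the commit above could be set. Only the variable names come from the commit message. The module name `cluster`, every value shown, and the assumption that the CPU credit variables take AWS's "standard" / "unlimited" strings are illustrative guesses.

```hcl
module "cluster" {
  # source and other required variables omitted

  # Controller disks (illustrative values)
  controller_disk_type = "gp3"
  controller_disk_size = 30 # unit assumed to be GB
  controller_disk_iops = 3000

  # Worker disks (illustrative values)
  worker_disk_type = "gp3"
  worker_disk_size = 30
  worker_disk_iops = 3000

  # CPU pricing mode for burstable instance types (assumed value strings)
  controller_cpu_credits = "standard"
  worker_cpu_credits     = "unlimited"
}
```
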
39b5079bc3 Bump registry.k8s.io/coredns/coredns image from v1.11.1 to v1.11.3 2024-07-31 13:28:30 -07:00
858d665d9b Bump quay.io/cilium/cilium image from v1.15.7 to v1.16.0 2024-07-28 11:59:07 -07:00
8cea37cdd9 Bump quay.io/cilium/operator-generic image from v1.15.7 to v1.16.0 2024-07-28 11:58:58 -07:00
4251ca937a Bump pymdown-extensions from 10.8.1 to v10.9 2024-07-28 11:58:50 -07:00
329987187b Bump mkdocs-material from 9.5.29 to v9.5.30 2024-07-26 09:37:41 -07:00
d046026511 Fix incorrect terraform-render-bootstrap SHA 2024-07-25 21:41:54 -07:00
0669d44026 Update Kubernetes from v1.30.2 to v1.30.3
* Update builtin Cilium manifests from v1.15.6 to v1.15.7
* Update builtin flannel manifests from v0.25.4 to v0.25.5
2024-07-20 11:04:32 -07:00
672bbad10b Generate Azure Virtual Network IPv6 ULA space at random
* Private IPv6 address space should be assigned randomly within
an organization per https://datatracker.ietf.org/doc/html/rfc4193
2024-07-20 11:01:50 -07:00
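
One way to pick a random ULA prefix in Terraform is sketched below. This is not necessarily how the commit above implements it. RFC 4193 places a 40-bit Global ID after the `fd00::/8` prefix to form a /48; here the Global ID is simply 5 random bytes from the `random_id` resource, which is a simplification of the pseudo-random algorithm the RFC suggests. The resource and local names are hypothetical.

```hcl
# 5 random bytes supply the 40-bit Global ID.
resource "random_id" "ula" {
  byte_length = 5
}

locals {
  # 10 hex characters (twice the byte length)
  ula_hex = random_id.ula.hex

  # Assemble fdXX:XXXX:XXXX::/48 from the hex digits.
  ula_prefix = format("fd%s:%s:%s::/48", substr(local.ula_hex, 0, 2), substr(local.ula_hex, 2, 4), substr(local.ula_hex, 6, 4))
}
```
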
be0e516974 Bump mkdocs-material from 9.5.28 to v9.5.29 2024-07-20 10:44:04 -07:00
6a61afcd3b Bump docker.io/flannel/flannel image from v0.25.4 to v0.25.5 2024-07-20 10:36:12 -07:00
ca1f897b35 Bump quay.io/cilium/cilium image from v1.15.6 to v1.15.7 2024-07-14 13:42:35 -07:00
d4514db00c Bump quay.io/cilium/operator-generic image from v1.15.6 to v1.15.7 2024-07-14 13:42:26 -07:00
0d10d180f8 Change worker node pools from uniform to flexible orchestration mode
* Use flexible orchestration mode. Azure has started to recommend this
mode because it allows interacting with VMSS instances like regular VMs
via the CLI or via the Azure Portal
* Add options to allow worker nodes to use ephemeral local disks
  * Add `controller_disk_type` and `controller_disk_size` variables
  * Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables
2024-07-14 11:58:15 -07:00
a4fab61066 Remove an IPv4 address from Azure clusters
* Consolidate load balancer frontend IPs to just the minimal IPv4
and IPv6 addresses that are needed per load balancer. apiserver and
ingress use separate ports, so there is not a true need for a separate
public IPv4 address just for apiserver
* Some might prefer a separate IP just because it slightly hides the
apiserver, but these are public hosted endpoints that can be discovered
* Reduce the cost of an Azure cluster since IPv4 public IPs are billed
($3.60/mo/cluster)
2024-07-10 22:29:43 -07:00
24b7f31c55 Rename Azure cluster region variable to location
* Rename the region variable to location to align with Azure
platform conventions, where resources are created within an
Azure location, which are themselves part of broader geographical
regions
2024-07-09 07:56:58 -07:00
48d4973957 Add IPv6 support for Typhoon Azure clusters
* Define a dual-stack virtual network with both IPv4 and IPv6 private
address space. Change `host_cidr` variable (string) to a `network_cidr`
variable (object) with "ipv4" and "ipv6" fields that list CIDR strings.
* Define dual-stack controller and worker subnets. Disable Azure
default outbound access (a deprecated fallback mechanism)
* Enable dual-stack load balancing to Kubernetes Ingress by adding
a public IPv6 frontend IP and LB rule to the load balancer.
* Enable worker outbound IPv6 connectivity through load balancer
SNAT by adding an IPv6 frontend IP and outbound rule
* Configure controller nodes with a public IPv6 address to provide
direct outbound IPv6 connectivity
* Add an IPv6 worker backend pool. Azure requires separate IPv4 and
IPv6 backend pools, though the health probe can be shared
* Extend network security group rules for IPv6 source/destinations

Checklist:

Access to controller and worker nodes via IPv6 addresses:

  * SSH access to controller nodes via public IPv6 address
  * SSH access to worker nodes via (private) IPv6 address (via
    controller)

Outbound IPv6 connectivity from controller and worker nodes:

```
nc -6 -zv ipv6.google.com 80
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Connected to [2607:f8b0:4001:c16::66]:80.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
```

Serving Ingress traffic via IPv4 or IPv6 just requires setting
up A and AAAA records and running the ingress controller with
`hostNetwork: true`, since hostPort only forwards IPv4 traffic
2024-07-09 07:55:00 -07:00
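
The `network_cidr` object introduced in the commit above could be set roughly as follows. The field names "ipv4" and "ipv6" and their list-of-CIDR-strings shape come from the commit message; the module name `cluster` and the address ranges are illustrative assumptions, and whether both fields must be set is not stated here.

```hcl
module "cluster" {
  # source and other required variables omitted

  network_cidr = {
    ipv4 = ["10.0.0.0/16"]         # illustrative private IPv4 range
    ipv6 = ["fd12:3456:789a::/48"] # illustrative ULA range
  }
}
```
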
3483ed8bd5 Bump mkdocs-material from 9.5.27 to v9.5.28 2024-07-03 22:23:23 -07:00
931d6d18de Update Kubernetes from v1.30.1 to v1.30.2
* Update CoreDNS from v1.9.4 to v1.11.1
* Update Cilium from v1.15.5 to v1.15.6
* Update flannel from v0.25.1 to v0.25.4
2024-06-17 08:20:03 -07:00
da99a01f43 Bump mkdocs-material from 9.5.26 to v9.5.27 2024-06-16 16:57:27 -07:00
5090e60fe0 Bump quay.io/cilium/operator-generic image from v1.15.5 to v1.15.6 2024-06-15 08:01:29 -07:00
158a681a8b Bump quay.io/cilium/cilium image from v1.15.5 to v1.15.6 2024-06-15 08:00:23 -07:00
8fd2c95cec Bump docker.io/flannel/flannel image from v0.25.3 to v0.25.4 2024-06-15 07:55:44 -07:00
9be5250a71 Bump mkdocs-material from 9.5.25 to v9.5.26 2024-06-09 15:58:05 -07:00
d6e4f49cd9 Bump docker.io/flannel/flannel image from v0.25.2 to v0.25.3 2024-05-31 17:13:30 -07:00
2d020a2ce3 Bump mkdocs-material from 9.5.24 to v9.5.25 2024-05-27 07:43:40 -07:00
e942ae9f4a Bump docker.io/flannel/flannel image from v0.25.1 to v0.25.2 2024-05-26 12:45:03 -07:00
fa8f3d81b4 Bump mkdocs-material from 9.5.23 to v9.5.24 2024-05-26 12:23:13 -07:00
c48b04ea88 Update docs to mention components 2024-05-19 17:10:47 -07:00
7b8a51070f Add Terraform modules for CoreDNS, Cilium, and flannel
* With the new component system, these components can be managed
independently of the cluster and rolled or edited in advanced
ways
2024-05-19 17:00:10 -07:00
533ace7011 Update Cilium from v1.15.4 to v1.15.5
* https://github.com/cilium/cilium/releases/tag/v1.15.5
2024-05-19 16:38:08 -07:00
b3c384fbc0 Introduce the component system for managing pre-installed addons
* Previously: Typhoon provisions clusters with kube-system components
like CoreDNS, kube-proxy, and a chosen CNI provider (among flannel,
Calico, or Cilium) pre-installed. This is convenient since clusters
come with "batteries included". But it also means upgrading these
components is generally done in lock-step, by upgrading to a new
Typhoon / Kubernetes release
* It can be valuable to manage these components with a separate
plan/apply process or through automations and deploy systems. For
example, this allows managing CoreDNS separately from the cluster's
lifecycle.
* These "components" will continue to be pre-installed by default,
but a new `components` variable allows them to be disabled and
managed as "addons", components you apply after cluster creation
and manage on a rolling basis. For some of these, we may provide
Terraform modules to aid in managing these components.

```
module "cluster" {
  # defaults
  components = {
    enable = true
    coredns = {
      enable = true
    }
    kube_proxy = {
      enable = true
    }
    # Only the CNI set in var.networking will be installed
    flannel = {
      enable = true
    }
    calico = {
      enable = true
    }
    cilium = {
      enable = true
    }
  }
}
```

An earlier variable `install_container_networking = true/false` has
been removed, since it can now be achieved with this more extensible
and general components mechanism by setting the chosen networking
provider enable field to false.
2024-05-19 16:33:57 -07:00
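
Following the commit above, opting out of the pre-installed CNI might look like this sketch. It assumes that fields left out of `components` keep their defaults and that `networking` accepts the value "cilium"; neither is confirmed by the commit message.

```hcl
module "cluster" {
  # source and other required variables omitted
  networking = "cilium" # assumed value

  components = {
    enable = true
    cilium = {
      # Skip the pre-installed Cilium and apply it separately after
      # cluster creation.
      enable = false
    }
  }
}
```
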
563feacd29 Update Kubernetes from v1.30.0 to v1.30.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1301
2024-05-15 21:59:00 -07:00
178d1e6eb1 Bump mkdocs-material from 9.5.22 to v9.5.23 2024-05-15 20:52:03 -07:00
3f34e047f1 azure: Add controller security group and subnet outputs
* Output the network security group name and address prefixes
for controller nodes, to allow adding custom network security
rules that apply specifically to controller nodes
2024-05-14 21:34:31 -07:00
cc80ec9b98 Add firewall and security rules for Cilium/Hubble metrics
* Add firewall or security rules to allow node-to-node traffic
on ports 9962-9965 for Cilium and Hubble metrics. Cilium runs
with host network, so these require cloud firewall changes
2024-05-13 21:27:38 -07:00
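
On AWS, a node-to-node rule for the metrics ports in the commit above could be sketched like this. The variable and resource names are hypothetical, and TCP is assumed as the protocol; the commit's actual rules may be organized differently.

```hcl
variable "node_security_group_id" {
  type        = string
  description = "Hypothetical: ID of a security group shared by cluster nodes"
}

# Allow node-to-node traffic on the Cilium and Hubble metrics ports.
resource "aws_security_group_rule" "cilium_metrics" {
  type              = "ingress"
  protocol          = "tcp" # assumed
  from_port         = 9962
  to_port           = 9965
  security_group_id = var.node_security_group_id
  self              = true # traffic source is the same security group
}
```
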
1d63592c42 Bump mkdocs-material from 9.5.21 to v9.5.22 2024-05-13 06:57:20 -07:00
d08cd317d9 Allow CoreDNS and kube-proxy to be optional components
* Allow for more minimal base cluster setups that manage CoreDNS or
kube-proxy as applications, with rolling updates or deploy systems.
Or in the case of kube-proxy, it's becoming more common to not install
it and instead use Cilium
* Add a `components` pass-through variable to configure pre-installed
components like kube-proxy and CoreDNS. These components can be
disabled (individually or together) to allow for managing components
with separate plan/apply processes or automations
* terraform-render-bootstrap manifest assets are now structured as
manifests/{coredns,kube-proxy,network} so adapt the controller
layout scripts accordingly
* This is similar to some changes in v1.29.2 that allowed for the
container networking provider manifests to be skipped

Related: https://github.com/poseidon/typhoon/pull/1419, https://github.com/poseidon/typhoon/pull/1421
2024-05-12 21:20:27 -07:00
78d5100181 Update Cilium and flannel container images
* Update Cilium from v1.15.3 to v1.15.4
* Update flannel from v0.24.4 to v0.25.1
2024-05-12 08:27:27 -07:00
e8a42ae33e Bump provider ct to v0.13.0 2024-05-04 09:01:19 -07:00
ed0fa5c9a9 Bump pygments from 2.17.2 to v2.18.0 2024-05-04 09:00:38 -07:00
15608fa6ae Bump mkdocs-material from 9.5.19 to v9.5.21 2024-05-04 08:45:24 -07:00
9e9362154d Bump pymdown-extensions from 10.8 to 10.8.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.8 to 10.8.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.8...10.8.1)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-04 08:37:41 -07:00
7d8c0631cd Update mkdocs and mkdocs-material together
* There was a bit of discussion upstream about the pinning but that
is resolved https://github.com/squidfunk/mkdocs-material/issues/7076
2024-04-25 21:47:51 -07:00
6ac5a0222b Update Kubernetes from v1.29.3 to v1.30.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1300
2024-04-23 20:51:54 -07:00
ed9a031d39 Bump pymdown-extensions from 10.7.1 to 10.8
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.7.1 to 10.8.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.7.1...10.8)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-22 22:10:24 -07:00
88112d4de2 Bump mkdocs-material from 9.5.16 to 9.5.18
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.16 to 9.5.18.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.16...9.5.18)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-22 22:10:17 -07:00
bda94bd278 Add release.yaml to help auto-populate release notes
* Auto-populated release notes have a nice way of highlighting
new contributors and sorting dependency updates to the bottom.
I'll still keep the hand-written changelog notes at the top
because they're written for those who want a better summary
than just a bunch of PR titles
* Remove the PR template since it's often unused
2024-04-03 22:54:06 -07:00
cafcdbc3e7 Update etcd from v3.5.12 to v3.5.13 and bump Calico/Cilium
* Update Cilium from v1.15.2 to v1.15.3
* Update Calico from v3.27.2 to v3.27.3
2024-04-03 22:51:07 -07:00
4bc10a8a4c Bump mkdocs-material from 9.5.15 to 9.5.16
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.15 to 9.5.16.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.15...9.5.16)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-02 09:57:57 -07:00
4c3dd07ab3 Bump mkdocs-material from 9.5.14 to 9.5.15
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.14 to 9.5.15.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.14...9.5.15)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-25 11:23:35 -07:00
8524aa00bc Update Kubernetes from v1.29.2 to v1.29.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1293
2024-03-23 00:47:10 -07:00
734c8c2107 Allow stopping Google Cloud controller nodes to resize them (#1424)
* Google Cloud requires VMs be stopped in order to update their properties. This is only allowed if explicitly enabled
2024-03-22 11:23:00 -07:00
fbe36b8b16 Update Cilium and flannel container image versions
* https://github.com/cilium/cilium/releases/tag/v1.15.2
* https://github.com/flannel-io/flannel/releases/tag/v0.24.4
2024-03-22 11:19:49 -07:00
8038669504 Bump pymdown-extensions from 10.7 to 10.7.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.7 to 10.7.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.7...10.7.1)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-20 13:03:45 -07:00
7af83404e1 Bump mkdocs-material from 9.5.12 to 9.5.14
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.12 to 9.5.14.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.12...9.5.14)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-18 15:26:17 -07:00
e9c7c4a4c1 Bump mkdocs-material from 9.5.11 to 9.5.12
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.11 to 9.5.12.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.11...9.5.12)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-07 08:39:47 -08:00
ed82c41423 Bump mkdocs-material from 9.5.10 to 9.5.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.10 to 9.5.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.10...9.5.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-26 09:08:32 -08:00
41907a0ba6 Update Calico from v3.26.3 to v3.27.2
* Update fixes Calico incompatibility with Fedora CoreOS

Rel: https://github.com/projectcalico/calico/issues/8372
2024-02-25 12:11:56 -08:00
ab66d11edf Bump mkdocs-material from 9.5.9 to 9.5.10
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.9 to 9.5.10.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.9...9.5.10)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-24 19:38:25 -08:00
2325a503e1 Add an install_container_networking variable (default true)
* When `true`, the chosen container `networking` provider is installed during cluster bootstrap
* Set `false` to self-manage the container networking provider. This allows flannel, Calico, or Cilium
to be managed via Terraform (like any other Kubernetes resources). Nodes will be NotReady until you
apply the self-managed container networking provider. This may become the default in future.
2024-02-24 18:49:38 -08:00
7a46eb03ae Update Cilium from v1.14.3 to v1.15.1
* https://github.com/cilium/cilium/releases/tag/v1.15.1
2024-02-23 22:59:31 -08:00
0e7977694f Allow CNI networking to be set to none
* Set CNI networking to "none" to skip installing any CNI provider
(i.e. no flannel, Calico, or Cilium). In this mode, cluster nodes
will be NotReady until you add your own CNI stack
* Motivation: I now tend to manage CNI components as addon modules
just like other applications overlaid onto a cluster. It allows for
faster iteration and may eventually become the recommendation
2024-02-23 22:57:47 -08:00
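
The "none" mode from the commit above would be selected along these lines. The module name `cluster` is an assumption; nodes stay NotReady until a CNI stack is applied separately.

```hcl
module "cluster" {
  # source and other required variables omitted

  # Skip installing flannel, Calico, or Cilium during bootstrap.
  networking = "none"
}
```
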
f2f625984e Update Kubernetes from v1.29.1 to v1.29.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1292
2024-02-18 18:31:31 -08:00
ac3eab4e00 Bump mkdocs-material from 9.5.7 to 9.5.9
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.7 to 9.5.9.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.7...9.5.9)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-18 17:52:24 -08:00
aecb7775a8 Update etcd from v3.5.10 to v3.5.12
* https://github.com/etcd-io/etcd/releases/tag/v3.5.11
* https://github.com/etcd-io/etcd/releases/tag/v3.5.12
2024-02-18 15:36:37 -08:00
301f460d25 Bump mkdocs-material from 9.5.6 to 9.5.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.6 to 9.5.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.6...9.5.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-06 19:43:09 -08:00
e247673a20 Update Kubernetes from v1.29.0 to v1.29.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1291
2024-02-04 10:47:42 -08:00
808eafd178 Fix AWS launch template to retain support for IMDSv1
* AWS has recently started defaulting launch templates to IMDSv2
being "required". aws_launch_template is supposed to default to
"optional" but it doesn't.
* Requiring IMDSv2 sessions breaks a number of applications which
don't use AWS SDKs and were never meant to be complex applications
(e.g. shell scripts and the like)
2024-02-04 10:38:50 -08:00
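
In the Terraform AWS provider, the setting discussed in the commit above lives in the `metadata_options` block of `aws_launch_template`. The sketch below shows the general shape only; the resource name and `name_prefix` are hypothetical, and the commit's actual template has more arguments.

```hcl
resource "aws_launch_template" "example" {
  name_prefix = "example-" # hypothetical

  metadata_options {
    http_endpoint = "enabled"
    # "optional" permits both IMDSv1 and IMDSv2 requests;
    # "required" would allow IMDSv2 session tokens only.
    http_tokens = "optional"
  }
}
```
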
4d4c5413de Bump mkdocs-material from 9.5.4 to 9.5.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.4 to 9.5.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.4...9.5.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-30 20:35:16 -08:00
fbf4544cfd Bump mkdocs-material from 9.5.3 to 9.5.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.3 to 9.5.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.3...9.5.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-28 23:11:59 -08:00
af719e46f2 feat ensured that appropriate rbacs are set to allow the ingressclass on gcp (#1409) 2024-01-12 20:16:10 -08:00
25c9ec8e3d Bump pymdown-extensions from 10.5 to 10.7
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.5 to 10.7.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.5...10.7)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-07 19:36:43 -08:00
5bea4b7d9c Bump mkdocs-material from 9.5.2 to 9.5.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.2 to 9.5.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.2...9.5.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-07 19:31:11 -08:00
84e4f02917 Update Kubernetes from v1.28.4 to v1.29.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md
2023-12-22 10:27:24 -08:00
5e06f29810 Bump mkdocs-material from 9.4.14 to 9.5.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.4.14 to 9.5.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.4.14...9.5.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-12-17 11:38:47 -08:00
0d997def31 Add release note for v1.28.4 2023-12-10 21:02:21 -08:00
0ad69f8899 Bump pygments from 2.16.1 to 2.17.2
Bumps [pygments](https://github.com/pygments/pygments) from 2.16.1 to 2.17.2.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.16.1...2.17.2)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-27 21:52:12 -08:00
35435e56ae Bump pymdown-extensions from 10.3.1 to 10.5
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.3.1 to 10.5.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.3.1...10.5)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-27 21:49:45 -08:00
493030de82 Bump mkdocs-material from 9.4.8 to 9.4.14
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.4.8 to 9.4.14.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.4.8...9.4.14)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-27 21:30:22 -08:00
8254d8f3db Update Kubernetes from v1.28.3 to v1.28.4
* https://github.com/kubernetes/kubernetes/releases/tag/v1.28.4
2023-11-21 06:16:58 -08:00
4691a11afd Bump mkdocs-material from 9.4.7 to 9.4.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.4.7 to 9.4.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.4.7...9.4.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-07 22:27:30 -08:00
516517fafe Merge remote-tracking branch 'upstream/main' 2023-11-02 11:56:22 +01:00
5b47d79253 Bump mkdocs-material from 9.4.6 to 9.4.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.4.6 to 9.4.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.4.6...9.4.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-31 09:03:02 -07:00
435fa196da Relax the provider version constraint for Google Cloud
* Allow upgrading to the v5.x Google Cloud Terraform Provider
* Relax the version constraint to ease future compatibility,
though it does allow users to upgrade prematurely
2023-10-30 09:05:06 -07:00
39af942f4d Update etcd from v3.5.9 to v3.5.10
* https://github.com/etcd-io/etcd/releases/tag/v3.5.10
2023-10-29 18:21:40 -07:00
4c8bfa4615 Update Calico from v3.26.1 to v3.26.3 2023-10-29 18:19:10 -07:00
386a004072 Update Cilium from v1.14.2 to v1.14.3 2023-10-29 18:17:55 -07:00
291107e4c9 Workaround problems in Cilium v1.14 partial kube-proxy replacement
* With Cilium v1.14, Cilium's kube-proxy partial mode changed to
either be enabled or disabled (not partial). This sometimes leaves
Cilium (and the host) unable to reach the kube-apiserver via the
in-cluster Kubernetes Service IP, until the host is rebooted
* As a workaround, configure Cilium to rely on external DNS resolvers
to find the IP address of the apiserver. This is less portable
and less "clean" than using in-cluster discovery, but also what
Cilium wants users to do. Revert this when the upstream issue
https://github.com/cilium/cilium/issues/27982 is resolved
2023-10-29 16:16:56 -07:00
2062144597 Bump pymdown-extensions from 10.3 to 10.3.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.3 to 10.3.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.3...10.3.1)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-23 11:52:12 -07:00
c7732d58ae Bump mkdocs-material from 9.4.4 to 9.4.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.4.4 to 9.4.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.4.4...9.4.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-23 09:35:49 -07:00
005a1119f3 Update Kubernetes from v1.28.2 to v1.28.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1283
2023-10-22 18:43:54 -07:00
21f7142464 Merge remote-tracking branch 'upstream/main' 2023-10-20 14:00:37 +02:00
68df37451e Update outputs.tf for bare-metal/flatcar-linux to include kubeconfig output 2023-10-15 22:15:35 -07:00
bf9e74f5a1 Bump mkdocs-material from 9.4.2 to 9.4.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.4.2 to 9.4.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.4.2...9.4.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-15 22:13:24 -07:00
73e7448f53 Merge remote-tracking branch 'upstream/main' 2023-10-11 13:31:16 +02:00
6bd6d46fb2 Bump mkdocs-material from 9.3.1 to 9.4.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.3.1 to 9.4.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.3.1...9.4.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-26 23:01:38 -07:00
c8105d7d42 Bump mkdocs from 1.5.2 to 1.5.3
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.5.2 to 1.5.3.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.5.2...1.5.3)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-26 05:37:24 -07:00
215c9fe75d Add venv to gitignore for the repo 2023-09-21 22:12:44 +02:00
0ce8dfbb95 Workaround to allow use of ed25519 keys on Azure
* Allow passing a dummy RSA key to Azure to satisfy its obtuse
requirements (recommend deleting the corresponding private key)
* Then `ssh_authorized_key` can be used to provide Fedora CoreOS
or Flatcar Linux with a modern ed25519 public key to set in the
authorized_keys via Ignition
2023-09-17 23:21:42 +02:00
8cbcaa5fc6 Update Cilium from v1.14.1 to v1.14.2
* https://github.com/cilium/cilium/releases/tag/v1.14.2
2023-09-16 17:10:07 +02:00
f5bc1fb1fd Update Kubernetes from v1.28.1 to v1.28.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1282
2023-09-14 13:01:33 -07:00
7475d5fd27 Bump mkdocs-material from 9.2.7 to 9.3.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.2.7 to 9.3.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.2.7...9.3.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-11 22:44:33 -07:00
fbace61af7 Bump mkdocs-material from 9.2.5 to 9.2.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.2.5 to 9.2.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.2.5...9.2.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-04 13:44:32 -07:00
e3bf18ce41 Bump pymdown-extensions from 10.1 to 10.3
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.1 to 10.3.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.1.0...10.3)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-03 12:30:46 -07:00
ebdd8988cf Bump mkdocs-material from 9.1.21 to 9.2.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.21 to 9.2.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.21...9.2.5)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-03 12:29:30 -07:00
126973082a Update Kubernetes from v1.28.0 to v1.28.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1281
2023-08-26 13:29:48 -07:00
61135da5bb Emulate Cilium KubeProxyReplacement partial mode
* Details: https://github.com/poseidon/terraform-render-bootstrap/pull/363
2023-08-26 11:31:28 -07:00
fc951c7dbf Fix Cilium v1.14 support for HostPort pods
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/362
2023-08-21 19:58:19 -07:00
c259142c28 Update Cilium from v1.14.0 to v1.14.1 2023-08-20 16:09:22 -07:00
81eed2e909 Update Kubernetes from v1.27.4 to v1.28.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1280
2023-08-20 15:41:23 -07:00
bdaa0e81b0 Bump pygments from 2.15.1 to 2.16.1
Bumps [pygments](https://github.com/pygments/pygments) from 2.15.1 to 2.16.1.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.15.1...2.16.1)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-07 21:52:12 -07:00
c1e0eba7b6 Bump mkdocs from 1.4.3 to 1.5.2
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.4.3 to 1.5.2.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.4.3...1.5.2)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-04 23:21:02 -07:00
66fda88d20 Bump mkdocs-material from 9.1.19 to 9.1.21
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.19 to 9.1.21.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.19...9.1.21)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-04 22:40:45 -07:00
27cecd0f94 fix typo in variable name 2023-08-03 14:26:39 +02:00
634deaf92e Adding install_snippets support.
During the "real" first boot (install boot), we need to run a Butane
config to manipulate disks, so we add an install_snippets variable to do
so.

These snippets are added to the install.yaml Butane configuration
2023-08-03 14:16:24 +02:00
cd699ee1aa Update docs on flatcar-linux bare-metal kubernetes worker module usage. 2023-08-02 12:07:53 +02:00
d29e6e3de1 Upgrade Cilium from v1.13.4 to v1.14.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/360
* Also update flannel from v0.22.0 to v0.22.1
2023-07-30 09:36:23 -07:00
be37170e59 Bump mkdocs-material from 9.1.18 to 9.1.19
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.18 to 9.1.19.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.18...9.1.19)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-24 19:46:01 -07:00
0a6183f859 Update Kubernetes from v1.27.3 to v1.27.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1274
2023-07-21 08:00:50 -07:00
1888c272eb Bump pymdown-extensions from 10.0.1 to 10.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.0.1 to 10.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.0.1...10.1.0)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-19 22:40:18 -07:00
880821391a Bump mkdocs-material from 9.1.17 to 9.1.18
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.17 to 9.1.18.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.17...9.1.18)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-03 11:08:52 -07:00
9314807dfd Bump mkdocs-material from 9.1.16 to 9.1.17
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.16 to 9.1.17.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.16...9.1.17)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-27 10:15:54 -04:00
56d71c0eca Bump mkdocs-material from 9.1.15 to 9.1.16
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.15 to 9.1.16.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.15...9.1.16)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-19 13:10:49 -07:00
9a28fe79a1 Upgrade Calico from v3.25.1 to v3.26.1
* Add new CRD bgpfilters and new ClusterRoles calico-cni-plugin

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/358
2023-06-19 12:28:53 -07:00
7255f82d71 Update Kubernetes from v1.27.2 to v1.27.3
* Update Cilium v1.13.3 to v1.13.4

Rel: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1273
2023-06-16 08:28:17 -07:00
6f4b4cc508 Update Cilium from v1.13.2 to v1.13.3
* Also update flannel v0.21.2 to v0.22.0

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/355
2023-06-11 19:59:10 -07:00
094811dc73 Relax aws Terraform Provider version constraint
* aws provider v5.0+ works alright and should be permitted,
relax the version constraint for the Typhoon AWS kubernetes
module and worker module for Fedora CoreOS and Flatcar Linux
2023-06-11 19:46:01 -07:00
2a5a43f3a4 Update etcd from v3.5.8 to v3.5.9
* https://github.com/etcd-io/etcd/releases/tag/v3.5.9
2023-06-11 19:28:23 -07:00
784f60f624 Enable boot diagnostics for Azure controller and worker VMs
* When invalid Ignition snippets are provided to Typhoon, it
can be useful to view Azure's boot logs for the instance, which
requires boot diagnostics be enabled
2023-06-11 19:24:09 -07:00
58e0ff9f5e Bump mkdocs-material from 9.1.14 to 9.1.15
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.14 to 9.1.15.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.14...9.1.15)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-05 19:30:38 -07:00
9e63f1247a Consolidate the mkdocs to GitHub Pages publish workflow
* Use a shared GitHub Workflow to build the mkdocs site and
publish to GitHub Pages (when the release-docs branch is updated)
2023-05-26 10:22:21 -07:00
ecc9a73df4 Add a GitHub Workflow to push to GitHub Pages
* Automatically push to GitHub pages when the release-docs
branch is updated
2023-05-25 09:21:21 -07:00
1665cfb613 Bump pymdown-extensions from 10.0 to 10.0.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 10.0 to 10.0.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/10.0...10.0.1)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-23 17:48:16 -07:00
1919ff1355 Bump mkdocs-material from 9.1.13 to 9.1.14
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.13 to 9.1.14.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.13...9.1.14)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-23 17:48:04 -07:00
8ebf31073c Update Kubernetes from v1.27.1 to v1.27.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1272
2023-05-21 14:02:49 -07:00
867ca6a94e Bump mkdocs-material from 9.1.11 to 9.1.13
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.11 to 9.1.13.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.11...9.1.13)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-17 21:50:09 -07:00
819dd111ed Bump pymdown-extensions from 9.11 to 10.0
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.11 to 10.0.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.11...10.0)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-17 21:17:44 -07:00
c16cc08375 Bump mkdocs-material from 9.1.8 to 9.1.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.8 to 9.1.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.8...9.1.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-10 22:38:14 -07:00
64472d5bf7 Bump mkdocs from 1.4.2 to 1.4.3
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.4.2 to 1.4.3.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.4.2...1.4.3)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-10 22:09:29 -07:00
ae82c57eee Bump pygments from 2.15.0 to 2.15.1
Bumps [pygments](https://github.com/pygments/pygments) from 2.15.0 to 2.15.1.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.15.0...2.15.1)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-28 08:12:57 -07:00
fe23fca72b Bump mkdocs-material from 9.1.6 to 9.1.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.6 to 9.1.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.6...9.1.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-28 08:08:15 -07:00
4ef1908299 Fix: extra kernel_args added to bare-metal workers 2023-04-28 08:07:54 -07:00
2272472d59 Omit -o flag to flatcar-install unless oem_type is defined 2023-04-25 19:02:30 -07:00
fc444d25f8 Update poseidon/ct provider and Butane Config version
* Update Fedora CoreOS Butane configs from v1.4.0 to v1.5.0
* Require Fedora CoreOS Butane snippets update to v1.1.0
* Require poseidon/ct Terraform provider v0.13 or newer
* Use Ignition v3.4.0 spec for all node provisioning
2023-04-21 08:58:20 -07:00
5feb4c63f7 Update Cilium from v1.13.1 to v1.13.2
* https://github.com/cilium/cilium/releases/tag/v1.13.2
2023-04-20 08:44:31 -07:00
501e6d25e0 Update Kubernetes from v1.27.0 to v1.27.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1271
2023-04-15 23:16:51 -07:00
1e76e1a200 Update etcd from v3.5.7 to v3.5.8
* https://github.com/etcd-io/etcd/releases/tag/v3.5.8
2023-04-15 22:54:31 -07:00
4322857bec Update Kubernetes from v1.26.3 to v1.27.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1270
2023-04-15 22:49:12 -07:00
e3bfa1c89b Bump pygments from 2.14.0 to 2.15.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.14.0 to 2.15.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.14.0...2.15.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-12 22:12:36 -07:00
47213a8e8f Bump pymdown-extensions from 9.10 to 9.11
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.10 to 9.11.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.10...9.11)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-12 10:16:35 -07:00
8943c0f55e Bump mkdocs-material from 9.1.5 to 9.1.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.5 to 9.1.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.5...9.1.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-12 09:43:33 -07:00
44d84cf324 Bump mkdocs-material from 9.1.4 to 9.1.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.4 to 9.1.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.4...9.1.5)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-05 08:15:00 -07:00
ec2e0b2fd7 Fix CHANGES.md line about oem_type variable
* Move line about oem_type variable to v1.26.3 release notes
2023-04-02 08:53:10 -07:00
6bd2a1a528 Expose flatcar-install OEM parameter
By exposing this parameter it is possible to install OEM specific software
during the `flatcar-install` invocation.
2023-04-01 09:38:29 -07:00
5f303212d2 Update Cilium to use an init container to install CNI plugins
* https://github.com/poseidon/terraform-render-bootstrap/pull/348
2023-03-29 10:35:21 -07:00
bcee364b4c Bump mkdocs-material from 9.1.3 to 9.1.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.3 to 9.1.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.3...9.1.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-28 18:35:57 -07:00
3670ec7ed7 Update Kubernetes from v1.26.2 to v1.26.3
* Update Cilium from v1.13.0 to v1.13.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1263
2023-03-21 18:18:19 -07:00
1e3af87392 Bump mkdocs-material from 9.1.2 to 9.1.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.2 to 9.1.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.2...9.1.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-21 17:28:01 -07:00
2b3cd451d2 Update Cilium from v1.12.6 to v1.13.0
* https://github.com/cilium/cilium/releases/tag/v1.13.0
2023-03-14 11:16:14 -07:00
ff937b0b7e Bump mkdocs-material from 9.1.1 to 9.1.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.1 to 9.1.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.1.1...9.1.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-12 12:20:23 -07:00
4891a66e29 Update CHANGES.md with release notes 2023-03-10 18:10:51 -08:00
3ff6c2fdf7 Bump mkdocs-material from 9.0.15 to 9.1.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.15 to 9.1.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.15...9.1.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-07 08:57:38 -08:00
517863c31a Bump pymdown-extensions from 9.9.2 to 9.10
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.9.2 to 9.10.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.9.2...9.10)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-07 08:57:14 -08:00
76ebc08fd2 Update Kubernetes from v1.26.1 to v1.26.2
* https://github.com/poseidon/terraform-render-bootstrap/pull/345
2023-03-01 17:13:16 -08:00
86e8484e0a Change bare-metal workers variable to optional
* To accompany the restructure of the bare-metal modules to
allow discrete workers to be defined and attached to a cluster
(#1295), the `workers` variable (older way, used for defining
homogeneous workers inline) should be optional and default
to an empty list
* Add docs covering inline vs discrete metal workers

Fix #1301
2023-03-01 14:37:47 -08:00
cf20e686c0 Bump mkdocs-material from 9.0.13 to 9.0.15
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.13 to 9.0.15.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.13...9.0.15)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-01 13:50:37 -08:00
420ddd2154 Bump mkdocs-material from 9.0.12 to 9.0.13
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.12 to 9.0.13.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.12...9.0.13)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-20 10:44:45 -08:00
435b3d4c88 Bump mkdocs-material from 9.0.11 to 9.0.12
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.11 to 9.0.12.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.11...9.0.12)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-15 09:44:54 -08:00
f3c327007d Update flannel from v0.20.2 to v0.21.1
* https://github.com/flannel-io/flannel/releases/tag/v0.21.1
2023-02-09 09:56:25 -08:00
406fb444f0 Update Cilium from v1.12.5 to v1.12.6
* https://github.com/cilium/cilium/releases/tag/v1.12.6
2023-02-09 09:45:40 -08:00
1caea3388c Restructure bare-metal module to use a worker submodule
* Add an internal `worker` module to the bare-metal module, to
allow individual bare-metal machines to be defined and joined
to an existing bare-metal cluster. This is similar to the "worker
pools" modules for adding sets of nodes to cloud (AWS, GCP, Azure)
clusters, but on metal, each piece of hardware is potentially
unique

New: Using the new `worker` module, a Kubernetes cluster can be defined
without any `workers` (i.e. just a control-plane). Use the `worker`
module to define each machine that should join the bare-metal
cluster and customize it in detail. This style is quite flexible and
suited for clusters with hardware that varies quite a bit.

```tf
module "mercury" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.26.2"

  # bare-metal
  cluster_name            = "mercury"
  matchbox_http_endpoint  = "http://matchbox.example.com"
  os_channel              = "flatcar-stable"
  os_version              = "2345.3.1"

  # configuration
  k8s_domain_name    = "node1.example.com"
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."

  # machines
  controllers = [{
    name   = "node1"
    mac    = "52:54:00:a1:9c:ae"
    domain = "node1.example.com"
  }]
}
```

```tf
module "mercury-node2" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes/worker?ref=v1.26.2"

  cluster_name = "mercury"

  # bare-metal
  matchbox_http_endpoint  = "http://matchbox.example.com"
  os_channel              = "flatcar-stable"
  os_version              = "2345.3.1"

  # configuration
  name               = "node2"
  mac                = "52:54:00:b2:2f:86"
  domain             = "node2.example.com"
  kubeconfig         = module.mercury.kubeconfig
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."

  # optional
  snippets       = []
  node_labels    = []
  node_taints    = []
  install_disk   = "/dev/vda"
  cached_install = false
}
```

For clusters with fairly similar hardware, you may continue to
define `workers` directly within the cluster definition. This
reduces some repetition, but is not quite as flexible.

```tf
module "mercury" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.26.1"

  # bare-metal
  cluster_name            = "mercury"
  matchbox_http_endpoint  = "http://matchbox.example.com"
  os_channel              = "flatcar-stable"
  os_version              = "2345.3.1"

  # configuration
  k8s_domain_name    = "node1.example.com"
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."

  # machines
  controllers = [{
    name   = "node1"
    mac    = "52:54:00:a1:9c:ae"
    domain = "node1.example.com"
  }]
  workers = [
    {
      name   = "node2",
      mac    = "52:54:00:b2:2f:86"
      domain = "node2.example.com"
    },
    {
      name   = "node3",
      mac    = "52:54:00:c3:61:77"
      domain = "node3.example.com"
    }
  ]
}
```

Optional variables `snippets`, `worker_node_labels`, and
`worker_node_taints` are still defined as a map from machine name
to a list of snippets, labels, or taints respectively to allow some
degree of per-machine customization. However, fields like
`install_disk`, `kernel_args`, `cached_install` and future options
will not be designed this way. Instead, if your machines vary it
is recommended to use the new `worker` module to define each node
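
As a rough sketch of the per-machine maps described above, the optional arguments might look like the following inside the inline-workers cluster definition. The machine name matches the earlier example; the snippet file path, label, and taint values are illustrative assumptions rather than values from this change.

```tf
module "mercury" {
  # ... cluster definition with inline `workers`, as shown earlier ...

  # optional per-machine customization, keyed by worker name
  snippets = {
    "node2" = [file("./snippets/node2.yaml")] # hypothetical snippet file
  }
  worker_node_labels = {
    "node2" = ["role=special"] # illustrative label
  }
  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"] # illustrative taint
  }
}
```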
2023-02-09 08:29:28 -08:00
d04d88023d Bump mkdocs-material from 9.0.6 to 9.0.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.6 to 9.0.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.6...9.0.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-07 09:12:43 -08:00
a205922d06 Update Calico from v3.24.5 to v3.25.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/342
2023-01-24 08:29:08 -08:00
b5ba65d4c2 Update etcd from v3.5.6 to v3.5.7
* https://github.com/etcd-io/etcd/releases/tag/v3.5.7
2023-01-24 08:29:08 -08:00
e696fd2b22 Bump mkdocs-material from 9.0.5 to 9.0.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.5 to 9.0.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.5...9.0.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-23 09:08:20 -08:00
3ff9b792ca Bump pymdown-extensions from 9.9.1 to 9.9.2
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.9.1 to 9.9.2.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.9.1...9.9.2)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-23 09:02:30 -08:00
c4f1d2d1c8 Bump pymdown-extensions from 9.9 to 9.9.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.9 to 9.9.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.9...9.9.1)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-19 08:54:46 -08:00
a1d7b5cd1e Bump mkdocs-material from 9.0.3 to 9.0.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.3 to 9.0.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.3...9.0.5)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-19 08:51:56 -08:00
e7591030e0 Remove Twitter badge from README, we're on the Fediverse now 2023-01-19 08:43:49 -08:00
f2bf5ac3fb Update Kubernetes from v1.26.0 to v1.26.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1261
2023-01-19 08:27:56 -08:00
9cd1c5b17a Bump mkdocs-material from 9.0.0 to 9.0.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.0.0 to 9.0.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.0.0...9.0.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-11 20:48:11 -08:00
d6f739dedb Bump mkdocs-material from 8.5.11 to 9.0.0
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.11 to 9.0.0.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Upgrade guide](https://github.com/squidfunk/mkdocs-material/blob/master/docs/upgrade.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.11...9.0.0)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-02 22:29:06 -08:00
6bb7a36cf2 Bump pygments from 2.13.0 to 2.14.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.13.0 to 2.14.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.13.0...2.14.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-02 22:20:59 -08:00
0afe9d65ed Update Cilium from v1.12.4 to v1.12.5
* https://github.com/cilium/cilium/releases/tag/v1.12.5
2022-12-21 08:13:35 -08:00
11e540000f Update CHANGES to reiterate Terraform Module Registry deprecation
* Terraform supports sourcing modules from either Git repos or from
their own hosted Terraform Module Registry, introduced a few years ago
* Typhoon docs have always shown using Git-based module sources, not
the Terraform Module Registry. For example, module usage should be
`source = "git::https://github.com/poseidon/typhoon/...` not
`source = poseidon/kubernetes/...`
* Typhoon published Flatcar Linux modules (CoreOS Container Linux at the time)
to Terraform Module Registry, but the approach has a number of drawbacks
for publishers and for users.
  * Terraform's Module Registry requires subtree mirroring Typhoon to special
  terraform-platform-kubernetes repos. This distorts Git history,
  requires special automation, and the registry's naming requirements
  don't allow us to publish our full matrix of modules (Fedora CoreOS
  and Flatcar Linux, across AWS, Azure, GCP, on-prem, and DigitalOcean)
  * Terraform's Module Registry only supports release versions (no commit SHAs
  or forks)
* Ultimately, the Terraform Module Registry limits user flexibility, has
tedious publishing constraints, and introduces centralization where the
current decentralized Git-based approach is simpler and more featureful

Note: This does not affect Terraform _Providers_ like `poseidon/matchbox`
or `poseidon/ct`. For Terraform providers, Terraform's centralized
platform eases provider plugin installation and provides value
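
For illustration, a Git-based module source can pin a release tag, or a specific commit, through the `ref` query argument. The module name, module path, and tag below are assumed for this sketch, and the commit SHA is a placeholder.

```tf
module "example" {
  # pin to a release tag
  source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.26.0"

  # or pin to a specific commit instead (placeholder SHA, not a real commit)
  # source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=<commit-sha>"

  # ... required cluster variables omitted ...
}
```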
2022-12-10 10:00:22 -08:00
d6cbcf9f96 Update Kubernetes from v1.26.0-rc.1 to v1.26.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260
2022-12-08 08:47:24 -08:00
ce52a2cd35 Update Nginx Ingress and monitoring addon components
* Update ingress-nginx, Prometheus, node-exporter, and
  kube-state-metrics
2022-12-05 09:38:38 -08:00
bd9a908125 Bump mkdocs-material from 8.5.10 to 8.5.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.10 to 8.5.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.10...8.5.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-12-05 09:35:43 -08:00
0dc8740c77 Update Kubernetes from v1.26.0-rc.0 to v1.26.0-rc.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260-rc1
2022-12-05 09:31:45 -08:00
a9b12b6bca Update Kubernetes from v1.25.4 to v1.26.0-rc.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260-rc0
2022-11-30 08:47:40 -08:00
d419c58ab1 Add Equinix to the sponsors list
* Thank you Equinix!
2022-11-30 00:30:39 -08:00
da76d32aba Migrate AWS launch configurations to launch templates
* Same features, but AWS will soon require launch templates
* Starting Dec 31, 2022 AWS will not add new instance types
(e.g. graviton 4) to launch configuration support

Rel: https://aws.amazon.com/blogs/compute/amazon-ec2-auto-scaling-will-no-longer-add-support-for-new-ec2-features-to-launch-configurations/
2022-11-30 00:26:03 -08:00
f0e5982b3c Bump pymdown-extensions from 9.8 to 9.9
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.8 to 9.9.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.8...9.9)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-29 08:43:17 -08:00
a8990b3045 Fix flannel container image registry location
* https://github.com/poseidon/terraform-render-bootstrap/pull/336
2022-11-23 16:18:30 -08:00
f597f7cda3 Update Prometheus and Grafana addons 2022-11-23 11:06:03 -08:00
b4857c123e Update flannel from v0.15.1 to v0.20.1
* https://github.com/flannel-io/flannel/releases/tag/v0.20.1
2022-11-23 11:03:29 -08:00
50bffaae8f Update etcd from v3.5.5 to v3.5.6 in CHANGES.md 2022-11-23 11:01:24 -08:00
a193762eed Update etcd from v3.5.5 to v3.5.6
* https://github.com/etcd-io/etcd/releases/tag/v3.5.6
2022-11-23 10:59:17 -08:00
adf33df99b Update Cilium from v1.12.3 to v1.12.4
* https://github.com/cilium/cilium/releases/tag/v1.12.4
2022-11-23 10:58:27 -08:00
29a005b7b4 Update CHANGELOG links 2022-11-17 07:55:58 -08:00
ccebc2313d Bump mkdocs-material from 8.5.8 to 8.5.10
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.8 to 8.5.10.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.8...8.5.10)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-14 18:29:11 -08:00
1f86592d13 Bump pymdown-extensions from 9.7 to 9.8
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.7 to 9.8.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.7...9.8)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-14 18:26:34 -08:00
6a521257d0 Link to new Mastodon accounts
* @typhoon@fosstodon.org will announce Typhoon releases, like the
@typhoon8s Twitter account does today
* @poseidon@fosstodon.org will announce Poseidon Labs news and
general projects, like the @poseidonlabs Twitter account does today
2022-11-10 09:48:30 -08:00
26dbc7e91d Update Kubernetes from v1.25.3 to v1.25.4
* Update Calico from v3.24.3 to v3.24.5
* Update Prometheus and Grafana addons
2022-11-10 09:42:21 -08:00
de668e696a Bump mkdocs-material from 8.5.7 to 8.5.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.7 to 8.5.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.7...8.5.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-07 09:45:38 -08:00
d3b2217444 Bump mkdocs from 1.4.1 to 1.4.2
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.4.1 to 1.4.2.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.4.1...1.4.2)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-07 09:36:20 -08:00
937acc4b5a Re-enable Graceful Node Shutdown feature
* Kubelet GracefulNodeShutdown works, but only partially handles
gracefully stopping the Kubelet. The most noticeable drawback
is that Completed Pods are left around
* Use a project like poseidon/scuttle or a similar systemd unit
as a snippet to add drain and/or delete behaviors if desired
* This reverts commit 1786e34f33.

Rel:

* https://www.psdn.io/posts/kubelet-graceful-shutdown/
* https://github.com/poseidon/scuttle
2022-11-02 20:49:01 -07:00
b0a6dc8115 Bump mkdocs-material from 8.5.6 to 8.5.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.6 to 8.5.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.6...8.5.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-25 19:27:41 -07:00
420ff6ff04 Bump pymdown-extensions from 9.6 to 9.7
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.6 to 9.7.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.6...9.7)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-25 17:50:48 -07:00
9b733d79c7 Update Calico v3.24.2 to v3.24.3
* https://github.com/projectcalico/calico/releases/tag/v3.24.3
* Add patch to allow Kubelet kubeconfig to drain nodes if desired
in addition to just deleting them in shutdown integrations. See
https://github.com/poseidon/terraform-render-bootstrap/pull/330
2022-10-23 22:00:15 -07:00
35a9e22b1f Update Calico from v3.24.1 to v3.24.2
* https://github.com/projectcalico/calico/releases/tag/v3.24.2
2022-10-20 09:28:19 -07:00
0f38a6d405 Remove defunct delete-node.service from worker nodes
* delete-node.service used to be used to remove nodes from the
cluster on shutdown, but it has been a long time since it last worked properly
* If there is still a desire for this concept, it can be added
with a custom snippet and with a better systemd unit
2022-10-20 08:43:48 -07:00
a535581ef2 Remove unused Wants=network.target from etcd-member
* network.target is a passive unit that's not actually pulled
in by units requiring or wanting it; it's only used for shutdown
ordering
> "Services using the network should ... avoid any Wants=network.target or even Requires=network.target"

Rel: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
2022-10-20 08:32:55 -07:00
08d13e7215 Improve release notes slightly with links 2022-10-20 08:30:30 -07:00
3ff2d38fa5 Update Cilium from v1.12.2 to v1.12.3
* https://github.com/cilium/cilium/releases/tag/v1.12.3
2022-10-17 17:25:23 -07:00
d6d8eb8d79 Bump mkdocs from 1.4.0 to 1.4.1
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.4.0...1.4.1)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-17 16:56:19 -07:00
f04e1d25a8 Add Flatcar Linux ARM64 support on Azure
* Kinvolk now publishes Flatcar Linux images for ARM64
* For now, amd64 images must specify a plan while arm64 images
must NOT specify a plan due to how Kinvolk publishes.

Rel: https://github.com/flatcar/Flatcar/issues/872
2022-10-17 08:36:57 -07:00
b68f8bb2a9 Switch Azure Fedora CoreOS default worker type
* Change default Azure worker_type from Standard_DS1_v2 to Standard_D2as_v5
  * Get 2 VCPU, 7 GiB, 12500 Mbps (vs 1 VCPU, 3.5 GiB, 750 Mbps)
  * Small increase in pay-as-you-go price ($53.29 -> $62.78)
  * Small increase in spot price ($5.64/mo -> $7.37/mo)
  * Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`)
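
A minimal sketch, assuming an existing Azure cluster module (named `example` here purely for illustration), of setting `worker_type` explicitly to keep the previous default instead of adopting the new one:

```tf
module "example" {
  # ... existing Azure cluster definition ...

  # keep the previous default worker VM size rather than Standard_D2as_v5
  worker_type = "Standard_DS1_v2"
}
```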

Rel:

* https://github.com/poseidon/typhoon/pull/1248
* https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series
* https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series
2022-10-13 21:23:57 -07:00
651151805d Update Kubernetes v1.25.2 to v1.25.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253
2022-10-13 21:02:39 -07:00
8d2c8b8db6 Switch to Flatcar Azure gen2 images and change worker type
* Switch from Azure Hypervisor generation 1 to generation 2
* Change default Azure `worker_type` from Standard_DS1_v2 to Standard_D2as_v5
  * Get 2 VCPU, 7 GiB, 12500 Mbps (vs 1 VCPU, 3.5 GiB, 750 Mbps)
  * Small increase in pay-as-you-go price ($53.29 -> $62.78)
  * Small increase in spot price ($5.64/mo -> $7.37/mo)
  * Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`)

Notes: Azure makes you accept terms for each plan:

```
az vm image terms accept --publisher kinvolk --offer flatcar-container-linux-free --plan stable-gen2
```

Rel:

* https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series
* https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series
2022-10-13 09:57:52 -07:00
675ac63159 Remove note about not supporting ARM64 with Calico CNI
* Calico v3.22.0 introduced multi-arch container images so Typhoon's
ARM64 support has allowed choosing Calico CNI since Typhoon v1.23.5
2022-10-11 23:21:02 -07:00
b4c8b1729c Switch addons images from k8s.gcr.io to registry.k8s.io
* Switch addon manifests to use the new Kubernetes image registry

Rel:

* https://github.com/poseidon/typhoon/pull/1206
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#moved-container-registry-service-from-k8sgcrio-to-registryk8sio
2022-10-09 16:14:28 -07:00
e82241169a Update Prometheus from v2.38.0 to v2.39.1
* https://github.com/prometheus/prometheus/releases/tag/v2.39.1
2022-10-09 16:12:35 -07:00
ffe4929ff6 Bump mkdocs-material from 8.5.3 to 8.5.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.3 to 8.5.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.3...8.5.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-09 14:44:06 -07:00
88b3925318 Bump pymdown-extensions from 9.5 to 9.6
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.5 to 9.6.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.5...9.6)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-03 15:34:37 -07:00
29876dc85a Bump mkdocs from 1.3.1 to 1.4.0
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.3.1 to 1.4.0.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.3.1...1.4.0)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-03 14:49:24 -07:00
7e29e35457 Bump mkdocs-material from 8.5.2 to 8.5.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.2 to 8.5.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.2...8.5.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-28 08:57:03 -07:00
3ee462a24c Update Kubernetes from v1.25.1 to v1.25.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252
2022-09-22 08:15:30 -07:00
f833b7205d Sync recommended Terraform providers in docs 2022-09-20 08:30:15 -07:00
558e293f78 Update Nginx Ingress and Grafana addons 2022-09-20 08:28:30 -07:00
90782ea820 Remove workaround for preventing search . propagation
* Kubelet v1.25.1 has the fix https://github.com/kubernetes/kubernetes/pull/112157
2022-09-19 22:37:02 -07:00
8dc7cc614c Bump mkdocs-material from 8.4.4 to 8.5.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.4 to 8.5.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.4...8.5.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-19 22:16:32 -07:00
74d4d56dbd Remove workaround for v1.25.0 ConfigMap rendering issue
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: https://github.com/kubernetes/kubernetes/issues/112081
2022-09-19 09:10:24 -07:00
5abe84b520 Update etcd from v3.5.4 to v3.5.5
* https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v355
2022-09-15 09:01:45 -07:00
951209d113 Update Cilium from v1.12.1 to v1.12.2
* https://github.com/cilium/cilium/releases/tag/v1.12.2
2022-09-15 08:28:37 -07:00
09751cc0e8 Update Kubernetes from v1.25.0 to v1.25.1
* https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1
2022-09-15 08:23:22 -07:00
c14300f0be Update Calico from v3.23.3 to v3.24.1
* https://github.com/projectcalico/calico/releases/tag/v3.24.1
2022-09-14 08:09:38 -07:00
37de9ca2ae Bump mkdocs-material from 8.4.2 to 8.4.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.2 to 8.4.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.2...8.4.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-14 07:42:59 -07:00
1786e34f33 Revert Graceful Node Shutdown feature
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown allows 30s for critical pods to
shut down and 15s for regular pods to shut down before releasing the
inhibitor lock to allow the host to shut down
* Unfortunately, without further configuration options, regular pods
and the node are shut down at the same time at the end of the 45s
period. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier),
but it's very apparent on our real clusters

Rel: https://github.com/kubernetes/kubernetes/issues/110755
2022-09-10 14:58:44 -07:00
5f612c82e2 Update kube-state-metrics and Grafana addons 2022-09-01 08:58:32 -07:00
e60a321185 Sync Terraform providers shown in docs 2022-09-01 08:07:15 -07:00
5ad74883fe Bump mkdocs-material from 8.4.1 to 8.4.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.1 to 8.4.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.1...8.4.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-01 08:06:34 -07:00
4ad473cd3c Add workaround patch to strip "search ." from resolv.conf
* systemd adds "search ." to the host's /run/systemd/resolve/resolv.conf
on hosts with an FQDN hostname
* Kubelet v1.25 began propagating "search ." from the host node
into containers' `/etc/resolv.conf`
* musl-based DNS resolvers don't behave correctly when `search .`
is used in their `/etc/resolv.conf`. This breaks Alpine images
* Adapt the same workaround used by OpenShift to strip the "search ."
* This only applies to bare-metal Typhoon nodes (where hostnames are
set to FQDNs); nodes on cloud platforms aren't affected in the
Typhoon configuration

Kubernetes tracking issue: https://github.com/kubernetes/kubernetes/issues/112135

Rel:

* https://github.com/systemd/systemd/pull/17201
* https://github.com/kubernetes/kubernetes/pull/109441
* https://github.com/coreos/fedora-coreos-tracker/issues/1287
* https://github.com/openshift/okd-machine-os/pull/159
2022-08-31 08:05:45 -07:00
393a38deff Configure Graceful Node Shutdown and lengthen max inhibitor delay
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to shut down gracefully
* Allow up to 15s for regular pods to shut down gracefully
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s
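
The numbers above fit together as follows, assuming the standard
KubeletConfiguration semantics where `shutdownGracePeriod` is the total
delay and `shutdownGracePeriodCriticalPods` is the share reserved for
critical pods. This is a sketch of the arithmetic only; the helper name
`shutdown_settings` is hypothetical and the exact values Typhoon writes are
not shown in this message:

```python
def shutdown_settings(regular_pods_s: int, critical_pods_s: int) -> dict:
    """Derive kubelet and logind settings from per-class grace periods."""
    # The kubelet's total grace period covers regular pods first,
    # then critical pods, so the two windows add up.
    total_s = regular_pods_s + critical_pods_s
    return {
        "kubelet": {
            "shutdownGracePeriod": f"{total_s}s",
            "shutdownGracePeriodCriticalPods": f"{critical_pods_s}s",
        },
        # logind must allow inhibitor locks to delay shutdown at least
        # as long as the kubelet's total grace period.
        "logind": {"InhibitDelayMaxSec": total_s},
    }


if __name__ == "__main__":
    print(shutdown_settings(regular_pods_s=15, critical_pods_s=30))
```

With 15s for regular pods and 30s for critical pods, the total is 45s, which
is why the max inhibitor delay has to rise from its 5s default.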

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shut down a node via systemctl reboot
or via the cloud console to watch container shutdown

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* https://github.com/kubernetes/kubernetes/issues/107043
* https://github.com/coreos/fedora-coreos-tracker/issues/821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
2022-08-28 10:37:33 -07:00
76d92e9c2d Change podman log-driver from journald to k8s-file
* When podman runs the Kubelet container, logging to journald means
log lines are duplicated in the journal. journalctl -u kubelet shows
Kubelet's logs and the same log messages from podman. Using the
k8s-file driver alleviates this problem
* Fix Kubelet and etcd-member logs to be more readable and reduce
unnecessary Kubelet log volume
2022-08-27 17:15:22 -07:00
275fc0f9e8 Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so it's not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable

```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```
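
For reference, disabling a feature gate through a KubeletConfiguration file
might look like the following minimal sketch. It renders the document as
JSON, which the Kubelet accepts alongside YAML. The helper name is
hypothetical, and a real configuration would carry many more fields:

```python
import json


def kubelet_config_with_gate_disabled(gate: str) -> dict:
    """Build a minimal KubeletConfiguration that turns one feature gate off."""
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "featureGates": {gate: False},
    }


if __name__ == "__main__":
    config = kubelet_config_with_gate_disabled(
        "LocalStorageCapacityIsolationFSQuotaMonitoring"
    )
    print(json.dumps(config, indent=2))
```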

Rel:

* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
2022-08-27 09:49:35 -07:00
3fb59a3289 Migrate most Kubelet flags to KubeletConfiguration file
* Add a KubeletConfiguration file to replace most Kubelet
flags, to prepare for upcoming changes
* Pass Kubelet the --config flag to specify the location of
the KubeletConfiguration
* Remove flags / configuration where it matches the defaults
  * Remove --cgroups-per-qos, defaults to true
  * Remove --container-runtime, defaults to remote
  * Remove enforce-node-allocatable=pods, defaults to pods

Rel:

* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2022-08-27 09:28:15 -07:00
a31dbceac6 Update Kubernetes from v1.24.4 to v1.25.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md
2022-08-25 09:18:14 -07:00
1dcf56127b Bump mkdocs-material from 8.4.0 to 8.4.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.0 to 8.4.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.0...8.4.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-23 08:53:12 -07:00
bf06412dfd Update Prometheus and Grafana addons 2022-08-21 08:56:00 -07:00
505818b7d5 Update docs showing the terraform plan resources count
* Although I don't plan to keep these in sync, some users are
confused when the docs don't match the actual resource count
2022-08-21 08:52:35 -07:00
0d27811265 Update recommended Terraform provider versions 2022-08-18 09:08:55 -07:00
c13d060b38 Add docs for GCP MIG update and AWS instance refresh
* Document that worker instances are rolling replaced when
changes to their configuration are applied
2022-08-18 09:02:38 -07:00
e87d5aabc3 Adjust Google Cloud worker health checks to use kube-proxy healthz
* Change the workers managed instance group to health check nodes
via HTTP probe of the kube-proxy port 10256 /healthz endpoints
* Advantages: kube-proxy is a lower value target (in case there
were bugs in firewalls) than Kubelet, it's more representative than
health checking Kubelet (Kubelet must run AND kube-proxy Daemonset
must be healthy), and it's already used by kube-proxy liveness probes
(better discoverability via kubectl or alerts on pods crashlooping)
* Another motivator is that GKE clusters also use kube-proxy port
10256 checks to assess node health
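
The probe described above can be approximated by hand when debugging. The
sketch below is only an illustration of an HTTP check against the
kube-proxy `/healthz` endpoint, not the managed instance group's own
implementation; the function name and timeout are assumptions:

```python
import urllib.request

KUBE_PROXY_HEALTHZ_PORT = 10256  # port named in this commit message


def kube_proxy_healthy(
    host: str, port: int = KUBE_PROXY_HEALTHZ_PORT, timeout: float = 2.0
) -> bool:
    """Return True if kube-proxy answers /healthz with HTTP 200."""
    url = f"http://{host}:{port}/healthz"
    # Connect directly to the node and ignore any proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and non-2xx responses
        # (urllib raises HTTPError, an OSError subclass, for those).
        return False
```

Calling `kube_proxy_healthy("10.0.0.5")` (a hypothetical node address) from a
machine allowed through the firewall gives a quick yes/no on the same
endpoint the health check uses.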
2022-08-17 20:50:52 -07:00
760b4cd5ee Update Kubernetes from v1.24.3 to v1.24.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244
2022-08-17 20:09:30 -07:00
fcd8ff2b17 Update Cilium from v1.12.0 to v1.12.1
* https://github.com/cilium/cilium/releases/tag/v1.12.1
2022-08-17 08:53:56 -07:00
ef2d2af0c7 Bump mkdocs-material from 8.3.9 to 8.4.0
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.9 to 8.4.0.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.9...8.4.0)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-16 08:29:51 -07:00
8e2027ed2d Bump pygments from 2.12.0 to 2.13.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.12.0 to 2.13.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.12.0...2.13.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-16 08:26:45 -07:00
52427a4271 Refresh instances in autoscaling group when launch configuration changes
* Changes to worker launch configurations start an autoscaling group instance
refresh to replace instances
* Instance refresh creates surge instances, waits for a warm-up period, then
deletes old instances
* Changing worker_type, disk_*, worker_price, worker_target_groups, or Butane
worker_snippets on existing worker nodes will replace instances
* New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or
Flatcar Linux to keep themselves updated
* Previously, new launch configurations were made in the same way, but not
applied to instances unless manually replaced
2022-08-14 21:43:49 -07:00
20b76d6e00 Roll instance template changes to worker managed instance groups
* When a worker managed instance group's (MIG) instance template
changes (including machine type, disk size, or Butane snippets
but excluding new images), use Google Cloud's rolling update features
to ensure instances match declared state
* Ignore new images since Fedora CoreOS and Flatcar Linux nodes
already auto-update and reboot themselves
* Rolling updates will create surge instances, wait for health
checks, then delete old instances (0 unavailable instances)
* Instances are replaced to ensure new Ignition/Butane snippets
are respected
* Add managed instance group autohealing (i.e. health checks) to
ensure new instances' Kubelet is running

Renames

* Name apiserver and kubelet health checks consistently
* Rename MIG from `${var.name}-worker-group` to `${var.name}-worker`

Rel: https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups
2022-08-14 13:06:53 -07:00
6facfca4ed Switch Kubernetes image registry from k8s.gcr.io to registry.k8s.io
* Announce: https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/319
2022-08-13 16:16:21 -07:00
ed8c6a5aeb Upgrade CoreDNS from v1.8.5 to v1.9.3
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/318
2022-08-13 15:43:03 -07:00
003af72cc8 Rename google-cloud/fedora-coreos/kubernetes/workers fcc to butane
* Should have been part of https://github.com/poseidon/typhoon/pull/1203
2022-08-13 15:40:16 -07:00
b321b90a4f Update Grafana from v9.0.6 to v9.0.7 2022-08-13 15:39:44 -07:00
e5d0e2d48b Rename Fedora CoreOS fcc directory to butane
* Align both Fedora CoreOS and Flatcar Linux keeping Butane
Configs in a directory called butane
2022-08-10 09:10:18 -07:00
679f8b878f Update Grafana from v9.0.5 to v9.0.6 2022-08-10 08:23:04 -07:00
87a8278c9d Improve AWS autoscaling group and launch config names
* Rename launch configuration to use a name_prefix named after the
cluster and worker to improve identifiability
* Shorten AWS autoscaling group name to not include the launch config
id. Years ago this used to be needed to update the ASG but the AWS
provider detects changes to the launch configuration just fine
2022-08-08 20:46:08 -07:00
93b7f2554e Remove ineffective iptables-legacy.stamp
* Typhoon Fedora CoreOS is already using iptables nf_tables since
F36. The file to pin to legacy iptables was renamed to
/etc/coreos/iptables-legacy.stamp
2022-08-08 20:27:21 -07:00
62d47ad3f0 Update Cilium from v1.11.7 to v1.12.0
* https://github.com/cilium/cilium/releases/tag/v1.12.0
2022-08-08 19:59:03 -07:00
6eb7861f96 Update Grafana liveness and readiness probes
* Use the liveness and readiness probes that Grafana recommends
* Update Grafana from v9.0.3 to v9.0.5
2022-08-08 09:22:44 -07:00
ffbacbccf7 Update node-exporter DaemonSet to fix permission denied
* Add toleration to run node-exporter on controller nodes
* Add HostToContainer mount propagation and security context group
settings from upstream
* Fix SELinux denied accessing /host/proc/1/mounts. The mounts file
has an SELinux type attribute init_t, but that won't allow running
the node-exporter binary so we have to use spc_t. This should be more
targeted at just the SELinux issue than making the Pod privileged
* Remove excluded mount points and filesystem types, the defaults are
https://github.com/prometheus/node_exporter/blob/v1.3.1/collector/filesystem_linux.go#L35

```
caller=collector.go:169 level=error msg="collector failed" name=filesystem duration_seconds=0.000666766 err="open /host/proc/1/mounts: permission denied"
```

```
[ 3664.880899] audit: type=1400 audit(1659639161.568:4400): avc:  denied  { search } for  pid=28325 comm="node_exporter" name="1" dev="proc" ino=22542 scontext=system_u:system_r:container_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```
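
To see which SELinux types are involved in a denial like the one above, the
key=value fields of the audit line can be pulled apart. This is a small
sketch for reading such logs, with hypothetical helper names; it is not part
of the node-exporter change:

```python
import re

# Matches key=value pairs where the value is either quoted or a bare token.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line: str) -> dict:
    """Extract key=value fields from an AVC denial line."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def selinux_type(context: str) -> str:
    """Return the type from a user:role:type:level SELinux context."""
    return context.split(":")[2]
```

Applied to the denial above, `selinux_type` reports `container_t` for the
source context and `init_t` for the target context, which is the mismatch
the `spc_t` setting works around.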
2022-08-08 09:19:46 -07:00
16c2785878 Update docs on using Butane snippets for customization
* Typhoon now consistently uses Butane Configs for snippets
(variant `fcos` or `flatcar`). Previously snippets were either
Butane Configs (on FCOS) or Container Linux Configs (on Flatcar)
* Update docs on uploading Flatcar Linux DigitalOcean images
* Update docs on uploading Fedora CoreOS Azure images
2022-08-03 20:28:53 -07:00
4a469513dd Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0
* Requires poseidon/ct v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x

Rel:

* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
2022-08-03 08:32:52 -07:00
47d8431fe0 Fix bug provisioning multi-controller clusters on Google Cloud
* Google Cloud Terraform provider resource google_dns_record_set's
name field provides the full domain name with a trailing ".". This
isn't a new behavior, Google has behaved this way as long as I can
remember
* etcd domain names are passed to the bootstrap module to generate
TLS certificates. What seems to be new(ish?) is that etcd peers
see example.foo and example.foo. as different domains during TLS
SANs validation. As a result, clusters with multiple controller
nodes fail to run etcd-member, which manifests as cluster provisioning
hanging. Single controller/master clusters (default) are unaffected
* Fix etcd-member.service error in multi-controller clusters:

```
"error":"x509: certificate is valid for conformance-etcd0.redacted.,
conform-etcd1.redacted., conform-etcd2.redacted., not conform-etcd1.redacted"}
```
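
The mismatch comes down to an exact string comparison between a name with a
trailing "." and one without. The sketch below illustrates that comparison
and one way to normalise names before they are used as SANs; it is an
assumption about the general approach, not the Terraform change itself, and
the function names are hypothetical:

```python
def strip_trailing_dot(fqdn: str) -> str:
    """Drop a single trailing '.' from a fully qualified domain name."""
    return fqdn[:-1] if fqdn.endswith(".") else fqdn


def name_in_sans(peer_name: str, sans) -> bool:
    """Exact-match check, mirroring how the two forms are treated as different."""
    return peer_name in sans


if __name__ == "__main__":
    # Names taken from the error message above.
    sans = ["conform-etcd1.redacted.", "conform-etcd2.redacted."]
    print(name_in_sans("conform-etcd1.redacted", sans))
    normalised = [strip_trailing_dot(name) for name in sans]
    print(name_in_sans("conform-etcd1.redacted", normalised))
```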
2022-08-02 20:21:02 -07:00
256b87812e Remove Terraform template provider dependency
* Use Terraform builtin templatefile functionality
* Remove dependency on deprecated Terraform template provider

Rel:

* https://registry.terraform.io/providers/hashicorp/template/2.2.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/293
2022-08-02 18:15:03 -07:00
ca6eef365f Add badges to README 2022-07-31 18:03:09 -07:00
c6794f1007 Update Calico from v3.23.1 to v3.23.3
* https://github.com/projectcalico/calico/releases/tag/v3.23.3
2022-07-30 18:15:33 -07:00
de6f27e119 Update FCOS iPXE initrd and kernel arg settings
* Add initrd=main kernel argument for UEFI
* Switch to using the coreos.live.rootfs_url kernel argument
instead of passing the rootfs as an appended initrd
* Remove coreos.inst.image_url kernel argument since coreos-installer
now defaults to installing from the embedded live system
* Remove rd.neednet=1 and dhcp=ip kernel args that aren't needed
* Remove serial console kernel args by default (these can be
added via var.kernel_args if needed)

Rel:
* https://github.com/poseidon/matchbox/pull/972 (thank you @bgilbert)
* https://github.com/poseidon/matchbox/pull/978
2022-07-30 16:27:08 -07:00
6a9c32d3a9 Migrate from internal hosting to GitHub pages
* Add Twitter card customizations that have been kept in
an internal fork
* Add CNAME needed for GitHub pages
2022-07-27 21:56:42 -07:00
a7e9e423f5 Bump mkdocs from 1.3.0 to 1.3.1
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.3.0 to 1.3.1.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.3.0...1.3.1)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-21 09:07:21 -07:00
83236eab57 Add table of details about static Pods
* Also remove outdated mentions of rkt-fly
2022-07-21 09:03:27 -07:00
7f445b0dba Add release note about master to main branch rename
* Update Terraform provider versions
2022-07-19 18:12:37 -07:00
f42b45451b Update Cilium from v1.11.6 to v1.11.7
* https://github.com/cilium/cilium/releases/tag/v1.11.7
2022-07-19 09:06:15 -07:00
767a653baa Update Prometheus, Grafana, and ingress-nginx addons
* Update ingress-nginx RBAC Role to include coordination.k8s.io leases
permissions that are required with ingress-nginx v1.3.0
2022-07-15 20:19:12 -07:00
0db5f86110 Update Kubernetes from v1.24.2 to v1.24.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243
2022-07-13 20:59:15 -07:00
4908fdd247 Bump mkdocs-material from 8.3.8 to 8.3.9
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.8 to 8.3.9.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.8...8.3.9)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-05 17:54:48 -07:00
42bf82b325 Update Prometheus and Grafana addons
* Bump recommended Terraform provider versions
2022-07-02 11:28:34 -07:00
61cbfc044d Bump mkdocs-material from 8.3.6 to 8.3.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.6 to 8.3.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.6...8.3.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-29 08:11:42 -07:00
07df0c2552 Add warning about Terraform AWS provider version
* Sync Terraform provider versions with those used internally
2022-06-23 21:31:20 -07:00
45d6ff2e38 Bump mkdocs-material from 8.3.4 to 8.3.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.4 to 8.3.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.4...8.3.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-20 11:46:24 -07:00
8398182956 Update Cilium and Calico CNI providers
* Update Cilium from v1.11.5 to v1.11.6
* Update Calico from v3.22.2 to v3.23.1
2022-06-18 19:29:01 -07:00
6d6b48b201 Update Kubernetes from v1.24.1 to v1.24.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242
2022-06-18 18:35:42 -07:00
2a8915fee9 Update Prometheus, kube-state-metrics, and Grafana addons
* Update monitoring addons
2022-06-18 18:32:17 -07:00
337b1eef3a Bump mkdocs-material from 8.3.2 to 8.3.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.2 to 8.3.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.2...8.3.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-15 22:01:42 -07:00
fe28bd0783 Bump pymdown-extensions from 9.3 to 9.5
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.3 to 9.5.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.3...9.5)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-07 08:56:22 -07:00
5e2f9a5c44 Bump mkdocs-material from 8.2.16 to 8.3.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.16 to 8.3.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.16...8.3.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-07 08:52:40 -07:00
31c7f0ba0e Update nginx-ingress addon from v1.2.0 to v1.2.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.1
2022-05-31 16:37:57 +01:00
b8549a1e32 Update Cilium from v1.11.4 to v1.11.5
* https://github.com/poseidon/terraform-render-bootstrap/pull/309
2022-05-31 15:23:07 +01:00
8e8bf305c3 Update Prometheus and Grafana addons 2022-05-31 14:29:55 +01:00
a447494ccd Bump mkdocs-material from 8.2.15 to 8.2.16
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.15 to 8.2.16.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.15...8.2.16)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-05-31 10:30:34 +01:00
c5573199db Update Kubernetes from v1.24.0 to v1.24.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241
2022-05-28 09:39:14 +01:00
0be171cde7 Bump mkdocs-material from 8.2.14 to 8.2.15
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.14 to 8.2.15.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.14...8.2.15)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-05-27 10:02:31 +01:00
e3b1e6c52e Bump mkdocs-material from 8.2.13 to 8.2.14
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.13 to 8.2.14.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.13...8.2.14)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-05-09 18:48:45 -07:00
b0e0b132e4 Update Kubernetes from v1.23.6 to v1.24.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240
2022-05-04 08:27:14 -07:00
4fba09e8f8 Bump mkdocs-material from 8.2.11 to 8.2.13
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.11 to 8.2.13.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.11...8.2.13)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-05-03 07:42:29 -07:00
02f78fbd1a Update Grafana from v8.4.5 to v8.5.1 2022-05-02 08:19:41 -07:00
a122867748 Update nginx-ingress, Prometheus, and Grafana addons
* Sync addons with versions used in Poseidon
2022-04-27 21:02:32 -07:00
91b38bf3fd Update etcd from v3.5.2 to v3.5.4
* https://github.com/etcd-io/etcd/releases/tag/v3.5.4
2022-04-27 20:57:02 -07:00
9a4887d028 Add bind mounts for selinux to fcos kubelets
fixes #1123

Enables the use of CSI drivers with a StorageClass that lacks an explicit context mount option. In cases where the kubelet lacks mounts for `/etc/selinux` and `/sys/fs/selinux`, it is unable to set the `:Z` option for the CRI volume definition automatically. See [KEP 1710](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1710-selinux-relabeling/README.md#volume-mounting) for more information on how SELinux is passed to the CRI by Kubelet.

Prior to this change, a not-explicitly-labelled mount would have an `unlabeled_t` SELinux type on the host. Following this change, the Kubelet and CRI work together to dynamically relabel mounts that lack an explicit context specification every time they are rebound to a pod, applying SELinux type `container_file_t` and context labels that match the pod they are bound to. This enables applications running in containers to consume dynamically provisioned storage on SELinux enforcing systems without explicitly setting the context on the StorageClass or PersistentVolume.
2022-04-26 21:33:26 -07:00
35bca6df90 Bump mkdocs-material from 8.2.9 to 8.2.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.9 to 8.2.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.9...8.2.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-25 19:02:15 -07:00
d7f55c4e46 Remove use of deprecated key_algorithm field in TLS assets
* Fixes warning about use of deprecated field `key_algorithm` in
the `hashicorp/tls` provider. The key algorithm can now be inferred
directly from the private key so resources don't have to output
and pass around the algorithm
2022-04-20 19:52:03 -07:00
80c6e2e7e6 Update Kubernetes from v1.23.5 to v1.23.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236
2022-04-20 19:39:05 -07:00
fddd8ac69d Fix Flatcar Linux nodes on Google Cloud not ignoring image changes
* Add `boot_disk[0].initialize_params` to the ignored fields for the
controller nodes
* Nodes will auto-update, Terraform should not attempt to delete and
recreate nodes (especially controllers!). Lack of this ignore causes
Terraform to propose deleting controller nodes when Flatcar Linux
releases new images
* Matches the configuration on Typhoon Fedora CoreOS (which does not
have the issue)
2022-04-20 18:53:00 -07:00
2f7d2a92e0 Update Cilium and Calico CNI providers
* Update Cilium from v1.11.3 to v1.11.4
* Update Calico from v3.22.1 to v3.22.2
2022-04-19 08:28:52 -07:00
6cd6bb38de Bump mkdocs-material from 8.2.8 to 8.2.9
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.8 to 8.2.9.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.8...8.2.9)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-04-12 07:53:43 -07:00
d91408258b Update nginx-ingress, Prometheus, and Grafana addons 2022-04-04 08:53:29 -07:00
2df1873b7f Update Cilium from v1.11.2 to v1.11.3
* https://github.com/cilium/cilium/releases/tag/v1.11.3
2022-04-01 16:44:30 -07:00
93ebfc7dd0 Allow upgrading Azure Terraform Provider to v3.x
* Change subnet references to source and destinations prefixes
(plural)
* Remove references to a resource group in some load balancing
components, which no longer require it (inferred)
* Rename `worker_address_prefix` output to `worker_address_prefixes`
2022-04-01 16:36:53 -07:00
5365ce8204 Mount /etc/machine-id from host into Kubelet
* Kubelet can detect a node's System UUID from the sysfs
filesystem without a host mount, but mounting /etc/machine-id is
needed to distinguish between the host's machine-id and SystemUUID
* On cloud platforms, MachineID and SystemUUID are identical,
but on bare-metal the two differ
2022-04-01 16:32:06 -07:00
2ad33cebaf Bump mkdocs-material from 8.2.5 to 8.2.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.5 to 8.2.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.5...8.2.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-28 10:20:10 -07:00
a26abcf5b1 Bump mkdocs from 1.2.3 to 1.3.0
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.2.3 to 1.3.0.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.2.3...1.3.0)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-28 10:07:34 -07:00
b8c4629548 Bump pymdown-extensions from 9.2 to 9.3
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.2 to 9.3.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.2...9.3)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-21 10:35:37 -07:00
c5814308ab Refresh Terraform providers shown in docs
* Update a few OS component details
2022-03-19 19:30:43 -07:00
b47edca6be Refresh Prometheus rules and Grafana dashboards
* Update Prometheus rules and Grafana dashboards
* Add new networking dashboards
2022-03-19 17:08:00 -07:00
e61d4b92da Update Kubernetes from v1.23.4 to v1.23.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235
2022-03-16 21:01:41 -07:00
dca745fa4a Update monitoring addon components
* Update Prometheus, kube-state-metrics, and Grafana
2022-03-11 11:50:16 -08:00
661347fa71 Update nginx-ingress from v1.1.1 to v1.1.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.1.2
2022-03-11 11:42:33 -08:00
69770b4827 Update Calico from v3.21.2 to v3.22.1
* https://github.com/projectcalico/calico/releases/tag/v3.22.1
* Fix https://github.com/projectcalico/calico/issues/5011
2022-03-11 11:22:29 -08:00
f797f97675 Update Cilium from v1.11.1 to v1.11.2
* https://github.com/cilium/cilium/releases/tag/v1.11.2
2022-03-11 10:08:24 -08:00
9fe0f2fa6c Bump mkdocs-material from 8.2.3 to 8.2.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.3 to 8.2.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.3...8.2.5)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-11 09:57:31 -08:00
268648c146 Bump mkdocs-material from 8.2.1 to 8.2.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.2.1 to 8.2.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.2.1...8.2.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-28 09:36:48 -08:00
6cf40722de Revert kube-state-metrics upgrade
* kube-state-metrics:v2.4.0 isn't published, skip it
2022-02-21 19:57:47 -08:00
c230cdec46 Update Grafana and kube-state-metrics addons 2022-02-21 19:36:16 -08:00
cabf5b2c34 Update recommended Terraform provider versions
* Update poseidon/ct version from v0.9.1 to v0.10.0
* Update aws provider to v4.x series
2022-02-21 19:27:54 -08:00
ba8a951863 Bump mkdocs-material from 8.1.11 to 8.2.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.11 to 8.2.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.11...8.2.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-21 09:53:27 -08:00
9aa99f1996 Allow upgrading AWS Terraform provider to v4.x
* https://github.com/hashicorp/terraform-provider-aws/releases/tag/v4.0.0
2022-02-17 09:35:15 -08:00
fc38ba45b1 Update Kubernetes from v1.23.3 to v1.23.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234
2022-02-17 09:00:31 -08:00
28a42238c4 Update nginx-ingress, Prometheus, and Grafana addons
* Align `nginx-ingress` `--controller-class` with `IngressClass`
to provide a better example (e.g. if extended to multiple ingress
controllers)
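
A rough sketch of what that alignment looks like. The class name `public` and the controller string `k8s.io/public` are illustrative assumptions, not necessarily the addon's values. The point is only that the `IngressClass` `spec.controller` and the controller's `--controller-class` argument carry the same string:

```yaml
# Illustrative values only; the names here are assumptions.
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: public
spec:
  controller: k8s.io/public
---
# Fragment of the ingress controller's container spec
containers:
  - name: nginx-ingress-controller
    args:
      - /nginx-ingress-controller
      - --controller-class=k8s.io/public  # must match spec.controller above
      - --ingress-class=public            # must match the IngressClass name
```

With several ingress controllers in one cluster, each would get its own `IngressClass` and a distinct controller string.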
2022-02-17 08:58:29 -08:00
de9b30a587 Bump mkdocs-material from 8.1.10 to 8.1.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.10 to 8.1.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.10...8.1.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-14 11:11:06 -08:00
affb40d59b Bump pymdown-extensions from 9.1 to 9.2
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.1 to 9.2.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.1...9.2)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-14 11:10:56 -08:00
15ac49b34d Bump mkdocs-material from 8.1.9 to 8.1.10
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.9 to 8.1.10.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.9...8.1.10)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-07 09:49:24 -08:00
6c70d06937 Update etcd from v3.5.1 to v3.5.2
* https://github.com/etcd-io/etcd/releases/tag/v3.5.2
2022-02-07 08:10:17 -08:00
cf4beeba34 Change default CNI provider from Calico to Cilium
* Cilium (v1.8) was added to Typhoon in v1.18.5 in June 2020
and it's become more impressive since then. It's currently the
leading CNI provider choice.
* Calico has grown complex, has lots of CRDs, masks its
management complexity with an operator (which we won't use),
doesn't provide multi-arch images, and hasn't been compatible
with Kubernetes v1.23 (with ipvs) for several releases.
* Both have CNCF conformance quirks (flannel used for conformance),
but that's not the main factor in choosing the default
2022-02-07 08:07:00 -08:00
10b4ba14b6 Bump mkdocs-material from 8.1.8 to 8.1.9
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.8 to 8.1.9.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.8...8.1.9)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-01 10:26:39 -08:00
e06ee042ee Switch to using Flatcar Linux images on Google Cloud
* Use the official Kinvolk Flatcar Linux image on Google Cloud
* Change `os_image` from a custom image name to `flatcar-stable`
(default), `flatcar-beta`, or `flatcar-alpha` (**action required**)
* Change `os_image` from a required to an optional variable
* Promote Typhoon on Flatcar Linux / Google Cloud to stable
* Remove docs about needing to upload a Flatcar Linux image
manually on Google Cloud and drop support for custom images
2022-01-28 21:04:10 -08:00
a527f73f5a Update Kubernetes from v1.23.2 to v1.23.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1233
2022-01-27 09:23:37 -08:00
c21a0479c0 Bump mkdocs-material from 8.1.7 to 8.1.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.7 to 8.1.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.7...8.1.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-27 09:02:30 -08:00
f614c538cf Update Terraform provider recommendations in docs 2022-01-19 21:16:37 -08:00
3da8c1575c Update nginx-ingress and Grafana addons 2022-01-19 21:09:21 -08:00
dedd17d085 Upgrade to DigitalOcean Terraform provider v2.x
* Remove deprecated `private_networking` parameter
2022-01-19 18:32:17 -08:00
e274a451ff Update Kubernetes from v1.23.1 to v1.23.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1232
2022-01-19 17:59:49 -08:00
b2e36947ab Bump mkdocs-material from 8.1.5 to 8.1.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.5 to 8.1.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.5...8.1.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-19 16:42:21 -08:00
5af0a5c5b9 Add Flatcar Linux ARM64 examples
* Fix content tabs format for switching between example
code blocks
2022-01-14 12:52:45 -08:00
2265ab5375 Remove Kubelet --network-plugin=cni flag
* Now that `docker-shim` is no longer used, the Kubelet flag
is no longer needed and will be removed in v1.24
2022-01-14 10:43:07 -08:00
08ea9776f3 Mask docker.service to prevent socket activation
* Kubelet now uses `containerd` as the container runtime, but
`docker.service` still starts when `docker.sock` is probed because
the service is socket activated. Prevent this by masking the
`docker.service` unit
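
A minimal sketch of masking the unit in a Butane or Container Linux Config snippet, assuming the host is provisioned that way (the surrounding config is omitted):

```yaml
# Fragment only; merge into the existing systemd section.
systemd:
  units:
    - name: docker.service
      # A masked unit cannot be started, even by socket activation
      mask: true
```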
2022-01-14 10:31:47 -08:00
2e8bc99164 Remove template provider usage from terraform-render-bootstrap 2022-01-14 10:27:24 -08:00
b18b0a9f3d Remove unused ETCD_UNSUPPORTED_ARCH variable
* etcd used to require a special variable to use the arm64
container image, but this is no longer required
2022-01-14 10:25:45 -08:00
beb9f1477a Add experimental Flatcar Linux arm64 support on AWS
* Add `arch` variable to Flatcar Linux AWS `kubernetes` and
`workers` modules. Accept `amd64` (default) or `arm64` to support
native arm64/aarch64 clusters or mixed/hybrid clusters with arm64
workers
* Requires `flannel` or `cilium` CNI

Similar to https://github.com/poseidon/typhoon/pull/875
2022-01-14 10:24:48 -08:00
f544a9c71f Switch Fedora CoreOS from docker-shim to containerd
* Migrate from `docker-shim` to `containerd` in preparation
for Kubernetes v1.24.0 dropping `docker-shim` support
* Much consideration was given to the container runtime
choice. https://github.com/poseidon/typhoon/issues/899
provides relevant rationales
2022-01-13 09:17:29 -08:00
415b7fa19a Bump pygments from 2.11.1 to 2.11.2
Bumps [pygments](https://github.com/pygments/pygments) from 2.11.1 to 2.11.2.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.11.1...2.11.2)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-13 09:03:25 -08:00
d0c29099ba Bump mkdocs-material from 8.1.4 to 8.1.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.4 to 8.1.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.4...8.1.5)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-11 20:42:31 -08:00
30e4070474 Bump mkdocs-material from 8.1.3 to 8.1.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.3 to 8.1.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.3...8.1.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-03 10:53:23 -08:00
43f6a19060 Bump pygments from 2.10.0 to 2.11.1
Bumps [pygments](https://github.com/pygments/pygments) from 2.10.0 to 2.11.1.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.10.0...2.11.1)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-03 10:48:25 -08:00
50215e373b Add Prometheus config for monitoring Kubernetes Ingress
* Allow Kubernetes Ingress resources to be probed via Blackbox
Exporter (if present) if annotated `prometheus.io/probe: "true"`
* Fix probes of Services via Blackbox Exporter. Require Blackbox
Exporter to be deployed in the same `monitoring` namespace, be
named `blackbox-exporter`, and use port 8080
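
For example, an Ingress opted in to probing might look roughly like this. The resource name, host, and backend Service are hypothetical; only the annotation comes from this change:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app            # hypothetical name
  annotations:
    prometheus.io/probe: "true"
spec:
  rules:
    - host: app.example.com    # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app   # hypothetical backend Service
                port:
                  number: 80
```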
2021-12-29 11:57:50 -08:00
a9f9c59b91 Configure Prometheus to allow a custom scrape query param
* Set `prometheus.io/param` on a Kubernetes Service to scrape
the service endpoints and pass a custom query parameter
* For example, scrape Consul with `?format=prometheus`

```yaml
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8500'
    prometheus.io/path: /v1/agent/metrics
    prometheus.io/param: format=prometheus
```
2021-12-29 11:47:10 -08:00
6ed048eb65 Workaround Terraform v1.1 file provisioner regression
* Terraform v1.1 changed the behavior of provisioners and
`remote-exec` in a way that breaks support for expansions
in commands (including file provisioner, where `destination`
is part of an `scp` command)
* Terraform will likely revert the change eventually, but I
suspect it will take a while
* Instead, we can stop relying on Terraform's expansion
behavior. `/home/core` is a suitable choice for `$HOME` on
both Flatcar Linux and Fedora CoreOS (where it resolves to `/var/home/core`)

Rel: https://github.com/hashicorp/terraform/issues/30243
2021-12-28 13:25:23 -08:00
ce7b2fa21f Bump mkdocs-material from 8.1.1 to 8.1.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.1.1 to 8.1.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.1.1...8.1.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-23 14:33:26 -08:00
9e3807798f Update Kubernetes from v1.23.0 to v1.23.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1231
2021-12-20 08:36:19 -08:00
ef9c6aa423 Switch Flatcar Linux to using containerd CRI
* Use containerd as the Kubernetes Container Runtime
2021-12-15 08:42:13 -08:00
bb5e5811ec Update Prometheus and Grafana addons 2021-12-15 08:16:46 -08:00
16aa997604 Fix Azure backend_address_pool_id deprecation warning
* Change to `backend_address_pool_ids` list
2021-12-14 10:26:08 -08:00
fb6650b06b Bump mkdocs-material from 8.0.4 to 8.1.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.0.4 to 8.1.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.0.4...8.1.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-13 17:44:02 -08:00
43c6558aaf Update nginx-ingress and monitoring addons 2021-12-10 11:29:49 -08:00
125008fbb3 Update Cilium from v1.10.5 to v1.11.0
* https://github.com/cilium/cilium/releases/tag/v1.11.0
2021-12-10 11:26:05 -08:00
136107b448 Set Kubelet resolver config to /run/systemd/resolve/resolv.conf
* Both Flatcar Linux and Fedora CoreOS use systemd-resolved,
but they set up /etc/resolv.conf symlinks differently
* Prefer using /run/systemd/resolve/resolv.conf directly, which
also updates to reflect runtime changes (e.g. resolvectl)
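
As a sketch, the equivalent setting in a Kubelet configuration file would look like the following. This assumes a `KubeletConfiguration` file is used; the change itself may instead set the corresponding Kubelet flag:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Read upstream resolvers from the file systemd-resolved maintains,
# rather than following the distro-specific /etc/resolv.conf symlink
resolvConf: /run/systemd/resolve/resolv.conf
```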
2021-12-10 08:22:30 -08:00
e97c1cc9e5 Enable Kubernetes aggregation by default
* Change `enable_aggregation` default from false to true
* These days, Kubernetes control plane components emit annoying
messages related to assumptions baked into the Kubernetes API
Aggregation Layer if you don't enable it. Further, the conformance
tests force you to remember to enable it if you care about passing
those
* This change is motivated by eliminating annoyances, rather than
any enthusiasm for Kubernetes' aggregation features

Rel: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/
2021-12-09 17:30:35 -08:00
39da5b53f5 Update operating system notes in architecture docs 2021-12-09 17:21:24 -08:00
41f739891b Normalize CA certs mounts in static Pods and kube-proxy
* Mount both /etc/ssl/certs and /etc/pki into control plane static
pods and kube-proxy, rather than choosing one based on a variable
(set based on Flatcar Linux or Fedora CoreOS)
* Remove deprecated `--port` from `kube-scheduler` static Pod
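
A rough illustration of mounting both host paths in a static Pod. The container and volume names are assumptions chosen for the sketch:

```yaml
# Fragment of a control plane static Pod spec
spec:
  containers:
    - name: kube-apiserver
      volumeMounts:
        - name: etc-ssl
          mountPath: /etc/ssl/certs
          readOnly: true
        - name: etc-pki
          mountPath: /etc/pki
          readOnly: true
  volumes:
    # CA certificate location used on Flatcar Linux
    - name: etc-ssl
      hostPath:
        path: /etc/ssl/certs
    # CA certificate location used on Fedora CoreOS
    - name: etc-pki
      hostPath:
        path: /etc/pki
```

Mounting both means the manifest no longer needs an OS-specific template variable.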
2021-12-09 09:56:37 -08:00
861021ee98 Update Kubernetes from v1.22.4 to v1.23.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1230
* With Calico, add missing caliconodestatuses CRD added in v3.21.0
https://github.com/poseidon/terraform-render-bootstrap/pull/289
2021-12-09 09:28:41 -08:00
9d583ab377 Fix null provider version constraint on Google Cloud
* Part of https://github.com/poseidon/typhoon/pull/1074
2021-12-08 14:06:38 -08:00
c1d28e6f61 Change default disk_iops on Flatcar Linux
* Same as #1073, but for Flatcar Linux on AWS as well
2021-12-07 16:52:55 -08:00
a8fd21d250 Update minimum Terraform provider versions
* Update `null` provider to allow use of v3.1.x releases,
instead of being stuck on v2.1.2
* Update min versions in terraform-render-bootstrap
https://github.com/poseidon/terraform-render-bootstrap/pull/287
* Document the recommended versions of Terraform cloud providers
2021-12-07 16:26:34 -08:00
9c626c9dbd Change default disk_iops from unset to 3000
* Since v1.21.3 switched controllers default disk type from
`gp2` to `gp3`, an iops diff has been shown (harmless, but
annoying)
* Controller nodes default to a 30GB `gp3` disk. `gp3` disks
do respect `iops` and the corresponding default is 3000
2021-12-07 15:44:09 -08:00
85252dec6e Switch FCOS workers to official Fedora CoreOS AMIs
* Fix worker nodes to use official Fedora CoreOS AMIs,
instead of the older Poseidon built AMIs (now removed).
This should have been part of #1038, but was missed in
code review
* Poseidon built AMIs have been deleted (so I don't have
to keep paying to host them for people)
2021-12-07 15:31:47 -08:00
298ea65d3e Bump mkdocs-material from 8.0.3 to 8.0.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.0.3 to 8.0.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.0.3...8.0.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-07 15:29:00 -08:00
c0ab15ba22 Bump mkdocs-material from 7.3.6 to 8.0.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.3.6 to 8.0.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Upgrade guide](https://github.com/squidfunk/mkdocs-material/blob/master/docs/upgrade.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.3.6...8.0.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-02 15:25:40 -08:00
5d7b6f611e Update nginx-ingress and Prometheus exporter addons 2021-11-21 09:28:17 -08:00
93594292eb Update Kubernetes from v1.22.3 to v1.22.4
* Update flannel from v0.15.0 to v0.15.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1224
2021-11-17 19:53:32 -08:00
0546608e77 Bump pymdown-extensions from 9.0 to 9.1
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.0 to 9.1.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.0...9.1)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-17 18:41:22 -08:00
94b2793e40 Update CoreDNS from v1.8.4 to v1.8.6
* https://coredns.io/2021/10/07/coredns-1.8.6-release/
2021-11-12 21:09:04 -08:00
4fd43b39ad Fix Flatcar Linux docker driver and add cgroups v2
* Remove `/sys/fs/cgroup/systemd` mount since Flatcar Linux
uses cgroups v2
* Flatcar Linux's `docker` switched from the `cgroupfs` to
`systemd` driver without notice
2021-11-12 21:07:20 -08:00
65083aca7d Update Calico and Flannel CNI providers
* Update Calico from v3.20.2 to v3.21.0
* Update Flannel from v0.14.0 to v0.15.0
2021-11-12 11:03:39 -08:00
07db4c1143 Allow use of google Terraform provider v4.0+
* https://github.com/hashicorp/terraform-provider-google/releases/tag/v4.0.0
2021-11-11 10:17:58 -08:00
e5d0ce5fd7 Bump mkdocs-material from 7.3.4 to 7.3.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.3.4 to 7.3.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.3.4...7.3.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-07 17:01:41 -08:00
b934a13605 Update Prometheus and Grafana addons 2021-11-07 17:00:40 -08:00
cd005a0b27 Prepare for v1.22.3 release 2021-10-28 11:58:55 -07:00
dd4a5a4e7e Update Kubernetes from v1.22.2 to v1.22.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1223
2021-10-28 10:11:06 -07:00
af835f976f Update flannel from v0.13.0 to v0.14.0
* https://github.com/flannel-io/flannel/releases/tag/v0.14.0
2021-10-28 10:09:06 -07:00
9e4a369f76 Bump mkdocs-material from 7.3.3 to 7.3.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.3.3 to 7.3.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.3.3...7.3.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-23 10:45:49 -07:00
831d897533 Bump mkdocs from 1.2.2 to 1.2.3
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.2.2 to 1.2.3.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.2.2...1.2.3)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-23 10:31:55 -07:00
17dce49982 Update etcd from v3.5.0 to v3.5.1
* https://github.com/etcd-io/etcd/releases/tag/v3.5.1
2021-10-17 11:28:27 -07:00
5744e10329 Update Cilium from v1.10.4 to v1.10.5
* https://github.com/cilium/cilium/releases/tag/v1.10.5
2021-10-17 11:26:59 -07:00
20748536df Update nginx-ingress from v1.0.2 to v1.0.4
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.0.4
2021-10-17 11:17:43 -07:00
f2e6256dd9 Update Prometheus, kube-state-metrics, and Grafana
* Update monitoring addons
2021-10-17 11:15:39 -07:00
443bd5a26b Add file to hold nodes on iptables-legacy
* Add `/etc/fedora-coreos/iptables-legacy.stamp` to declare
that `iptables-legacy` should be used instead of `iptables-nft`
(until support is added in future releases)
* https://github.com/coreos/fedora-coreos-tracker/issues/676
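
One way to create such a stamp file at provisioning time is a Butane `storage.files` entry. This is a sketch; the spec version and file mode shown are assumptions:

```yaml
variant: fcos
version: 1.4.0
storage:
  files:
    # No contents are given, so an empty file is created; only its presence matters
    - path: /etc/fedora-coreos/iptables-legacy.stamp
      mode: 0644
```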
2021-10-11 20:30:49 -07:00
f8162b9be3 Update Calico from v3.20.1 to v3.20.2
* Use Calico's iptables legacy vs nft auto-detection
2021-10-11 20:28:48 -07:00
20ffbba4bf Bump mkdocs-material from 7.3.1 to 7.3.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.3.1 to 7.3.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.3.1...7.3.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 19:31:10 -07:00
15117fb95b Update Prometheus and nginx-ingress 2021-10-05 19:15:58 -07:00
10af8b4120 Bump mkdocs-material from 7.3.0 to 7.3.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.3.0 to 7.3.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.3.0...7.3.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-04 20:39:01 -07:00
e51b2903c1 Bump pymdown-extensions from 8.2 to 9.0
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 8.2 to 9.0.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/8.2...9.0)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-04 20:38:46 -07:00
cb72b261c7 Update Terraform provider poseidon/matchbox to v0.5+
* Relax version constraint to allow future minor version
releases to be used without a corresponding Typhoon change
2021-09-29 23:41:44 -07:00
209efd2f5b Update Prometheus, Grafana, and kube-state-metrics 2021-09-29 23:39:10 -07:00
388b1238bc Bump mkdocs-material from 7.2.8 to 7.3.0
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.2.8 to 7.3.0.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.2.8...7.3.0)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-27 20:40:44 -07:00
5a1e455220 Update nginx-ingress from v1.0.0 to v1.0.1 2021-09-24 09:38:18 -07:00
69f37c8b17 Update Prometheus from v2.29.2 to v2.30.0 2021-09-24 09:34:00 -07:00
b30de949b8 Update Calico and Cilium CNI
* Update Calico from v3.20.0 to v3.20.1
* Update Cilium from v1.10.3 to v1.10.4
2021-09-22 22:18:16 -07:00
4973178750 Bump mkdocs-material from 7.2.6 to 7.2.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.2.6 to 7.2.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.2.6...7.2.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-21 08:59:11 -07:00
bb7f31822e Update Kubernetes from v1.22.1 to v1.22.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1222
2021-09-15 19:56:24 -07:00
c6923b9ef3 Switch Fedora CoreOS to new ARM64 AMIs (#1038)
* Fedora CoreOS now publishes ARM64 AMIs
2021-09-12 11:49:13 -07:00
dae79d5916 Remove mention of freenode IRC
See #995
2021-09-12 10:10:49 -07:00
f4d5ac0ca7 Bump mkdocs-material from 7.2.5 to 7.2.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.2.5 to 7.2.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.2.5...7.2.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-11 12:49:06 -07:00
7e1b2cdba1 Discontinue Docker automated build publishing
* Poseidon infra publishes official multi-arch container
images for Kubelet to both Quay and Dockerhub (fallback).
There is no change here
* Automated builds by Quay and Dockerhub added separately
tagged images for those not able to trust our images and
preferring to trust Quay/Dockerhub. Going forward, we're
ending the use of Dockerhub automated builds. Docker has
moved automated builds to paid plans, even for open source
projects (we're not petitioning for a special exemption
given these are our unofficial images). Those still needing
Kubelet images built externally (i.e. not Poseidon Labs)
would still be able to use the Quay images tagged `build-SHA`
2021-09-01 11:52:57 -07:00
3bb20ce083 Bump mkdocs-material from 7.2.4 to 7.2.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.2.4 to 7.2.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.2.4...7.2.5)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-31 17:34:24 -07:00
eb29fb639b Update nginx-ingress, Prometheus, and Grafana addons 2021-08-24 22:14:57 -07:00
fcbdb50d93 Update Kubernetes from v1.22.0 to v1.22.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1221
2021-08-19 21:12:02 -07:00
efac611e9c Bump mkdocs-material from 7.2.2 to 7.2.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.2.2 to 7.2.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.2.2...7.2.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-16 11:51:08 -07:00
87ff431b80 Bump pygments from 2.9.0 to 2.10.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.9.0 to 2.10.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.9.0...2.10.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-16 11:40:09 -07:00
0d8ceae1d9 Add etcd v3.5.0 note to CHANGES 2021-08-11 09:24:43 -07:00
c5cf803634 Update Grafana and kube-state-metrics addons 2021-08-10 22:17:16 -07:00
61ee01f462 Show SSH keys with ssh-ed25519 instead of ssh-rsa in docs
* For Fedora CoreOS, users should not be using ssh-rsa public
keys anymore, so make sure the docs examples reflect this
* https://github.com/poseidon/typhoon/issues/915
2021-08-10 21:48:18 -07:00
cbef202eec Update Prometheus discovery of kube components
* Kubernetes v1.22.0 disabled kube-controller-manager insecure
port, which was used internally for Prometheus metrics scraping
* Configure Prometheus to discover and scrape endpoints for
kube-scheduler and kube-controller-manager via the authenticated
https ports, via bearer token
* Change firewall ports to allow Prometheus (on worker nodes)
to scrape kube-scheduler and kube-controller-manager targets
that run on controller(s) with hostNetwork
* Disable the insecure port on kube-scheduler
2021-08-10 21:25:19 -07:00
0c99b909a9 Update nginx-ingress from v0.47.0 to v1.0.0-beta.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.0.0-beta.1
2021-08-07 12:46:00 -07:00
739db3b35f Update Grafana and node-exporter addons
* https://github.com/grafana/grafana/releases/tag/v8.1.0
* https://github.com/prometheus/node_exporter/releases/tag/v1.2.1
2021-08-05 23:24:57 -07:00
c68b035a63 Update Flatcar Linux and Fedora CoreOS notes 2021-08-05 23:22:45 -07:00
1a5949824c Update etcd from v3.4.16 to v3.5.0
* Use multi-arch container image instead of a special
"-arm64" suffix on arm64
* https://github.com/etcd-io/etcd/releases/tag/v3.5.0
2021-08-04 22:10:07 -07:00
9bac641511 Update Kubernetes from v1.21.3 to v1.22.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1220
2021-08-04 22:09:19 -07:00
37ff3c28eb Bump mkdocs-material from 7.1.11 to 7.2.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.11 to 7.2.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.11...7.2.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-02 19:22:58 -07:00
f03045f0dc Update Cilium for cgroups v2 support
* On Fedora CoreOS, Cilium cross-node service IP load balancing
stopped working for a time (first observable as CoreDNS pods
located on worker nodes not being able to reach the kubernetes
API service 10.3.0.1). This turned out to have two parts:
* Fedora CoreOS switched to cgroups v2 by default. In our early
testing with cgroups v2, Calico (default) was used. With the
cgroups v2 change, SELinux policy denied some eBPF operations.
Since fixed in all Fedora CoreOS channels
* Cilium requires new mounts to support cgroups v2, which are
added here

* https://github.com/coreos/fedora-coreos-tracker/issues/292
* https://github.com/coreos/fedora-coreos-tracker/issues/881
* https://github.com/cilium/cilium/pull/16259
2021-07-24 10:36:47 -07:00
b603bbde3d Update Butane Config from v1.2.0 to v1.4.0
* Rename Fedora CoreOS Config (FCC) to Butane Config
* Require any snippets customizations use version v1.4.0

* https://typhoon.psdn.io/advanced/customization/#hosts
2021-07-19 23:53:51 -07:00
810236f6df Bump mkdocs-material from 7.1.10 to 7.1.11
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.10 to 7.1.11.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.10...7.1.11)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-19 10:38:59 -07:00
3c3d3a2473 Bump mkdocs from 1.2.1 to 1.2.2
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.2.1 to 1.2.2.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.2.1...1.2.2)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-19 10:06:13 -07:00
1af9fd8094 Remove outdated Terraform migration docs
* Terraform v0.12.x and v0.13.x are now quite outdated,
remove the migration docs
2021-07-19 08:36:59 -07:00
c734fa7b84 Update node-exporter from v1.1.2 to v1.2.0
* https://github.com/prometheus/node_exporter/releases/tag/v1.2.0
2021-07-18 15:26:44 -07:00
fdade5b40c Update poseidon/ct provider from v0.8.0 to v0.9.0
* Continue targeting Ignition v3.2.0 for some time
2021-07-18 09:05:02 -07:00
171fd2c998 Update Kubernetes from v1.21.2 to v1.21.3
* https://github.com/kubernetes/kubernetes/releases/tag/v1.21.3
2021-07-17 18:22:24 -07:00
545bd79624 Update Grafana from v8.0.4 to v8.0.6
* https://github.com/grafana/grafana/releases/tag/v8.0.6
2021-07-16 12:02:36 -07:00
12b825c78f Bump mkdocs-material from 7.1.9 to 7.1.10
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.9 to 7.1.10.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.9...7.1.10)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-12 19:10:52 -07:00
66e7354c8a Change AWS default disk type from gp2 to gp3
* https://aws.amazon.com/about-aws/whats-new/2020/12/introducing-new-amazon-ebs-general-purpose-volumes-gp3/
2021-07-04 10:43:05 -07:00
3a71b2ccb1 Update Cilium from v1.10.1 to v1.10.2
* https://github.com/cilium/cilium/releases/tag/v1.10.2
2021-07-04 10:11:21 -07:00
c7e327417b Update Prometheus and Grafana addons 2021-07-04 10:02:44 -07:00
e313e733ab Bump mkdocs-material from 7.1.8 to 7.1.9
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.8 to 7.1.9.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.8...7.1.9)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-29 22:23:36 -07:00
d0e73b8174 Bump terraform-render-bootstrap 2021-06-27 18:11:43 -07:00
65ddd2419c Add Known Issues with FCOS to CHANGES 2021-06-27 16:51:59 -07:00
b0e9b1fa60 Update Prometheus and Grafana addons
* https://github.com/prometheus/prometheus/releases/tag/v2.28.0
* https://github.com/grafana/grafana/releases/tag/v8.0.3
2021-06-27 14:46:43 -07:00
485feb82c4 Update CoreDNS from v1.8.0 to v1.8.4
* https://coredns.io/2021/01/20/coredns-1.8.1-release/
* https://coredns.io/2021/02/23/coredns-1.8.2-release/
* https://coredns.io/2021/02/24/coredns-1.8.3-release/
* https://coredns.io/2021/05/28/coredns-1.8.4-release/
2021-06-23 23:31:25 -07:00
0b276b6b7e Update Kubernetes from v1.21.1 to v1.21.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.21.2
2021-06-17 16:15:20 -07:00
e8513e58bb Add support for Terraform v1.0.0
* https://github.com/hashicorp/terraform/releases/tag/v1.0.0
2021-06-17 13:32:56 -07:00
d77343be3a Workaround systemd 248 path units not working reliably
* On FCOS 34 / systemd 248, `kubelet.path` won't activate (stuck
waiting) when `/etc/kubernetes/kubeconfig` exists, even with
manual prodding of the file. The root cause isn't known, but
a workaround is to delay `/etc/kubernetes` directory creation
or to touch the directory later
* Fix DigitalOcean worker node kubelet.service being enabled
immediately. On bare-metal and DigitalOcean, the kubeconfig
should activate the Kubelet, so it doesn't crashloop needlessly
(nice to have, not required)
2021-06-16 10:19:39 -07:00
f2b01e1d75 Bump mkdocs-material from 7.1.7 to 7.1.8
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.7 to 7.1.8.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.7...7.1.8)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-14 15:06:18 -07:00
60c2107d7f Bump mkdocs from 1.1.2 to 1.2.1
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.1.2 to 1.2.1.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.1.2...1.2.1)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-14 15:01:52 -07:00
30cfeec6c1 Update nginx-ingress from v0.46.0 to v0.47.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.47.0
2021-06-07 10:11:07 -07:00
ba8774ee0d Bump mkdocs-material from 7.1.6 to 7.1.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.6 to 7.1.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.6...7.1.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-07 09:43:23 -07:00
24e63bd134 Update Prometheus, Grafana, kube-state-metrics addons 2021-06-07 09:40:06 -07:00
996bdd9112 Update Calico from v3.19.0 to v3.19.1
* https://docs.projectcalico.org/archive/v3.19/release-notes/
2021-06-02 14:51:15 -07:00
a34d78f55d Bump mkdocs-material from 7.1.5 to 7.1.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.5 to 7.1.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.5...7.1.6)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-31 14:39:01 -07:00
04b2e149ba Remove freenode IRC from help section
* Due to the takeover of freenode.net IRC, the channel
there should no longer be used
2021-05-26 11:31:25 -07:00
9f0126a410 Fix typo in CHANGES.md 2021-05-25 21:16:53 -07:00
a1bab9c96e Bump mkdocs-material from 7.1.4 to 7.1.5
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.4 to 7.1.5.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.4...7.1.5)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-24 11:39:13 -07:00
966fd280b0 Update Cilium from v1.10.0-rc1 to v1.10.0
* https://github.com/cilium/cilium/releases/tag/v1.10.0
2021-05-24 11:16:51 -07:00
e4e074c894 Update Cilium from v1.9.6 to v1.10.0-rc1
* Add multi-arch container images and arm64 support
* https://github.com/cilium/cilium/releases/tag/v1.10.0-rc1
2021-05-14 14:24:52 -07:00
d51da49925 Update docs for Kubernetes v1.21.1 and Terraform v0.15.x 2021-05-13 11:34:01 -07:00
2076a779a3 Update Kubernetes from v1.21.0 to v1.21.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1211
2021-05-13 11:23:26 -07:00
048094b256 Update etcd from v3.4.15 to v3.4.16
* https://github.com/etcd-io/etcd/blob/main/CHANGELOG-3.4.md
2021-05-13 10:53:04 -07:00
75b063c586 Update Prometheus from v2.25.2 to v2.27.0
* Update Grafana from v7.5.4 to v7.5.6
* https://github.com/prometheus/prometheus/releases/tag/v2.27.0
* https://github.com/grafana/grafana/releases/tag/v7.5.6
2021-05-12 11:47:07 -07:00
1620d1e456 Bump mkdocs-material from 7.1.3 to 7.1.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.3 to 7.1.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.3...7.1.4)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-10 14:53:17 -07:00
939bffbf98 Bump pymdown-extensions from 8.1.1 to 8.2
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 8.1.1 to 8.2.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/8.1.1...8.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-10 14:52:58 -07:00
bc96443710 Update nginx-ingress from v0.45.0 to v0.46.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.46.0
2021-05-05 12:06:20 -07:00
82a7422b3d Change Dependabot pip watcher to check weekly 2021-05-05 11:34:57 -07:00
132ab395a5 Bump pygments from 2.8.1 to 2.9.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.8.1 to 2.9.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.8.1...2.9.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-05 11:32:02 -07:00
5f87eb3ec9 Update Fedora CoreOS Kubelet for cgroups v2
* Fedora CoreOS is beginning to switch from cgroups v1 to
cgroups v2 by default, which changes the sysfs hierarchy
* This will be needed when using a Fedora CoreOS image
that enables cgroups v2 (`next` stream as of this writing)

Rel: https://github.com/coreos/fedora-coreos-tracker/issues/292
2021-04-26 11:48:58 -07:00
b152b9f973 Reduce the default disk_size from 40GB to 30GB
* We're typically reducing the `disk_size` in real clusters
since the space is underused. The default should be lower.
2021-04-26 11:43:26 -07:00
9c842395a8 Update Cilium from v1.9.5 to v1.9.6
* https://github.com/cilium/cilium/releases/tag/v1.9.6
2021-04-26 10:55:23 -07:00
6cb9c0341b Bump mkdocs-material from 7.1.2 to 7.1.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.2 to 7.1.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.2...7.1.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-26 10:35:00 -07:00
d4fd6d4adb Bump mkdocs-material from 7.1.1 to 7.1.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.1 to 7.1.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.1...7.1.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-23 14:26:27 -07:00
3664dfafc2 Update docs with video meetings and referral links
* Use our DigitalOcean referral code for new DigitalOcean
users. This gives new accounts free cloud credits and
provides a smaller cloud credit back to the project
* Link to the new video meeting via a one-time GitHub Sponsors
feature that we're trying out
* List Fedora CoreOS ARM64 as a supported platform (alpha).
Before this was only mentioned in docs and on the blog.
2021-04-17 19:15:51 -07:00
e535ddd15a Update Grafana from v7.5.3 to v7.5.4
* https://github.com/grafana/grafana/releases/tag/v7.5.4
2021-04-17 11:38:14 -07:00
5752a8f041 Update kube-state-metrics from v2.0.0-rc.1 to v2.0.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0
2021-04-17 11:34:52 -07:00
68abbf7b0d Fix docs link on index page (#975)
* Fix Fedora CoreOS Google Cloud tutorial link
2021-04-17 10:52:59 -07:00
67047ead08 Update Terraform version to allow v0.15.0
* Require Terraform version v0.13 <= x < v0.16
2021-04-16 09:46:01 -07:00
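For illustration, the range stated in 67047ead08 (v0.13 <= x < v0.16) could be expressed with Terraform's `required_version` setting. This is a minimal sketch of the constraint syntax, not a copy of the module's actual version file; the exact patch-level bounds are an assumption.

```hcl
# Sketch only: constrain the Terraform CLI to v0.13.x through v0.15.x.
# The ".0" patch bounds are assumed for illustration.
terraform {
  required_version = ">= 0.13.0, < 0.16.0"
}
```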
c11e23fc50 Fix minor docs issues and missing changelog links 2021-04-13 09:35:11 -07:00
b647ad8806 Bump mkdocs-material from 7.1.0 to 7.1.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.1.0 to 7.1.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.1.0...7.1.1)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-12 20:29:01 -07:00
2eb1ac1b4d Update nginx-ingress from v0.44.0 to v0.45.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.45.0
2021-04-12 00:18:47 -07:00
cb2721ef7d Update Grafana from v7.5.2 to v7.5.3
* https://github.com/grafana/grafana/releases/tag/v7.5.3
2021-04-12 00:17:22 -07:00
fc06d28e13 Remove deprecated field on azurerm_lb_backend_address_pool
* Remove the deprecated `resource_group_name` field from Azure
`azurerm_lb_backend_address_pool` resources
2021-04-11 23:59:17 -07:00
a9078cb52b Add sponsorship badge to Github repo 2021-04-11 16:00:16 -07:00
ebd9570ede Update Fedora CoreOS Config version from v1.1.0 to v1.2.0
* Require [poseidon/ct](https://github.com/poseidon/terraform-provider-ct)
Terraform provider v0.8+
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts)
customizations to update to v1.2.0

See upgrade [notes](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct)
2021-04-11 15:26:54 -07:00
34e8db7aae Update static Pod manifests for Kubernetes v1.21.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/257
2021-04-11 15:05:46 -07:00
084e8bea49 Allow custom initial node taints on worker pool nodes
* Add `node_taints` variable to worker modules to set custom
initial node taints on cloud platforms that support auto-scaling
worker pools of heterogeneous nodes (i.e. AWS, Azure, GCP)
* Worker pools could use custom `node_labels` to allow workloads
to select among differentiated nodes, while custom `node_taints`
allows a worker pool's nodes to be tainted as special to prevent
scheduling, except by workloads that explicitly tolerate the
taint
* Expose `daemonset_tolerations` in AWS, Azure, and GCP kubernetes
cluster modules, to determine whether `kube-system` components
should tolerate the custom taint (advanced use covered in docs)

Rel: #550, #663
Closes #429
2021-04-11 15:00:11 -07:00
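As a rough sketch of how the `node_labels` and `node_taints` variables from 084e8bea49 might be set on a worker pool: the module name, the source path, and the label and taint values below are hypothetical, other required inputs are omitted, and the `key=value:effect` taint format is assumed to follow the Kubelet's `--register-with-taints` syntax.

```hcl
# Hypothetical worker pool; the module name, source path, and label/taint
# values are illustrative assumptions, and other required inputs are omitted.
module "gpu-pool" {
  source = "./path/to/workers" # placeholder, not a real module path

  # Label nodes so workloads can select them
  node_labels = ["pool=gpu"]

  # Taint nodes so only workloads that tolerate the taint are scheduled
  node_taints = ["role=gpu:NoSchedule"]
}
```

With a taint like this, a Pod would need a matching toleration for `role=gpu` with effect `NoSchedule` before the scheduler places it on these nodes.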
d73621c838 Update Kubernetes from v1.20.5 to v1.21.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1210
2021-04-08 21:44:31 -07:00
1a6481df04 Update Grafana from v7.5.1 to v7.5.2
* https://github.com/grafana/grafana/releases/tag/v7.5.2
2021-04-04 18:20:02 -07:00
798ec9a92f Change CNI config directory to /etc/cni/net.d
* Change CNI config directory from `/etc/kubernetes/cni/net.d`
to `/etc/cni/net.d` (Kubelet default)
* https://github.com/poseidon/terraform-render-bootstrap/pull/255
2021-04-02 00:03:48 -07:00
96aed4c3c3 Bump mkdocs-material from 7.0.6 to 7.1.0
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 7.0.6 to 7.1.0.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/docs/changelog.md)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/7.0.6...7.1.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-02 00:01:44 -07:00
7372d33af8 Update kube-state-metrics and Grafana
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-rc.1
* https://github.com/grafana/grafana/releases/tag/v7.5.1
2021-03-28 10:53:52 -07:00
451ec771a8 Update Terraform providers and CHANGES for release 2021-03-23 08:45:57 -07:00
4d9846b83e Add DigitalOcean as a OSS sponsorship partner
* Include DigitalOcean logo and link on repo and site
2021-03-21 11:34:36 -07:00
597ca4acce Update CoreDNS from v1.7.0 to v1.8.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/254
2021-03-20 16:47:25 -07:00
507c646e8b Add Kubelet provider-id on AWS
* Set the Kubelet `--provider-id` on AWS based on metadata from
Fedora CoreOS afterburn or Flatcar Linux coreos-metadata
* Based on https://github.com/poseidon/typhoon/pull/951
2021-03-19 12:43:37 -07:00
d8f7da6873 Add dependabot update watcher for docs pypi packages
* Update requirements.txt packages for mkdocs
2021-03-19 11:55:54 -07:00
048f1f514e Update Grafana from v7.4.3 to v7.4.5
* https://github.com/grafana/grafana/releases/tag/v7.4.5
2021-03-19 11:51:52 -07:00
b825cd9afe Update Prometheus from v2.25.1 to v2.25.2
* https://github.com/prometheus/prometheus/releases/tag/v2.25.2
2021-03-19 11:49:38 -07:00
796149d122 Update Kubernetes from v1.20.4 to v1.20.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1205
2021-03-19 11:27:31 -07:00
a66bccd590 Update Cilium from v1.9.4 to v1.9.5
* https://github.com/cilium/cilium/releases/tag/v1.9.5
2021-03-14 11:48:22 -07:00
30b1edfcc6 Mark bootstrap token as sensitive in plan/apply
* Mark the bootstrap token as sensitive, which is useful when
Terraform is run in automated CI/CD systems to avoid showing
the token
* https://github.com/poseidon/terraform-render-bootstrap/pull/251
2021-03-14 11:32:35 -07:00
a4afe06b64 Update Calico from v3.17.3 to v3.18.1
* https://docs.projectcalico.org/archive/v3.18/release-notes/
2021-03-14 10:35:24 -07:00
4d58be0816 Update Prometheus from v2.25.0 to v2.25.1
* https://github.com/prometheus/prometheus/releases/tag/v2.25.1
2021-03-14 09:43:15 -07:00
170b768ad8 Add KUBELET_IMAGE to Fedora CoreOS bootstrap.service (#945)
* Align with Flatcar Linux `bootstrap.service`
2021-03-14 09:35:42 -07:00
5bc1cd28c3 Switch kube-state-metrics image from quay to k8s.gcr.io
* kube-state-metrics will continue publishing container images
to `k8s.gcr.io` instead of `quay.io`

Rel: https://github.com/kubernetes/kube-state-metrics/issues/1409
2021-03-11 10:56:18 -08:00
13fbac6c79 Update Grafana from v7.4.2 to v7.4.3
* https://github.com/grafana/grafana/releases/tag/v7.4.3
2021-03-05 17:19:54 -08:00
a8fa4a9a06 Update node-exporter and kube-state-metrics
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-rc.0
* https://github.com/prometheus/node_exporter/releases/tag/v1.1.2
2021-03-05 17:13:45 -08:00
a5c1a96df1 Update etcd from v3.4.14 to v3.4.15
* https://github.com/etcd-io/etcd/releases/tag/v3.4.15
2021-03-05 17:02:57 -08:00
6a091e245e Remove Flatcar Linux Edge os_image option
* Flatcar Linux has not published an Edge channel image since
April 2020 and recently removed mention of the channel from
their documentation https://github.com/kinvolk/Flatcar/pull/345
* Users of Flatcar Linux Edge should move to the stable, beta, or
alpha channel, barring any alternate advice from upstream Flatcar
Linux
2021-02-20 16:09:54 -08:00
590796ee62 Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2021-02-19 00:24:07 -08:00
ec389295fe Update Grafana from v7.4.0 to v7.4.2
* https://github.com/grafana/grafana/releases/tag/v7.4.2
2021-02-19 00:18:39 -08:00
3c807f3478 Update Prometheus from v2.24.1 to v2.25.0
* https://github.com/prometheus/prometheus/releases/tag/v2.25.0
2021-02-19 00:16:35 -08:00
e76fe80b45 Update Kubernetes from v1.20.3 to v1.20.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1204
2021-02-19 00:02:07 -08:00
32853aaa7b Update Kubernetes from v1.20.2 to v1.20.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1203
2021-02-17 22:29:33 -08:00
c32a54db40 Update node-exporter from v1.0.1 to v1.1.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.1.1
2021-02-14 14:30:28 -08:00
9671b1c734 Update flannel-cni from v0.4.1 to v0.4.2
* https://github.com/poseidon/flannel-cni/releases/tag/v0.4.2
2021-02-14 12:04:59 -08:00
3b933e1ab3 Update Grafana from v7.3.7 to v7.4.0
* https://github.com/grafana/grafana/releases/tag/v7.4.0
2021-02-07 21:42:18 -08:00
58d8f6f505 Update Prometheus from v2.24.0 to v2.24.1
* https://github.com/prometheus/prometheus/releases/tag/v2.24.1
2021-02-04 22:28:32 -08:00
56853fe222 Update nginx-ingress from v0.43.0 to v0.44.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.44.0
2021-02-04 22:19:58 -08:00
18165d8076 Update Calico from v3.17.1 to v3.17.2
* https://github.com/projectcalico/calico/releases/tag/v3.17.2
2021-02-04 22:03:51 -08:00
50acf28ce5 Update Cilium from v1.9.3 to v1.9.4
* https://github.com/cilium/cilium/releases/tag/v1.9.4
2021-02-03 23:08:22 -08:00
ab793eb842 Update Cilium from v1.9.2 to v1.9.3
* https://github.com/cilium/cilium/releases/tag/v1.9.3
2021-01-26 17:13:52 -08:00
b74c958524 Update Cilium from v1.9.1 to v1.9.2
* https://github.com/cilium/cilium/releases/tag/v1.9.2
2021-01-20 22:06:45 -08:00
2024d3c32e Link to Github Sponsors in README and docs
* Update the Social Contract and Sponsors
2021-01-16 12:56:59 -08:00
11c434915f Update Grafana from v7.3.6 to v7.3.7
* https://github.com/grafana/grafana/releases/tag/v7.3.7
2021-01-16 10:46:56 -08:00
05f7df9e80 Update Kubernetes from v1.20.1 to v1.20.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1202
2021-01-13 17:46:51 -08:00
4220b9ce18 Add support for Terraform v0.14.4+
* Support Terraform v0.13.x and v0.14.4+
2021-01-12 21:43:12 -08:00
6a6af4aa16 Update Prometheus from v2.24.0-rc.0 to v2.24.0
* https://github.com/prometheus/prometheus/releases/tag/v2.24.0
2021-01-12 20:49:18 -08:00
3dcd10f3b8 Update Prometheus from v2.23.0 to v2.24.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.24.0-rc.0
2021-01-01 13:49:28 -08:00
22503993b9 Update nginx-ingress from v0.41.2 to v0.43.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.43.0
* https://github.com/kubernetes/ingress-nginx/issues/6696
2021-01-01 13:44:45 -08:00
cf3aa8885b Update Prometheus rules and Grafana dashboards
* Update Grafana from v7.3.5 to v7.3.6
2020-12-19 14:56:42 -08:00
ba61a137db Add notice about upstream Fedora CoreOS changes
* Highlight that short-term, use of Fedora CoreOS will
require non-RSA SSH keys or a workaround snippet
2020-12-19 14:10:42 -08:00
646bdd78e4 Update Kubernetes from v1.20.0 to v1.20.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1201
2020-12-19 12:56:28 -08:00
c163fbbbcd Update docs and README for release 2020-12-12 12:31:35 -08:00
dc7be431e0 Remove iSCSI mounts from Kubelet
* Remove Kubelet `/etc/iscsi` and `iscsiadm` host mounts that
were added on bare-metal, since these no longer work on either
Fedora CoreOS or Flatcar Linux with newer `iscsiadm`
* These special mounts on bare-metal date back to #350 which
added them to provide a way to use iSCSI in Kubernetes v1.10
* Today, storage should be handled by external CSI providers,
which handle different storage systems and don't rely
on Kubelet storage utils

Close #907
2020-12-12 11:41:02 -08:00
86e0f806b3 Revert "Add support for Terraform v0.14.x"
This reverts commit 968febb050.
2020-12-11 00:47:57 -08:00
96172ad269 Update Grafana from v7.3.4 to v7.3.5
* https://github.com/grafana/grafana/releases/tag/v7.3.5
2020-12-11 00:24:43 -08:00
3eb20a1f4b Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-12-11 00:15:29 -08:00
ee9ce3d0ab Update Calico from v3.17.0 to v3.17.1
* https://github.com/projectcalico/calico/releases/tag/v3.17.1
2020-12-10 22:48:38 -08:00
a8b8a9b454 Update Kubernetes from v1.20.0-rc.0 to v1.20.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200
2020-12-08 18:28:13 -08:00
968febb050 Add support for Terraform v0.14.x
* Support Terraform v0.13.x and v0.14.x
2020-12-07 00:22:38 -08:00
bee455f83a Update Cilium from v1.9.0 to v1.9.1
* https://github.com/cilium/cilium/releases/tag/v1.9.1
2020-12-04 14:14:18 -08:00
3e89ea1b4a Promote Fedora CoreOS bare-metal to stable
* Fedora CoreOS is a good choice for use on bare-metal
2020-12-04 14:02:55 -08:00
e77dd6ecd4 Update Kubernetes from v1.19.4 to v1.20.0-rc.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200-rc0
2020-12-03 16:01:28 -08:00
4fd4a0f540 Move control plane static pod TLS assets to /etc/kubernetes/pki
* Change control plane static pods to mount `/etc/kubernetes/pki`,
instead of `/etc/kubernetes/bootstrap-secrets` to better reflect
their purpose and match some loose conventions upstream
* Place control plane and bootstrap TLS assets and kubeconfigs
in `/etc/kubernetes/pki`
* Mount to `/etc/kubernetes/pki` (rather than `/etc/kubernetes/secrets`)
to match the host location (less surprise)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/233
2020-12-02 23:26:42 -08:00
804dfea0f9 Add kubeconfigs for kube-scheduler and kube-controller-manager
* Generate TLS client certificates for `kube-scheduler` and
`kube-controller-manager` with `system:kube-scheduler` and
`system:kube-controller-manager` CNs
* Template separate kubeconfigs for kube-scheduler and
kube-controller-manager (`scheduler.conf` and
`controller-manager.conf`). Rename admin for clarity
* Before v1.16.0, Typhoon scheduled a self-hosted control
plane, which allowed the steady-state kube-scheduler and
kube-controller-manager to use a scoped ServiceAccount.
With a static pod control plane, separate CN TLS client
certificates are the nearest equiv.
* https://kubernetes.io/docs/setup/best-practices/certificates/
* Remove unused Kubelet certificate, TLS bootstrap is used
instead
2020-12-01 22:02:15 -08:00
8ba23f364c Add TokenReview and TokenRequestProjection flags
* Add kube-apiserver flags for TokenReview and TokenRequestProjection
(beta, defaults on) to allow using Service Account Token Volume
Projection to create and mount service account tokens tied to a Pod's
lifecycle

Rel:

* https://github.com/poseidon/terraform-render-bootstrap/pull/231
* https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection
2020-12-01 20:02:33 -08:00
f6025666eb Update etcd from v3.4.12 to v3.4.14
* https://github.com/etcd-io/etcd/releases/tag/v3.4.14
2020-11-29 20:04:25 -08:00
85eb502f19 Update Prometheus from v2.23.0-rc.0 to v2.23.0
* https://github.com/prometheus/prometheus/releases/tag/v2.23.0
2020-11-29 19:59:27 -08:00
fa3184fb9c Relax terraform-provider-ct version constraint
* Allow terraform-provider-ct versions v0.6+ (e.g. v0.7.1)
Before, only v0.6.x point updates were allowed
* Update terraform-provider-ct to v0.7.1 in docs
* READ the docs before updating terraform-provider-ct,
as changing worker user-data is handled differently
by different cloud platforms
2020-11-29 19:51:26 -08:00
22565e57e0 Update kube-state-metrics from v2.0.0-alpha.2 to v2.0.0-alpha.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.3
2020-11-25 14:30:11 -08:00
026e1f3648 Update Grafana from v7.3.3 to v7.3.4
* https://github.com/grafana/grafana/releases/tag/v7.3.4
2020-11-25 14:25:15 -08:00
ae548ce213 Update Calico from v3.16.5 to v3.17.0
* Enable Calico MTU auto-detection
* Remove [workaround](https://github.com/poseidon/typhoon/pull/724) to
Calico cni-plugin [issue](https://github.com/projectcalico/cni-plugin/issues/874)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/230
2020-11-25 14:22:58 -08:00
e826b49648 Update Matchbox profile to use initramfs and rootfs images
* Fedora CoreOS stable (after Oct 6) ships separate initramfs
and rootfs images, used as initrd's
* Update profiles to match the Matchbox examples, which have
already switched to the new profile and to remove the unused
kernel args
* Requires Fedora CoreOS version which ships rootfs images
(e.g. stable 32.20200923.3.0 or later)

Rel:

* https://github.com/coreos/fedora-coreos-tracker/issues/390#issuecomment-661986987
* da0df01763 (diff-4541f7b7c174f6ae6270135942c1c65ed9e09ebe81239709f5a9fb34e858ddcf)

Supersedes https://github.com/poseidon/typhoon/pull/888
2020-11-25 14:13:39 -08:00
fa8f68f50e Fix Fedora CoreOS AWS AMI query in non-US regions
* An `aws_ami` data source will fail a Terraform plan
if no matching AMI is found, even if the AMI is not
used. ARM64 images are only published to a few US
regions, so the `aws_ami` data query could fail when
creating Fedora CoreOS AWS clusters in non-US regions
* Condition `aws_ami` on whether experimental arch
`arm64` is chosen
* Recent regression introduced in v1.19.4
https://github.com/poseidon/typhoon/pull/875

Closes https://github.com/poseidon/typhoon/issues/886
2020-11-25 11:32:05 -08:00
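The conditional data source described above can be sketched roughly as follows. The data source name, owner ID, filter, and fallback AMI ID are hypothetical placeholders, not the module's actual values:

```hcl
variable "arch" {
  type    = string
  default = "amd64"
}

# Only query for an arm64 AMI when arm64 is chosen, so that plans in regions
# without a matching image do not fail on an unused lookup.
data "aws_ami" "fedora-coreos-arm" {
  count = var.arch == "arm64" ? 1 : 0

  most_recent = true
  owners      = ["000000000000"] # hypothetical account ID

  filter {
    name   = "architecture"
    values = ["arm64"]
  }
}

locals {
  # Only the selected branch of the conditional is evaluated, so the [0]
  # index is not dereferenced when count is 0.
  ami_id = var.arch == "arm64" ? data.aws_ami.fedora-coreos-arm[0].image_id : "ami-00000000000000000" # hypothetical amd64 AMI
}
```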
ba8d972c76 Update Prometheus from v2.22.2 to v2.23.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.23.0-rc.0
2020-11-24 10:54:42 -08:00
c0347ca0c6 Set kubeconfig and assets_dist as sensitive
* Mark `kubeconfig` and `assets_dist` as `sensitive` to
prevent the Terraform CLI displaying these values, esp.
for CI systems
* In particular, external tools or tfvars style uses (not
recommended) reportedly display all outputs and are improved
by setting sensitive
* For Terraform v0.14, outputs referencing sensitive fields
must also be annotated as sensitive

Closes https://github.com/poseidon/typhoon/issues/884
2020-11-23 11:41:55 -08:00
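As a sketch of the Terraform v0.14 requirement mentioned above, a root-module output that re-exports a sensitive module output must itself be marked sensitive. The module name `yavin` and the output name `kubeconfig-admin` are assumptions for illustration:

```hcl
# Hypothetical root-module output re-exporting a sensitive module output.
output "kubeconfig" {
  value     = module.yavin.kubeconfig-admin # assumed module and output names
  sensitive = true                          # needed on Terraform v0.14 because the value is sensitive
}
```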
9f94ab6bcc Rerun terraform fmt for recent variables 2020-11-21 14:20:36 -08:00
5e4f5de271 Enable Network Load Balancer (NLB) dualstack
* NLB subnets assigned both IPv4 and IPv6 addresses
* NLB DNS name has both A and AAAA records
* NLB to target node traffic is IPv4 (no change),
no change to security groups needed
* Ingresses exposed through the recommended Nginx
Ingress Controller addon will be accessible via
IPv4 or IPv6. No change is needed to the app's
CNAME to NLB record

Related: https://aws.amazon.com/about-aws/whats-new/2020/11/network-load-balancer-supports-ipv6/
2020-11-21 14:16:24 -08:00
be28495d79 Update Prometheus from v2.22.1 to v2.22.2
* https://github.com/prometheus/prometheus/releases/tag/v2.22.2
2020-11-19 21:50:48 -08:00
f1356fec24 Update Grafana from v7.3.2 to v7.3.3
* https://github.com/grafana/grafana/releases/tag/v7.3.3
2020-11-19 21:49:11 -08:00
cc00afa4e1 Add Terraform v0.13 input variable validations
* Support for migrating from Terraform v0.12.x to v0.13.x
was added in v1.18.8
* Require Terraform v0.13+. Drop support for Terraform v0.12
2020-11-17 12:02:34 -08:00
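For readers unfamiliar with the feature, a Terraform v0.13 input variable validation looks roughly like the sketch below. The variable and allowed values are chosen for illustration and are not necessarily the validations added by this commit:

```hcl
variable "os_stream" {
  type        = string
  description = "Fedora CoreOS image stream (illustrative example)"
  default     = "stable"

  validation {
    # Reject any value outside the allowed set at plan time.
    condition     = contains(["stable", "testing", "next"], var.os_stream)
    error_message = "The os_stream must be stable, testing, or next."
  }
}
```

Terraform v0.12 does not accept `validation` blocks as a stable feature, which is consistent with dropping v0.12 support here.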
5c3b5a20de Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-11-14 13:32:04 -08:00
f5a83667e8 Update Grafana from v7.3.1 to v7.3.2
* https://github.com/grafana/grafana/releases/tag/v7.3.2
2020-11-14 13:30:30 -08:00
a911367c2e Update nginx-ingress from v0.41.0 to v0.41.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2
2020-11-14 13:27:06 -08:00
f884de847e Discard Prometheus etcd gRPC failure alert
* Kubernetes watch expiry is not a gRPC code we care about
* Background: This rule is typically removed, but was added back in
2020-11-14 13:17:56 -08:00
1b3a0f6ebc Add experimental Fedora CoreOS arm64 support on AWS
* Add experimental `arch` variable to Fedora CoreOS AWS,
accepting amd64 (default) or arm64 to support native
arm64/aarch64 clusters or mixed/hybrid clusters with
a worker pool of arm64 workers
* Add `daemonset_tolerations` variable to cluster module
(experimental)
* Add `node_taints` variable to workers module
* Requires flannel CNI and experimental Poseidon-built
arm64 Fedora CoreOS AMIs (published to us-east-1, us-east-2,
and us-west-1)

WARN:

* Our AMIs are experimental, may be removed at any time, and
will be removed when Fedora CoreOS publishes official arm64
AMIs. Do NOT use in production

Related:

* https://github.com/poseidon/typhoon/pull/682
2020-11-14 13:09:24 -08:00
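A rough sketch of opting a cluster into the experimental arm64 support. The module name and source path are assumptions, and the other required cluster variables are omitted:

```hcl
module "gravitas" { # hypothetical cluster name
  # Assumed source path; pin a release with ?ref= in real use.
  source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes"

  # ...other required cluster variables omitted...

  arch       = "arm64"   # experimental; defaults to amd64
  networking = "flannel" # arm64 requires the flannel CNI
}
```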
1113a22f61 Update Kubernetes from v1.19.3 to v1.19.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194
2020-11-11 22:56:27 -08:00
152c7d86bd Change bootstrap.service container from rkt to docker
* Use docker to run `bootstrap.service` container
* Background https://github.com/poseidon/typhoon/pull/855
2020-11-11 22:26:05 -08:00
79deb8a967 Update Cilium from v1.9.0-rc3 to v1.9.0
* https://github.com/cilium/cilium/releases/tag/v1.9.0
2020-11-10 23:42:41 -08:00
f412f0d9f2 Update Calico from v3.16.4 to v3.16.5
* https://github.com/projectcalico/calico/releases/tag/v3.16.5
2020-11-10 22:58:19 -08:00
eca6c4a1a1 Fix broken flatcar linux documentation links (#870)
* Fix old documentation links
2020-11-10 18:30:30 -08:00
133d325013 Update nginx-ingress from v0.40.2 to v0.41.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.0
2020-11-08 14:34:52 -08:00
4b05c0180e Update Grafana from v7.3.0 to v7.3.1
* https://github.com/grafana/grafana/releases/tag/v7.3.1
2020-11-08 14:13:39 -08:00
f49ab3a6ee Update Prometheus from v2.22.0 to v2.22.1
* https://github.com/prometheus/prometheus/releases/tag/v2.22.1
2020-11-08 14:12:24 -08:00
0eef16b274 Improve and tidy Fedora CoreOS etcd-member.service
* Allow a snippet with a systemd dropin to set an alternate
image via `ETCD_IMAGE`, for consistency across Fedora CoreOS
and Flatcar Linux
* Drop comments about integrating system containers with
systemd-notify
2020-11-08 11:49:56 -08:00
ad1f59ce91 Change Flatcar etcd-member.service container from rkt to docker
* Use docker to run the `etcd-member.service` container
* Use env-file `/etc/etcd/etcd.env` like podman on FCOS
* Background: https://github.com/poseidon/typhoon/pull/855
2020-11-03 16:42:18 -08:00
82e5ac3e7c Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/poseidon/terraform-render-bootstrap/pull/224
2020-11-03 10:29:07 -08:00
a8f7880511 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:50:18 -07:00
cda5b93b09 Update kube-state-metrics from v2.0.0-alpha.1 to v2.0.0-alpha.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2
2020-10-28 18:49:40 -07:00
3e9f5f34de Update Grafana from v7.2.2 to v7.3.0
* https://github.com/grafana/grafana/releases/tag/v7.3.0
2020-10-28 17:46:26 -07:00
893d139590 Update Calico from v3.16.3 to v3.16.4
* https://github.com/projectcalico/calico/releases/tag/v3.16.4
2020-10-26 00:50:40 -07:00
fc62e51b2a Update Grafana from v7.2.1 to v7.2.2
* https://github.com/grafana/grafana/releases/tag/v7.2.2
2020-10-22 00:14:04 -07:00
e5ba3329eb Remove bare-metal CoreOS Container Linux profiles
* Remove Matchbox profiles for CoreOS Container Linux
* Simplify the remaining Flatcar Linux profiles
2020-10-21 00:25:10 -07:00
7c3f3ab6d0 Rename container-linux modules to flatcar-linux
* CoreOS Container Linux was deprecated in v1.18.3
* Continue transitioning docs and modules from supporting
both CoreOS and Flatcar "variants" of Container Linux to
now supporting Flatcar Linux and equivalents

Action Required: Update the Flatcar Linux modules `source`
to replace `s/container-linux/flatcar-linux`. See docs for
examples
2020-10-20 22:47:19 -07:00
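The required `source` change can be sketched as below. The module name and the AWS path are assumptions for illustration; the same substitution applies to the other platforms:

```hcl
module "tempest" { # hypothetical cluster name
  # Before: source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes"
  source = "git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes"

  # ...cluster variables unchanged...
}
```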
a99a990d49 Remove unused Kubelet tls mounts
* Kubelet trusts only the cluster CA certificate (and
certificates in the Kubelet debian base image), there
is no longer a need to mount the host's trusted certs
* Similar change on Flatcar Linux in
https://github.com/poseidon/typhoon/pull/855

Rel: https://github.com/poseidon/typhoon/pull/810
2020-10-18 23:48:21 -07:00
df17253e72 Fix delete node permission on Fedora CoreOS node shutdown
* On cloud platforms, `delete-node.service` tries to delete the
local node (not always possible depending on preemption time)
* Since v1.18.3, kubelet TLS bootstrap generates a kubeconfig
in `/var/lib/kubelet` which should be used with kubectl in
the delete-node oneshot
2020-10-18 23:38:11 -07:00
eda78db08e Change Flatcar kubelet.service container from rkt to docker
* Use docker to run the `kubelet.service` container
* Update Kubelet mounts to match Fedora CoreOS
* Remove unused `/etc/ssl/certs` mount (see
https://github.com/poseidon/typhoon/pull/810)
* Remove unused `/usr/share/ca-certificates` mount
* Remove `/etc/resolv.conf` mount, Docker default is ok
* Change `delete-node.service` to use docker instead of rkt
and inline ExecStart, as was done on Fedora CoreOS
* Fix permission denied on shutdown `delete-node`, caused
by the kubeconfig mount changing with the introduction of
node TLS bootstrap

Background

* podman, rkt, and runc daemonless container process runners
provide advantages over the docker daemon for system containers.
Docker requires workarounds for use in systemd units where the
ExecStart must tail logs so systemd can monitor the daemonized
container. https://github.com/moby/moby/issues/6791
* Why switch then? On Flatcar Linux, podman isn't shipped. rkt
works, but isn't being developed while container standards continue
to move forward. Typhoon has used runc for the Kubelet runner
before in Fedora Atomic, but it's more low-level. So we're left
with Docker, which is less than ideal, but shipped in Flatcar
* Flatcar Linux appears to be shifting system components to
use docker, which does provide some limited guards against
breakages (e.g. Flatcar cannot enable docker live restore)
2020-10-18 23:24:45 -07:00
afac46e39a Remove asset_dir variable and optional asset writes
* Originally, poseidon/terraform-render-bootstrap generated
TLS certificates, manifests, and cluster "assets" written
to local disk (`asset_dir`) during terraform apply cluster
bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Add Terraform output `assets_dist` map
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. Those
who wish to write the assets to local files can do so
explicitly:

```
resource "local_file" "assets" {
  for_each = module.yavin.assets_dist
  filename = "some-assets/${each.key}"
  content  = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 15:00:15 -07:00
b1e680ac0c Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-17 13:56:24 -07:00
9fbfbdb854 Update Prometheus from v2.21.0 to v2.22.0
* https://github.com/prometheus/prometheus/releases/tag/v2.22.0
2020-10-17 12:38:25 -07:00
511f5272f4 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://github.com/poseidon/terraform-render-bootstrap/pull/212
2020-10-15 20:08:51 -07:00
46ca5e8813 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:47:49 -07:00
394e496cc7 Update Grafana from v7.2.0 to v7.2.1
* https://github.com/grafana/grafana/releases/tag/v7.2.1
2020-10-11 13:21:25 -07:00
a38ec1a856 Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-11 13:06:53 -07:00
7881f4bd86 Update kube-state-metrics from v1.9.7 to v2.0.0-alpha.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1
2020-10-11 12:35:43 -07:00
d5b5b7cb02 Update nginx-ingress from v0.40.0 to v0.40.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2
2020-10-06 23:52:15 -07:00
759a48be7c Update mkdocs-material from v5.5.12 to v6.0.1
* Update OS kernel, systemd, and docker versions
2020-10-02 01:18:38 -07:00
b39a1d70da Update nginx-ingress from v0.35.0 to v0.40.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.0
2020-10-02 01:00:35 -07:00
901f7939b2 Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:24:26 -07:00
d65085ce14 Update Grafana from v7.1.5 to v7.2.0
* https://github.com/grafana/grafana/releases/tag/v7.2.0
2020-09-24 20:58:32 -07:00
343db5b578 Remove references to CoreOS Container Linux
* CoreOS Container Linux was deprecated in v1.18.3 (May 2020)
in favor of Fedora CoreOS and Flatcar Linux. CoreOS Container
Linux references were kept to give folks more time to migrate,
but AMIs have now been deleted. Time is up.

Rel: https://coreos.com/os/eol/
2020-09-24 20:51:02 -07:00
444363be2d Update Kubernetes from v1.19.1 to v1.19.2
* Update flannel from v0.12.0 to v0.13.0-rc2
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
2020-09-16 20:05:54 -07:00
bc7ad25c60 Update Grafana dashboard for Kubelet v1.19
* Fix Kubelet pod and container count metrics dashboard
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/499
2020-09-15 23:21:56 -07:00
e838d4dc3d Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules/alerts and Grafana dashboards
2020-09-13 15:03:27 -07:00
979c092ef6 Reduce apiserver metrics cardinality of non-core APIs
* Reduce `apiserver_request_duration_seconds_count` cardinality
by dropping series for non-core Kubernetes APIs. This is done
to match `apiserver_request_duration_seconds_count` relabeling
* These two relabels must be performed the same way to avoid
affecting new SLO calculations (upcoming)
* See https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/498

Related: https://github.com/poseidon/typhoon/pull/596
2020-09-13 14:47:49 -07:00
db8e94bb4b Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-09-12 19:41:15 -07:00
eb093af9ed Drop Kubelet labelmap relabel for node_name
* Originally, Kubelet and CAdvisor metrics used a labelmap
relabel to add Kubernetes SD node labels onto timeseries
* With https://github.com/poseidon/typhoon/pull/596 that
relabel was dropped since node labels aren't usually that
valuable. `__meta_kubernetes_node_name` was retained but
the field name is empty
* Favor just using Prometheus server-side `instance` in
queries that require some node identifier for aggregation
or debugging

Fix https://github.com/poseidon/typhoon/issues/823
2020-09-12 19:40:00 -07:00
36096f844d Promote Cilium from experimental to GA
* Cilium was added as an experimental CNI provider in June
* Since then, I've been choosing it for an increasing number
of clusters and scenarios.
2020-09-12 19:24:55 -07:00
d236628e53 Update Prometheus from v2.20.0 to v2.21.0
* https://github.com/prometheus/prometheus/releases/tag/v2.21.0
2020-09-12 19:20:54 -07:00
577b927a2b Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* No notable changes in the config spec, just housekeeping
* Require any snippet customizations to update to v1.1.0. Version
skew between the main config and snippets will show an error message
* https://github.com/coreos/fcct/blob/master/docs/configuration-v1_1.md
2020-09-10 23:38:40 -07:00
000c11edf6 Update IngressClass resources to networking.k8s.io/v1
* Kubernetes v1.19 graduated Ingress and IngressClass from
networking.k8s.io/v1beta1 to networking.k8s.io/v1
2020-09-10 23:25:53 -07:00
29b16c3fc0 Change seccomp annotations to seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support for
seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the profile,
you'd see `seccomp=unconfined`

Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215
2020-09-10 01:15:07 -07:00
0c7a879bc4 Update Kubernetes from v1.19.0 to v1.19.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191
2020-09-09 20:52:29 -07:00
1e654c9e4e Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
* Update mkdocs-material from v5.5.11 to v5.5.12
2020-09-07 21:18:47 -07:00
28ee693e6b Update Cilium from v1.8.2 to v1.8.3
* https://github.com/cilium/cilium/releases/tag/v1.8.3
2020-09-07 21:10:27 -07:00
8c7d95aefd Update mkdocs-material from v5.5.9 to v5.5.11 2020-08-29 13:52:16 -07:00
d45dfdbf91 Update nginx-ingress from v0.34.1 to v0.35.0
* Repo changed to k8s.gcr.io/ingress-nginx/controller
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.35.0
2020-08-29 13:38:28 -07:00
d7e0536838 Add code group blocks to improve worker pool docs
* Show Fedora CoreOS and Flatcar Linux examples in
separate tabs, rather than trying to show one
* Add copyright footer for the poseidon org
2020-08-28 00:25:12 -07:00
8dd221a57c Add fleetlock docs and links to addons
* Add links to fleetlock for Fedora CoreOS reboot coordination
* https://github.com/poseidon/fleetlock
2020-08-28 00:02:24 -07:00
f17bb4cf61 Update mkdocs-material from v5.5.6 to v5.5.9 2020-08-27 09:20:18 -07:00
44f1fe620a Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-08-27 09:18:39 -07:00
a504264e24 Update Grafana from v7.1.4 to v7.1.5
* https://github.com/grafana/grafana/releases/tag/v7.1.5
2020-08-27 08:52:07 -07:00
88cf7273dc Update Kubernetes from v1.18.8 to v1.19.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md
2020-08-27 08:50:01 -07:00
58def65a09 Update Grafana from v7.1.3 to v7.1.4
* https://github.com/grafana/grafana/releases/tag/v7.1.4
2020-08-22 15:40:09 -07:00
cd7fd29194 Update etcd from v3.4.10 to v3.4.12
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md
2020-08-19 21:25:41 -07:00
aafa38476a Fix SELinux race condition on non-bootstrap controllers in multi-controller (#808)
* Fix race condition for bootstrap-secrets SELinux context on non-bootstrap controllers in multi-controller FCOS clusters
* On first boot from disk on non-bootstrap controllers, adding bootstrap-secrets races with kubelet.service starting, which can cause the secrets assets to have the wrong label until kubelet.service restarts (service, reboot, auto-update)
* This can manifest as `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods crashlooping on spare controllers on first cluster creation
2020-08-19 21:18:10 -07:00
9a07f1d30b Update recommended Terraform provider versions
* Sync Terraform provider plugin versions to those used
internally
* Update mkdocs-material from v5.5.1 to v5.5.6
* Fix minor details in docs
2020-08-14 10:05:52 -07:00
c87db3ef37 Update Kubernetes from v1.18.6 to v1.18.8
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188
2020-08-13 20:47:43 -07:00
342380cfa4 Update Terraform migration guide SHA
* Mention the first master branch SHA that introduced Terraform
v0.13 forward compatibility
* Link the migration guide on Github until a release is available
and website docs are published
2020-08-13 00:36:47 -07:00
5e70d7e2c8 Migrate from Terraform v0.12.x to v0.13.x
* Recommend Terraform v0.13.x
* Support automatic install of poseidon's provider plugins
* Update tutorial docs for Terraform v0.13.x
* Add migration guide for Terraform v0.13.x (best-effort)
* Require Terraform v0.12.26+ (migration compatibility)
* Require `terraform-provider-ct` v0.6.1
* Require `terraform-provider-matchbox` v0.4.1
* Require `terraform-provider-digitalocean` v1.20+

Related:

* https://www.hashicorp.com/blog/announcing-hashicorp-terraform-0-13/
* https://www.terraform.io/upgrade-guides/0-13.html
* https://registry.terraform.io/providers/poseidon/ct/latest
* https://registry.terraform.io/providers/poseidon/matchbox/latest
2020-08-12 01:54:32 -07:00
aab071309f Update recommended Terraform provider versions
* Sync Terraform provider plugin versions to those used
internally
2020-08-09 12:40:22 -07:00
f6ce12766b Allow terraform-provider-aws v3.0+ plugin
* Typhoon AWS is compatible with terraform-provider-aws v3.x releases
* Continue to allow v2.23+, no v3.x specific features are used
* Set required provider versions in the worker module, since
it can be used independently

Related:

* https://github.com/terraform-providers/terraform-provider-aws/releases/tag/v3.0.0
2020-08-09 12:39:26 -07:00
e1d6ab2f24 Update Grafana from v7.1.1 to v7.1.3
* https://github.com/grafana/grafana/releases/tag/v7.1.3
* https://github.com/grafana/grafana/releases/tag/v7.1.2
2020-08-08 18:59:49 -07:00
8b3d41d6a0 Update mkdocs-material from v5.4.0 to v5.5.1 2020-08-02 15:22:10 -07:00
ccee5d3d89 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* https://github.com/poseidon/terraform-render-bootstrap/pull/205
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0 which was last
published by me in 2017 and is no longer accessible to me to maintain
or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:13:15 -07:00
8aefd4f082 Relax terraform-provider-matchbox version constraint
* Allow use of terraform-provider-matchbox v0.3+ (which
allows v0.3.0 <= version < v1.0) for any pre 1.0 release
* Before, the requirement was v0.3.0 <= version < v0.4.0
2020-08-02 01:09:28 -07:00
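The difference between the two constraints comes from Terraform's pessimistic operator, which lets only the rightmost version component increase. A sketch, assuming Terraform v0.13+ `required_providers` syntax and the `poseidon/matchbox` registry source:

```hcl
terraform {
  required_providers {
    matchbox = {
      source = "poseidon/matchbox" # assumed registry source
      # "~> 0.3"   allows v0.3.0 <= version < v1.0   (the relaxed form)
      # "~> 0.3.0" allows v0.3.0 <= version < v0.4.0 (the earlier form)
      version = "~> 0.3"
    }
  }
}
```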
78e6409bd0 Fix flannel support on Fedora CoreOS
* Fedora CoreOS now ships systemd-udev's `default.link` while
Flannel relies on being able to pick its own MAC address for
the `flannel.1` link for tunneled traffic to reach cni0 on
the destination side, without being dropped
* This change first appeared in FCOS testing-devel 32.20200624.20.1
and is the behavior going forward in FCOS since it was added
to align FCOS network naming / configs with the rest of Fedora
and address issues related to the default being missing
* Flatcar Linux (and Container Linux) has a specific flannel.link
configuration builtin, so it was not affected
* https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296

Note: Typhoon's recommended and default CNI provider is Calico,
unless `networking` is set to flannel directly.
2020-08-01 21:22:08 -07:00
2aef42d4f6 Update Prometheus from v2.19.2 to v2.20.0
* https://github.com/prometheus/prometheus/releases/tag/v2.20.0
2020-07-25 16:37:28 -07:00
b7d67757de Update Grafana from v7.1.0 to v7.1.1
* https://github.com/grafana/grafana/releases/tag/v7.1.1
2020-07-25 16:33:40 -07:00
26f5d2d753 Fix some links in docs (#788) 2020-07-25 16:32:08 -07:00
cd0a28904e Update Cilium from v1.8.1 to v1.8.2
* https://github.com/cilium/cilium/releases/tag/v1.8.2
2020-07-25 16:06:27 -07:00
618f8b30fd Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
* Update Grafana dashboard with revised metrics names
2020-07-25 15:51:31 -07:00
264d23a1b5 Declare etcd data directory permissions
* Set etcd data directory /var/lib/etcd permissions to 700
* On Flatcar Linux, /var/lib/etcd is pre-existing and Ignition
v2 doesn't overwrite the directory. Update the Container Linux
config, but add the manual chmod workaround to bootstrap for
Flatcar Linux users
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v3410-2020-07-16
* https://github.com/etcd-io/etcd/pull/11798
2020-07-25 15:48:27 -07:00
f96e91f225 Update etcd from v3.4.9 to v3.4.10
* https://github.com/etcd-io/etcd/releases/tag/v3.4.10
2020-07-18 14:08:22 -07:00
efd4a0319d Update Grafana from v7.0.6 to v7.1.0
* https://github.com/grafana/grafana/releases/tag/v7.1.0
2020-07-18 13:54:56 -07:00
6df6bf904a Show Cilium as a CNI provider option in docs
* Start to show Cilium as a CNI option
* https://github.com/cilium/cilium
2020-07-18 13:27:56 -07:00
5fba20d358 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-07-18 13:19:25 -07:00
a8d3d3bb12 Update ingress-nginx from v0.33.0 to v0.34.1
* Switch to ingress-nginx controller images from us.gcr.io (eu, asia
can also be used if desired)
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0
2020-07-15 22:43:49 -07:00
9ea6d2c245 Update Kubernetes from v1.18.5 to v1.18.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186
* https://github.com/poseidon/terraform-render-bootstrap/pull/201
2020-07-15 22:05:57 -07:00
507aac9b78 Update mkdocs-material from v5.3.3 to v5.4.0 2020-07-11 22:56:59 -07:00
dfd2a0ec23 Update Grafana from v7.0.5 to v7.0.6
* https://github.com/grafana/grafana/releases/tag/v7.0.6
2020-07-09 21:10:48 -07:00
e3bf7d8f9b Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-07-09 21:08:55 -07:00
49050320ce Update Cilium from v1.8.0 to v1.8.1
* https://github.com/cilium/cilium/releases/tag/v1.8.1
2020-07-05 16:00:00 -07:00
74e025c9e4 Update Grafana from v7.0.4 to v7.0.5
* https://github.com/grafana/grafana/releases/tag/v7.0.5
2020-07-05 15:49:34 -07:00
257a49ce37 Remove CoreOS Container Linux image names from docs
* Remove coreos-stable, coreos-beta, and coreos-alpha channel
references from docs
* CoreOS Container Linux is end of life (see changelog)
2020-06-30 01:36:53 -07:00
df3f40bcce Allow using Flatcar Linux edge on Azure
* Set Kubelet cgroup driver to systemd when Flatcar Linux edge
is chosen

Note: Typhoon module status assumes use of the stable variant of
an OS channel/stream. It's possible to use earlier variants and
those are sometimes tested or developed against, but stable is
the recommendation
2020-06-30 01:35:29 -07:00
32886cfba1 Promote Fedora CoreOS on Google Cloud to stable status 2020-06-29 23:09:11 -07:00
0ba2c1a4da Fix terraform fmt in firewall rules 2020-06-29 23:04:54 -07:00
430d139a5b Remove os_image variable on Google Cloud Fedora CoreOS
* In v1.18.3, the `os_stream` variable was added to select
a Fedora CoreOS image stream (stable, testing, next) on
AWS and Google Cloud (which publish official streams)
* Remove `os_image` variable deprecated in v1.18.3. Manually
uploaded images are no longer needed
2020-06-29 22:57:11 -07:00
7c6ab21b94 Isolate each DigitalOcean cluster in its own VPC
* DigitalOcean introduced Virtual Private Cloud (VPC) support
to match other clouds and enhance the prior "private networking"
feature. Before, droplets belonging to different clusters (but
residing in the same region) could reach one another (although
Typhoon firewall rules prohibit this). Now, droplets in a VPC
reside in their own network
* https://www.digitalocean.com/docs/networking/vpc/
* Create droplet instances in a VPC per cluster. This matches the
design of Typhoon AWS, Azure, and GCP.
* Require `terraform-provider-digitalocean` v1.16.0+ (action required)
* Output `vpc_id` for use with an attached DigitalOcean
loadbalancer
2020-06-28 23:25:30 -07:00
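A sketch of how the `vpc_id` output might be used to place a DigitalOcean load balancer in the cluster's VPC. The module name, region, droplet tag, and ports are hypothetical:

```hcl
resource "digitalocean_loadbalancer" "ingress" {
  name     = "ingress"          # hypothetical name
  region   = "nyc3"             # hypothetical; must match the cluster's region
  vpc_uuid = module.nemo.vpc_id # `module.nemo` is a hypothetical cluster module

  droplet_tag = "nemo-worker" # hypothetical tag selecting worker droplets

  forwarding_rule {
    entry_port      = 80
    entry_protocol  = "tcp"
    target_port     = 80
    target_protocol = "tcp"
  }
}
```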
21178868db Revert "Update Prometheus from v2.19.1 to v2.19.2"
* Prometheus has not published v2.19.2
* This reverts commit 81b6f54169.
2020-06-27 14:53:58 -07:00
9dcf35e393 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-06-27 14:44:18 -07:00
81b6f54169 Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-06-27 14:34:30 -07:00
7bce15975c Update Kubernetes from v1.18.4 to v1.18.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185
2020-06-27 13:52:18 -07:00
1f83ae7dbb Update Calico from v3.14.1 to v3.15.0
* https://docs.projectcalico.org/v3.15/release-notes/
2020-06-26 02:40:12 -07:00
a10a1cee9f Update mkdocs-material from v5.3.0 to v5.3.3 2020-06-26 02:24:37 -07:00
a79ad34ba3 Update Grafana from v7.0.3 to v7.0.4
* https://github.com/grafana/grafana/releases/tag/v7.0.4
2020-06-26 02:06:38 -07:00
99a11442c7 Update Prometheus from v2.19.0 to v2.19.1
* https://github.com/prometheus/prometheus/releases/tag/v2.19.1
2020-06-26 02:01:58 -07:00
d27f367004 Update Cilium from v1.8.0-rc4 to v1.8.0
* https://github.com/cilium/cilium/releases/tag/v1.8.0
2020-06-22 22:26:49 -07:00
e9c8520359 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability
    (health)
  * Optional TCP 4240 between nodes for endpoint reachability (health)

Known Issues:

* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256

Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
2020-06-21 20:41:53 -07:00
37f00a3882 Reduce Calico MTU on Fedora CoreOS Azure
* Change the Calico VXLAN interface MTU from 1450 to 1410
* VXLAN on Azure should support MTU 1450. However, there is
history where performance measures have shown that 1410 is
needed to have expected performance. Flatcar Linux has the
same MTU 1410 override and note
* FCOS 31.20200323.3.2 was known to perform fine with 1450, but
now in 31.20200517.3.0 the right value seems to be 1410
2020-06-19 00:24:56 -07:00
4cfafeaa07 Fix Kubelet starting before hostname set on FCOS AWS
* Fedora CoreOS `kubelet.service` can start before the hostname
is set. Kubelet reads the hostname to determine the node name to
register. If the hostname was read as localhost, Kubelet will
continue trying to register as localhost (problem)
* This race manifests as a node that appears NotReady: the Kubelet
is trying to register as localhost, while the host itself (by then)
has an AWS provided hostname. Restarting kubelet.service is a
manual fix so Kubelet re-reads the hostname
* This race could only be shown on AWS, not on Google Cloud or
Azure despite attempts. Bare-metal and DigitalOcean differ and
use hostname-override (e.g. afterburn) so they're not affected
* Wait for nodes to have a non-localhost hostname in the oneshot
that awaits /etc/resolv.conf. Typhoon has no valid cases for a
node hostname being localhost (not even single-node clusters)

Related Openshift: https://github.com/openshift/machine-config-operator/pull/1813
Close https://github.com/poseidon/typhoon/issues/765
2020-06-19 00:19:54 -07:00
90e23f5822 Rename controller node label and NoSchedule taint
* Remove node label `node.kubernetes.io/master` from controller nodes
* Use `node.kubernetes.io/controller` (present since v1.9.5,
[#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to
`node-role.kubernetes.io/controller`
* Tolerate the new taint name for workloads that may run on controller nodes
and stop tolerating `node-role.kubernetes.io/master` taint
2020-06-19 00:12:13 -07:00
6234147948 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-06-18 01:03:37 -07:00
c25c59058c Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
2020-06-17 19:53:19 -07:00
bc9b808d44 Update nginx-ingress from v0.32.0 to v0.33.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-0.33.0
2020-06-16 18:44:40 -07:00
4b0203fdb2 Fix typo in DigitalOcean docs title 2020-06-16 18:33:56 -07:00
331566e1f7 Update mkdocs packages for website 2020-06-16 18:20:19 -07:00
04520e447c Update node-exporter from v1.0.0 to v1.0.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.1
2020-06-16 17:57:09 -07:00
413585681b Remove unused Kubelet lock-file and exit-on-lock-contention
* Kubelet `--lock-file` and `--exit-on-lock-contention` date
back to usage of bootkube and at one point running Kubelet
in a "self-hosted" style whereby an on-host Kubelet (rkt)
started pods, but then a Kubelet DaemonSet was scheduled
and able to take over (hence self-hosted). `lock-file` and
`exit-on-lock-contention` flags supported this pivot. The
pattern has been out of favor (in bootkube too) for years
because of dueling Kubelet complexity
* Typhoon runs Kubelet as a container via an on-host systemd
unit using podman (Fedora CoreOS) or rkt (Flatcar Linux). In
fact, Typhoon no longer uses bootkube or control plane pivot
(let alone Kubelet pivot) and uses static pods since v1.16.0
* https://github.com/poseidon/typhoon/pull/536
2020-06-12 00:06:41 -07:00
96711d7f17 Remove unused Kubelet cert / key Terraform state
* Generated Kubelet TLS certificate and key are no longer
used or distributed to machines since Kubelet TLS bootstrap
is used instead. Remove the certificate and key from state
2020-06-11 21:24:36 -07:00
c9059d3fe9 Update Prometheus from v2.19.0-rc.0 to v2.19.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0
2020-06-09 23:05:03 -07:00
a287920169 Use strict mode for Container Linux Configs
* Enable terraform-provider-ct `strict` mode for parsing
Container Linux Configs and snippets
* Fix Container Linux Config systemd unit syntax `enable`
(old) to `enabled`
* Align with Fedora CoreOS which uses strict mode already
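
Illustration (a sketch, not the module's actual source): with strict parsing enabled, a config that still used the old `enable` key would be rejected instead of silently accepted. The data source name and the unit shown are hypothetical, and this assumes a terraform-provider-ct release from this era that still parses Container Linux Configs.

```hcl
# Hypothetical example of a Container Linux Config parsed in strict mode.
data "ct_config" "example" {
  # Strict mode turns parse warnings (such as unknown keys) into errors.
  strict = true

  content = <<-EOT
    systemd:
      units:
        - name: docker.service
          enabled: true
  EOT
}

# Rendered Ignition JSON produced from the config above.
output "ignition" {
  value = data.ct_config.example.rendered
}
```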
2020-06-09 23:00:36 -07:00
8dc170b9d9 Update security disclosure contact email
* Use security@psdn.io across github.com/poseidon projects
2020-06-08 12:37:09 -07:00
aed1a5f33d Fix Fedora CoreOS docs for selecting a stream
* Fedora CoreOS image `os_stream` stable, testing, and next
have been configurable since v1.18.3
* Remove mention of outdated `os_image` variable
2020-06-08 12:25:57 -07:00
31d02b0221 Update Prometheus from v2.18.1 to v2.19.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0-rc.0
2020-06-05 00:16:45 -07:00
8f875f80f5 Update Grafana from v7.0.1 to v7.0.3
* https://github.com/grafana/grafana/releases/tag/v7.0.2
* https://github.com/grafana/grafana/releases/tag/v7.0.3
2020-06-03 12:31:58 -07:00
16c0b9152b Update kube-state-metrics from v1.9.6 to v1.9.7
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.7
2020-06-03 11:35:10 -07:00
99dbce67a3 Tweak minor style elements of issue templates 2020-05-31 16:19:33 -07:00
20bfd69780 Change Kubelet container image publishing
* Build Kubelet container images internally and publish
to Quay and Dockerhub (new) as an alternative in case of
registry outage or breach
* Use our infra to provide single and multi-arch (default)
Kubelet images for possible future use
* Docs: Show how to use alternative Kubelet images via
snippets and a systemd dropin (builds on #737)

Changes:

* Update docs with changes to Kubelet image building
* If you prefer to trust images built by Quay/Dockerhub,
automated image builds are still available with unique
tags (albeit with some limitations):
  * Quay automated builds are tagged `build-{short_sha}`
  (limit: only amd64)
  * Dockerhub automated builds are tagged `build-{tag}`
  and `build-master` (limit: only amd64, no shas)

Links:

* Kubelet: https://github.com/poseidon/kubelet
* Docs: https://typhoon.psdn.io/topics/security/#container-images
* Registries:
  * quay.io/poseidon/kubelet
  * docker.io/psdn/kubelet
2020-05-30 23:34:23 -07:00
ba44408b76 Update Calico from v3.14.0 to v3.14.1
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-30 22:08:37 -07:00
455175d9e6 Update the fallback issue template
* Even "blank" issues need to fill out the fallback
template
2020-05-28 00:06:59 -07:00
d45804b1f6 Update Github issue template to use drop-downs (#747)
* Create a stricter bug report template
* Highlight topics that are not accepted in issues: operation, support, debugging, advice, or Kubernetes concepts
* Add a section to strongly suggest bug reports link a PR or describe a solution. This may be able to weed out topics that aren't focused bug reports
2020-05-27 23:49:25 -07:00
907a96916f Update mkdocs-material from v5.2.0 to v5.2.2
* https://github.com/squidfunk/mkdocs-material/releases/tag/5.2.2
2020-05-27 21:49:40 -07:00
187bb17d39 Update Grafana from v7.0.0 to v7.0.1
* https://github.com/grafana/grafana/releases/tag/v7.0.1
2020-05-27 21:35:24 -07:00
abc31c3711 Update node-exporter from v1.0.0-rc.1 to v1.0.0
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0
2020-05-27 21:33:03 -07:00
283e14f3e0 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions to those actively
used internally
* Fix terraform fmt
2020-05-22 01:12:53 -07:00
e72f916c8d Update etcd from v3.4.8 to v3.4.9
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v349-2020-05-20
2020-05-22 00:52:20 -07:00
c52f9f8d08 Upgrade docs packages and refresh content
* Promote DigitalOcean from alpha to beta for Fedora
CoreOS and Flatcar Linux
* Upgrade mkdocs-material and PyPI packages for docs
* Replace docs mentions of Container Linux with Flatcar
Linux and move docs/cl to docs/flatcar-linux
* Deprecate CoreOS Container Linux support. It's still
usable for some time, but start removing docs
2020-05-20 23:31:26 -07:00
ecae6679ff Update Kubernetes from v1.18.2 to v1.18.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-05-20 20:37:39 -07:00
4760543356 Set Kubelet image via kubelet.service KUBELET_IMAGE
* Write the systemd kubelet.service to use `KUBELET_IMAGE`
as the Kubelet. This provides a nice way to use systemd
dropins to temporarily override the image (e.g. during a
registry outage)

Note: Only Typhoon Kubelet images and registries are supported.
2020-05-19 22:39:53 -07:00
09eb208b4e Fix Fedora CoreOS on GCP proposing controller recreate
* With Fedora CoreOS image stream support (#727), the latest
resolved image will change over the lifecycle of a cluster.
* Fix issue where an image diff proposed replacing a Fedora
CoreOS controller on GCP, introduced in #727 (unreleased)
* Also ignore image diffs to the GCP managed instance group
of workers. This aligns with worker AMI diffs being ignored
on AWS and similar on Azure, since workers update themselves.

Background:

* Controller nodes should strictly not be recreated by Terraform,
they are stateful (etcd) and should not be replaced
* Across cloud platforms, OS image diffs are ignored since both
Flatcar Linux and Fedora CoreOS nodes update themselves. For
workers, user-data or disk size diffs (where relevant) are allowed
to recreate worker templates/configs since these are considered
to be user-initiated declarations that a reprovision should be done
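
Illustration (a sketch under assumptions, not the module's actual source): ignoring image diffs on a stateful controller can be expressed with a `lifecycle` block. The instance name, zone, machine type, and the image project/family values below are assumptions chosen for the example.

```hcl
# Resolve the latest image in a stream. The resolved image changes over time.
# Project and family names here are assumptions.
data "google_compute_image" "fedora_coreos" {
  project = "fedora-coreos-cloud"
  family  = "fedora-coreos-stable"
}

resource "google_compute_instance" "controller" {
  name         = "example-controller-0"
  zone         = "us-central1-a"
  machine_type = "n1-standard-1"

  boot_disk {
    initialize_params {
      image = data.google_compute_image.fedora_coreos.self_link
    }
  }

  network_interface {
    network = "default"
  }

  lifecycle {
    # A newer resolved image must not propose replacing the controller (etcd).
    ignore_changes = [boot_disk[0].initialize_params]
  }
}
```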
2020-05-19 21:41:51 -07:00
8d024d22ad Update etcd from v3.4.7 to v3.4.8
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v348-2020-05-18
2020-05-18 23:50:46 -07:00
3bdddc452c Update Grafana from v7.0.0-beta2 to v7.0.0
* https://grafana.com/docs/grafana/latest/guides/whats-new-in-v7-0/
2020-05-18 23:42:32 -07:00
ff4187a1fb Use new Azure subnet to set address_prefixes list
* Update Azure subnet `address_prefix` to `address_prefixes` list
* Fix warning that `address_prefix` is deprecated
* Require `terraform-provider-azurerm` v2.8.0+ (action required)
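
Illustration (resource names, location, and CIDRs are hypothetical, written against the azurerm v2.x schema): the subnet takes a list of prefixes rather than a single string.

```hcl
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "cluster" {
  name     = "example"
  location = "centralus"
}

resource "azurerm_virtual_network" "network" {
  name                = "example"
  resource_group_name = azurerm_resource_group.cluster.name
  location            = azurerm_resource_group.cluster.location
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "worker" {
  name                 = "worker"
  resource_group_name  = azurerm_resource_group.cluster.name
  virtual_network_name = azurerm_virtual_network.network.name

  # Before (deprecated): address_prefix = "10.0.8.0/21"
  address_prefixes = ["10.0.8.0/21"]
}
```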

Rel: https://github.com/terraform-providers/terraform-provider-azurerm/pull/6493
2020-05-18 23:35:47 -07:00
2578be1f96 Rollback Grafana to v7.0.0-beta3, v7.0.0 image is missing
* Grafana hasn't published the v7.0.0 image yet
2020-05-16 12:32:10 -07:00
90edcd3d77 Update node-exporter from v1.0.0-rc.0 to v1.0.0-rc.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.1
2020-05-15 18:03:19 -07:00
a927c7c790 Update kube-state-metrics from v1.9.5 to v1.9.6
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.6
2020-05-15 17:42:24 -07:00
d952576d2f Update Grafana from v7.0.0-beta3 to v7.0.0
* https://github.com/grafana/grafana/releases/tag/7.0.0
2020-05-15 17:38:59 -07:00
70e389f37f Restore use of Flatcar Linux Azure Marketplace image
* Switch Flatcar Linux Azure to use the Marketplace image
from Kinvolk (offer `flatcar-container-linux-free`)
* Accepting Azure Marketplace terms is still necessary,
update docs to show accepting the free offer rather than
BYOL

* Upstream Flatcar: https://github.com/flatcar-linux/Flatcar/issues/82
* Typhoon: https://github.com/poseidon/typhoon/issues/703
2020-05-13 22:50:24 -07:00
a18bd0a707 Highlight SELinux enforcing mode in features 2020-05-13 21:57:38 -07:00
01905b00bc Support Fedora CoreOS OS image streams on AWS
* Add `os_stream` variable to set the stream to stable (default),
testing, or next
* Remove unused os_image variable on Fedora CoreOS AWS
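
Illustration (a usage sketch; the module name, `ref`, and all values are placeholders, not taken from this commit): selecting a non-default stream from a cluster definition.

```hcl
module "tempest" {
  # The ref is an assumption; pin whichever release includes os_stream.
  source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.18.3"

  # Placeholder values for the required variables.
  cluster_name       = "tempest"
  dns_zone           = "aws.example.com"
  dns_zone_id        = "Z0000000EXAMPLE"
  ssh_authorized_key = "ssh-ed25519 AAAA... user@example"

  # One of "stable" (default), "testing", or "next".
  os_stream = "testing"
}
```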
2020-05-13 21:45:12 -07:00
f4194cd57a Update Grafana from v7.0.0-beta2 to v7.0.0-beta3
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta3
2020-05-09 17:50:40 -07:00
a2db4fa8c4 Update Calico from v3.13.3 to v3.14.0
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-09 16:05:30 -07:00
358854e712 Fix Calico install-cni crash loop on Pod restarts
* Set a consistent MCS level/range for Calico install-cni
* Note: Rebooting a node was a workaround, because Kubelet
relabels /etc/kubernetes(/cni/net.d)

Background:

* On SELinux enforcing systems, the Calico CNI install-cni
container ran with default SELinux context and a random MCS
pair. install-cni places CNI configs by first creating a
temporary file and then moving them into place, which means
the file MCS categories depend on the container's SELinux
context.
* calico-node Pod restarts create a new install-cni container
with a different MCS pair that cannot access the earlier
written file (it places configs every time), causing the
init container to error and calico-node to crash loop
* https://github.com/projectcalico/cni-plugin/issues/874

```
mv: inter-device move failed: '/calico.conf.tmp' to '/host/etc/cni/net.d/10-calico.conflist'; unable to remove target: Permission denied
Failed to mv files. This may be caused by selinux configuration on the host, or something else.
```

Note, this isn't a host SELinux configuration issue.

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/186
2020-05-09 16:01:44 -07:00
b5dabcea31 Use Fedora CoreOS image streams on Google Cloud
* Add `os_stream` variable to set a Fedora CoreOS stream
to `stable` (default), `testing`, or `next`
* Deprecate `os_image` variable. Remove docs about uploading
Fedora CoreOS images manually, this is no longer needed
* https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/

Rel: https://github.com/coreos/fedora-coreos-docs/pull/70
2020-05-08 01:23:12 -07:00
3f0a5d2715 Update Grafana from v7.0.0-beta1 to v7.0.0-beta2
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta2
2020-05-07 23:04:44 -07:00
33173c0206 Update Prometheus from v2.18.0 to v2.18.1
* https://github.com/prometheus/prometheus/releases/tag/v2.18.1
2020-05-07 22:59:11 -07:00
70f30d9c07 Update Prometheus from v2.18.0-rc.1 to v2.18.0
* https://github.com/prometheus/prometheus/releases/tag/v2.18.0
2020-05-05 22:31:11 -07:00
6afc1643d9 Update nginx-ingress from v0.30.0 to v0.32.0
* Add support for IngressClass and RBAC authorization
* Since our nginx ingress controller example uses the flag
`--ingress-class=public`, add an IngressClass to go along
with it

Rel: https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class
2020-05-03 23:24:19 -07:00
e71e27e769 Update Prometheus from v2.17.2 to v2.18.0-rc.1
* https://github.com/prometheus/prometheus/releases/tag/v2.18.0-rc.1
2020-04-29 20:57:48 -07:00
64035005d4 Update Grafana from v6.7.2 to v7.0.0-beta1
* https://github.com/grafana/grafana/releases/tag/v7.0.0-beta1
2020-04-29 20:53:30 -07:00
317416b316 Use Terraform element wrap-around for AWS controllers subnet_id (#714)
* Fix Terraform plan error when controller_count exceeds available AWS zones (e.g. 5 controllers)
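
Illustration (names are hypothetical, not the module's actual source): Terraform's built-in `element` function wraps around when the index is past the end of the list, which is the behavior this fix relies on. A minimal, standalone sketch:

```hcl
# Hypothetical sketch: 5 controllers spread across 3 subnets (one per zone).
variable "controller_count" {
  type    = number
  default = 5
}

locals {
  # Assumed subnet IDs; a real module would read these from aws_subnet resources.
  subnet_ids = ["subnet-a", "subnet-b", "subnet-c"]

  # element(list, index) uses index modulo length(list), so controllers
  # 3 and 4 reuse the subnets at positions 0 and 1 instead of erroring.
  controller_subnets = [
    for i in range(var.controller_count) : element(local.subnet_ids, i)
  ]
}

# Evaluates to ["subnet-a", "subnet-b", "subnet-c", "subnet-a", "subnet-b"]
output "controller_subnets" {
  value = local.controller_subnets
}
```

Plain indexing (`local.subnet_ids[i]`) would fail for `i >= 3`, which matches the plan error described above.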
2020-04-29 20:41:08 -07:00
2c1af917ec Update recommended Terraform provider versions
* Sync the Terraform provider plugin versions to those
actively used and tested by the author
* Fix terraform fmt
2020-04-28 19:57:50 -07:00
4ac2d94999 Add Fedora CoreOS Azure docs to site navigation
* Fix missing Fedora CoreOS Azure docs
2020-04-28 19:54:37 -07:00
fd044ee117 Enable Kubelet TLS bootstrap and NodeRestriction
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR for
kube-controller-manager to automatically approve to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* Ability for a Kubelet to delete its Node object is retained
as preemptible nodes or those in auto-scaling instance groups
need to be able to remove themselves on shutdown. This need
continues to have precedence over any risk of a node deleting
itself maliciously
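
Illustration (a sketch only): the three ClusterRoleBindings above, written with the Terraform `kubernetes` provider for readability. The project may well ship these as plain manifests instead. The binding names and map keys are hypothetical; the ClusterRole names are the upstream Kubernetes ones. A configured `kubernetes` provider is assumed.

```hcl
locals {
  # Hypothetical binding suffix => upstream ClusterRole granted to bootstrap tokens.
  bootstrapper_roles = {
    "node-bootstrapper" = "system:node-bootstrapper"
    "csr-nodeclient"    = "system:certificates.k8s.io:certificatesigningrequests:nodeclient"
    "csr-selfnodeclient" = "system:certificates.k8s.io:certificatesigningrequests:selfnodeclient"
  }
}

resource "kubernetes_cluster_role_binding" "bootstrappers" {
  for_each = local.bootstrapper_roles

  metadata {
    name = "example-${each.key}"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = each.value
  }

  # Bootstrap tokens authenticate as members of this group.
  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Group"
    name      = "system:bootstrappers"
  }
}
```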

Security notes:

1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.

2. The bootstrap kubeconfig credential has the same sensitivity
as the previous generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig

3. Bootstrapping Kubelet kubeconfigs with a limited lifetime offers
a slight security improvement.
  * An attacker who obtains the kubeconfig can likely obtain the
  bootstrap kubeconfig as well, to obtain the ability to renew
  their access
  * A compromised bootstrap kubeconfig could plausibly be handled
  by replacing the bootstrap token Secret, distributing the token
  to new nodes, and expiration. Whereas a compromised TLS-client
  certificate kubeconfig can't be revoked (no CRL). However,
  replacing a bootstrap token can be impractical in real cluster
  environments, so the limited lifetime is mostly a theoretical
  benefit.
  * Cluster CSR objects are visible via kubectl which is nice

4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features

Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/185
2020-04-28 19:35:33 -07:00
38a6bddd06 Update Calico from v3.13.1 to v3.13.3
* https://docs.projectcalico.org/v3.13/release-notes/
2020-04-23 23:58:02 -07:00
d8966afdda Remove extraneous sudo from layout asset unpacking 2020-04-22 20:28:01 -07:00
84ed0a31c3 Update Prometheus from v2.17.1 to v2.17.2
* https://github.com/prometheus/prometheus/releases/tag/v2.17.2
2020-04-20 18:09:24 -07:00
fcbee12334 Fix race condition creating DigitalOcean firewall rules
* DigitalOcean firewall rules should reference Terraform tag
resources rather than using tag strings. Otherwise, terraform
apply can fail (needs rerun) if a tag has not yet been created
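
Illustration (tag names, firewall name, and ports are hypothetical, not the module's actual rules): referencing the tag resources gives Terraform an implicit dependency, so tags exist before the firewall is created.

```hcl
resource "digitalocean_tag" "controllers" {
  name = "example-controller"
}

resource "digitalocean_tag" "workers" {
  name = "example-worker"
}

resource "digitalocean_firewall" "rules" {
  name = "example"

  # Resource references (not bare strings) order creation correctly.
  tags = [
    digitalocean_tag.controllers.name,
    digitalocean_tag.workers.name,
  ]

  inbound_rule {
    protocol         = "tcp"
    port_range       = "22"
    source_addresses = ["0.0.0.0/0", "::/0"]
  }

  inbound_rule {
    protocol   = "tcp"
    port_range = "10250"
    source_tags = [
      digitalocean_tag.controllers.name,
      digitalocean_tag.workers.name,
    ]
  }
}
```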
2020-04-19 16:55:02 -07:00
feac94605a Fix bootstrap mount to use shared volume SELinux label
* Race: During initial bootstrap, static control plane pods
could hang with Permission denied to bootstrap secrets. A
manual fix involved restarting Kubelet, which relabeled mounts.
The race had no effect on subsequent reboots.
* bootstrap.service runs podman with a private unshared mount
of /etc/kubernetes/bootstrap-secrets which uses an SELinux MCS
label with a category pair. However, bootstrap-secrets should
be shared as it's mounted by Docker pods kube-apiserver,
kube-scheduler, and kube-controller-manager. Restarting Kubelet
was a manual fix because Kubelet relabels all /etc/kubernetes
* Fix bootstrap Pod to use the shared volume label, which leaves
bootstrap-secrets files with SELinux level s0 without MCS
* Also allow failed bootstrap.service to be re-applied. This was
missing on bare-metal and AWS
2020-04-19 16:31:32 -07:00
2b1b918b43 Revert Flatcar Linux Azure to manual upload images
* Initial support for Flatcar Linux on Azure used the Flatcar
Linux Azure Marketplace images (e.g. `flatcar-stable`) in
https://github.com/poseidon/typhoon/pull/664
* Flatcar Linux Azure Marketplace images have some unresolved
items https://github.com/poseidon/typhoon/issues/703
* Until the Marketplace items are resolved, revert to requiring
Flatcar Linux's images be manually uploaded (like GCP and
DigitalOcean)
2020-04-18 15:40:57 -07:00
bf22222f7d Remove temporary workaround for v1.18.0 apply issue
* In v1.18.0, kubectl apply would fail to apply manifests if any
single manifest was unable to validate. For example, if a CRD and
CR were defined in the same directory, apply would fail since the
CR would be invalid as the CRD wouldn't exist
* Typhoon temporary workaround was to separate CNI CRD manifests
and explicitly apply them first. No longer needed in v1.18.1+
* Kubernetes v1.18.1 restored the prior behavior where kubectl apply
applies as many valid manifests as it can. In the example above, the
CRD would be applied and the CR could be applied if the kubectl apply
was re-run (allowing for apply loops).
* Upstream fix: https://github.com/kubernetes/kubernetes/pull/89864
2020-04-16 23:49:55 -07:00
671eacb86e Update Kubernetes from v1.18.1 to v1.18.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#changelog-since-v1181
2020-04-16 23:40:52 -07:00
e2d4af43be Fix Fedora CoreOS Azure MTU with Calico
* With Calico VXLAN on Fedora CoreOS the 1450 MTU should
be used
2020-04-12 23:20:04 -07:00
5c4a3f73d5 Add support for Fedora CoreOS on Azure
* Add `azure/fedora-coreos/kubernetes` module
2020-04-12 16:35:49 -07:00
76ab4c4c2a Change container-linux module preference to Flatcar Linux
* No change to Fedora CoreOS modules
* For Container Linux AWS and Azure, change the `os_image` default
from coreos-stable to flatcar-stable
* For Container Linux GCP and DigitalOcean, change `os_image` to
be required since users should upload a Flatcar Linux image and
set the variable
* For Container Linux bare-metal, recommend users change the
`os_channel` to Flatcar Linux. No actual module change.
2020-04-11 14:52:30 -07:00
1627ecaf27 Fix docs TOC to include Fedora CoreOS DigitalOcean 2020-04-11 14:07:09 -07:00
1420700bc0 Update CHANGES for v1.18.1 release
* Change order of modules in the README
2020-04-11 13:23:49 -07:00
80538e2953 Add support for Fedora CoreOS on DigitalOcean
* Add `digital-ocean/fedora-coreos/kubernetes` module
* DigitalOcean custom uploaded images do not permit
droplet IPv6 networking
2020-04-09 23:55:29 -07:00
73af2f3b7c Update Kubernetes from v1.18.0 to v1.18.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181
2020-04-08 19:41:48 -07:00
17ea547723 Update etcd from v3.4.5 to v3.4.7
* https://github.com/etcd-io/etcd/releases/tag/v3.4.7
* https://github.com/etcd-io/etcd/releases/tag/v3.4.6
2020-04-06 21:09:25 -07:00
2b5dfece93 Update Grafana from v6.7.1 to v6.7.2
* https://github.com/grafana/grafana/releases/tag/v6.7.2
2020-04-04 13:13:19 -07:00
d47d40b517 Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules and alerts and Grafana
dashboards
* Add Loki recording rules for convenience
2020-03-31 21:53:01 -07:00
3c1be7b0e0 Fix terraform fmt 2020-03-31 21:42:51 -07:00
bbbaf949f9 Fix UDP outbound and clock sync timeouts on Azure workers
* Add "lb" outbound rule for worker TCP _and_ UDP traffic
* Fix Azure worker nodes clock synchronization being inactive
due to timeouts reaching the CoreOS / Flatcar NTP pool
* Fix Azure worker nodes not providing outbound UDP connectivity

Background:

Azure provides VMs outbound connectivity either by having a public
IP or via an SNAT masquerade feature bundled with their virtual
load balancing abstraction (in contrast with, say, a NAT gateway).

Azure worker nodes have only a private IP, but are associated with
the cluster load balancer's backend pool and ingress frontend IP.
Outbound traffic uses SNAT with this frontend IP. A subtle detail
with Azure SNAT seems to be that since both inbound lb_rule's are
TCP only, outbound UDP traffic isn't SNAT'd (highlights the reasons
Azure shouldn't have conflated inbound load balancing with outbound
SNAT concepts). However, by adding a separate outbound rule and
disabling outbound SNAT on our ingress lb_rule's, we can tell Azure
to continue load balancing as before, and support outbound SNAT for
worker traffic of both the TCP and UDP protocol.
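
Illustration (a fragment written against the azurerm v2.x schema): the two pieces described above. It assumes `azurerm_resource_group.cluster`, `azurerm_lb.cluster` (with a frontend IP configuration named "ingress"), and `azurerm_lb_backend_address_pool.worker` are defined elsewhere. All names are hypothetical.

```hcl
# Inbound rule keeps load balancing TCP, but no longer provides implicit SNAT.
resource "azurerm_lb_rule" "ingress_http" {
  name                = "ingress-http"
  resource_group_name = azurerm_resource_group.cluster.name
  loadbalancer_id     = azurerm_lb.cluster.id

  frontend_ip_configuration_name = "ingress"
  protocol                       = "Tcp"
  frontend_port                  = 80
  backend_port                   = 80
  backend_address_pool_id        = azurerm_lb_backend_address_pool.worker.id

  disable_outbound_snat = true
}

# Explicit outbound rule: SNAT worker traffic for both TCP and UDP.
resource "azurerm_lb_outbound_rule" "worker_outbound" {
  name                = "worker"
  resource_group_name = azurerm_resource_group.cluster.name
  loadbalancer_id     = azurerm_lb.cluster.id

  frontend_ip_configuration {
    name = "ingress"
  }

  protocol                = "All"
  backend_address_pool_id = azurerm_lb_backend_address_pool.worker.id
}
```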

Fixes clock synchronization timeouts:

```
systemd-timesyncd[786]: Timed out waiting for reply from
45.79.36.123:123 (3.flatcar.pool.ntp.org)
```

Azure controller nodes have their own public IP, so controllers (and
etcd) nodes have not had clock synchronization or outbound UDP issues
2020-03-31 21:00:16 -07:00
135c6182b8 Update flannel from v0.11.0 to v0.12.0
* https://github.com/coreos/flannel/releases/tag/v0.12.0
2020-03-31 18:31:59 -07:00
c53dc66d4a Rename Container Linux snippets variable for consistency
* Rename controller_clc_snippets to controller_snippets (cloud platforms)
* Rename worker_clc_snippets to worker_snippets (cloud platforms)
* Rename clc_snippets to snippets (bare-metal)
2020-03-31 18:25:51 -07:00
9960972726 Fix bootstrap regression when networking="flannel"
* Fix bootstrap error for missing `manifests-networking/crd*yaml`
when `networking = "flannel"`
* Cleanup manifest-networking directory left during bootstrap
* Regressed in v1.18.0 changes for Calico https://github.com/poseidon/typhoon/pull/675
2020-03-31 18:21:59 -07:00
bac5acb3bd Change default kube-system DaemonSet tolerations
* Change kube-proxy, flannel, and calico-node DaemonSet
tolerations to tolerate `node.kubernetes.io/not-ready`
and `node-role.kubernetes.io/master` (i.e. controllers)
explicitly, rather than tolerating all taints
* kube-system DaemonSets will no longer tolerate custom
node taints by default. Instead, custom node taints must
be enumerated to opt-in to scheduling/executing the
kube-system DaemonSets
* Consider setting the daemonset_tolerations variable
of terraform-render-bootstrap at a later date

Background: Tolerating all taints ruled out use-cases
where certain nodes might legitimately need to keep
kube-proxy or CNI networking disabled
Related: https://github.com/poseidon/terraform-render-bootstrap/pull/179
2020-03-31 01:00:45 -07:00
70bdc9ec94 Allow bootstrap re-apply for Fedora CoreOS GCP
* Problem: Fedora CoreOS images are manually uploaded to GCP. When a
cluster is created with a stale image, Zincati immediately checks
for the latest stable image, fetches, and reboots. In practice,
this can unfortunately occur exactly during the initial cluster
bootstrap phase.

* Recommended: Upload the latest Fedora CoreOS image regularly
* Mitigation: Allow a failed bootstrap.service run (which won't touch
the done ConditionPathExists) to be re-run by running `terraform apply`
again. Add a known issue to CHANGES
* Update docs to show the current Fedora CoreOS stable version to
reduce likelihood users see this issue

Longer term ideas:

* Ideal: Fedora CoreOS publishes a stable channel. Instances will always
boot with the latest image in a channel. The problem disappears since
it works the same way AWS does
* Timer: Consider some timer-based approach to have zincati delay any
system reboots for the first ~30 min of a machine's life. Possibly just
configured on the controller node https://github.com/coreos/zincati/pull/251
* External coordination: For Container Linux, locksmith filled a similar
role and was disabled to allow CLUO to coordinate reboots. By running
atop Kubernetes, it was not possible for the reboot to occur before
cluster bootstrap
* Rely on https://github.com/coreos/zincati/issues/115 to delay the
reboot since bootstrap involves an SSH session
* Use path-based activation of zincati on controllers and set that
path at the end of the bootstrap process

Rel: https://github.com/coreos/fedora-coreos-tracker/issues/239
2020-03-28 18:12:31 -07:00
144bb9403c Add support for Fedora CoreOS snippets
* Refresh snippets customization docs
* Requires terraform-provider-ct v0.5+
2020-03-28 16:15:04 -07:00
5fca08064b Fix Fedora CoreOS AMI to filter for stable images
* Fix issue observed in us-east-1 where AMI filters chose the
latest testing channel release, rather than the stable channel
* Fedora CoreOS AMI filter selects the latest image with a
matching name, x86_64, and hvm, excluding dev images. Add a
filter for "Fedora CoreOS stable", which seems to be the only
distinguishing metadata indicating the channel
2020-03-28 12:57:45 -07:00
fc686c8fc7 Fix delete-node.service kubectl service exec's
* Fix delete-node service that runs on worker (cloud-only)
shutdown to delete a Kubernetes node. Regressed in #669
(unreleased)
* Use rkt `--exec` to invoke kubectl binary in the kubelet
image
* Use podman `--entrypoint` to invoke the kubectl binary in
the kubelet image
2020-03-28 12:35:23 -07:00
a1a5da6bc2 Add CoreOS Container Linux EOL recommendation to CHANGES
* Recommend that users who have not yet tried Fedora CoreOS or
Flatcar Linux do so. Likely, Container Linux will reach EOL
and platform support / stability ratings will be in a mixed
state. Nevertheless, folks should migrate by September.
2020-03-26 23:41:54 -07:00
076b8e3c42 Update Prometheus from v2.17.0 to v2.17.1
* https://github.com/prometheus/prometheus/releases/tag/v2.17.1
2020-03-26 22:17:13 -07:00
ef5f953e04 Set docker log driver to journald on Fedora CoreOS
* Before Kubernetes v1.18.0, Kubelet only supported kubectl
`--limit-bytes` with the Docker `json-file` log driver so
the Fedora CoreOS default was overridden for conformance.
See https://github.com/poseidon/typhoon/pull/642
* Kubelet v1.18+ implemented support for other docker log
drivers, so the Fedora CoreOS default `journald` can be
used again

Rel: https://github.com/kubernetes/kubernetes/issues/86367
2020-03-26 22:06:45 -07:00
d25f23e675 Update docs from Kubernetes v1.17.4 to v1.18.0 2020-03-25 20:28:30 -07:00
f100a90d28 Update Kubernetes from v1.17.4 to v1.18.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-03-25 17:51:50 -07:00
c3bf8bcf96 Add Fedora CoreOS to issue template and docs
* Update several Container Linux references to start
referring to Flatcar Linux
* Update docs and mentions of Fedora CoreOS
2020-03-25 00:36:15 -07:00
5d1e4ad333 Deprecate asset_dir variable and remove docs
* Remove docs for the `asset_dir` variable and deprecate
it in CHANGES. It will be removed in an upcoming release
* Typhoon v1.17.0 introduced a new mechanism for managing
and distributing generated assets that stopped relying on
writing out to disk. `asset_dir` became optional and
defaulted to being unset / off (recommended)
2020-03-25 00:00:01 -07:00
9f702c72d2 Rename DigitalOcean image variable to os_image
* Rename variable `image` to `os_image` to match the naming
used for the same purpose on other supported platforms (e.g.
AWS, Azure, GCP)
2020-03-24 23:49:37 -07:00
e556bc2167 Update Prometheus from v2.17.0-rc.3 to v2.17.0
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0
2020-03-24 23:15:49 -07:00
1bf4f3b801 Fix image tag for Container Linux AWS workers
* #669 left one reference to the original SHA tagged image
before the v1.17.4 image tag was applied
2020-03-21 15:44:33 -07:00
590d941f50 Switch from upstream hyperkube image to individual images
* Kubernetes plans to stop releasing the hyperkube container image
* Upstream will continue to publish `kube-apiserver`, `kube-controller-manager`,
`kube-scheduler`, and `kube-proxy` container images to `k8s.gcr.io`
* Upstream will publish Kubelet only as a binary for distros to package,
either as a DEB/RPM on traditional distros or a container image on
container-optimized operating systems
* Typhoon will package the upstream Kubelet (checksummed) and its
dependencies as a container image for use on CoreOS Container Linux,
Flatcar Linux, and Fedora CoreOS
* Update the Typhoon container image security policy to list
`quay.io/poseidon/kubelet` as an official distributed artifact

Hyperkube: https://github.com/kubernetes/kubernetes/pull/88676
Kubelet Container Image: https://github.com/poseidon/kubelet
Kubelet Quay Repo: https://quay.io/repository/poseidon/kubelet
2020-03-21 15:43:05 -07:00
ddc1ff5348 Update Grafana from v6.6.2 to v6.7.1
* https://github.com/grafana/grafana/releases/tag/v6.7.1
2020-03-21 15:27:55 -07:00
61557e89a6 Update Prometheus from v2.16.0 to v2.17.0-rc.3
* https://github.com/prometheus/prometheus/releases/tag/v2.17.0-rc.3
2020-03-19 22:38:05 -07:00
c3ef21dbf5 Update etcd from v3.4.4 to v3.4.5
* https://github.com/etcd-io/etcd/releases/tag/v3.4.5
2020-03-18 20:50:41 -07:00
2a5dddeb9d Promote Fedora CoreOS AWS and Google Cloud
* Promote Fedora CoreOS AWS to stable
* Promote Fedora CoreOS GCP to beta
2020-03-16 22:12:26 -07:00
75fb4e5d11 Remove Container Linux Update Operator (CLUO) addon
* Stop providing example manifests for the Container Linux
Update Operator (CLUO)
* CLUO requires patches to support Kubernetes v1.16+, but the
project and push access is rather unowned
* CLUO hasn't been in active use in our clusters and won't be
relevant beyond Container Linux. Not to say folks can't patch
it and run it on their own. Examples just aren't provided here

Related: https://github.com/coreos/container-linux-update-operator/pull/197
2020-03-16 22:05:17 -07:00
1a139ef6f1 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-03-16 21:40:52 -07:00
bc7902f40a Update Kubernetes from v1.17.3 to v1.17.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174
2020-03-13 00:06:41 -07:00
70bf39bb9a Update Calico from v3.12.0 to v3.13.1
* https://docs.projectcalico.org/v3.13/release-notes/
2020-03-12 23:00:38 -07:00
4e1b8f22df Add support for Flatcar Linux on Azure
* Accept `os_image` "flatcar-stable" and "flatcar-beta" to
use Kinvolk's Flatcar Linux images from the Azure Marketplace

Note: Flatcar Linux Azure Marketplace images require terms be
accepted before use
2020-03-12 22:52:48 -07:00
ab7913a061 Accept initial worker node labels and taints map on bare-metal
* Add `worker_node_labels` map from node name to a list of initial
node label strings
* Add `worker_node_taints` map from node name to a list of initial
node taint strings
* Unlike cloud platforms, bare-metal node labels and taints
are defined via a map from node name to list of labels/taints.
Bare-metal clusters may have heterogeneous hardware so per node
labels and taints are accepted
* Only worker node names are allowed. Workloads are not scheduled
on controller nodes so altering their labels/taints isn't suitable

```
module "mercury" {
  ...

  worker_node_labels = {
    "node2" = ["role=special"]
  }

  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"]
  }
}
```

Related: https://github.com/poseidon/typhoon/issues/429
2020-03-09 00:12:02 -07:00
7b0ea23cdc Upgrade terraform-provider-azurerm to v2.0+
* Add support for `terraform-provider-azurerm` v2.0+. Require
`terraform-provider-azurerm` v2.0+ and drop v1.x support since
the Azure provider major release is not backwards compatible
* Use Azure's new Linux VM and Linux VM Scale Set resources
* Change controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups
(aesthetic)
* If set, change `worker_priority` from `Low` to `Spot` (action required)
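
A minimal sketch of the required change for clusters that set `worker_priority`. The module name `ramius` and the omitted settings are illustrative assumptions; only the value change from `Low` to `Spot` comes from this commit.

```hcl
module "ramius" {
  # ... source and other cluster settings unchanged (omitted here)

  # Previously "Low"; with terraform-provider-azurerm v2.0+ use "Spot"
  worker_priority = "Spot"
}
```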

Related:

* https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html
2020-03-08 17:40:13 -07:00
c4683c5bad Refresh Prometheus alerts and Grafana dashboards
* Add 2 min wait before KubeNodeUnreachable to be less
noisy on preemptible clusters
* Add a BlackboxProbeFailure alert for any failing probes
for services annotated `prometheus.io/probe: true`
2020-03-02 20:08:37 -08:00
51cee6d5a4 Change Container Linux etcd-member to fetch with docker://
* Quay has historically generated ACI signatures for images to
facilitate rkt's notions of verification (it allowed authors to
actually sign images, though `--trust-keys-from-https` is in use
since etcd and most authors don't sign images). OCI standardization
didn't adopt verification ideas and checking signatures has fallen
out of favor.
* Fix an issue where Quay no longer seems to be generating ACI
signatures for new images (e.g. quay.io/coreos/etcd:v3.4.4)
* Don't be alarmed by rkt `--insecure-options=image`. It refers
to disabling image signature checking (i.e. docker pull doesn't
check signatures either)
* System containers for Kubelet and bootstrap have transitioned
to the docker:// transport, so there is precedent and this brings
all the system containers on Container Linux controllers into
alignment
2020-03-02 19:57:45 -08:00
87f9a2fc35 Add automatic worker deletion on Fedora CoreOS clouds
* On clouds where workers can scale down or be preempted
(AWS, GCP, Azure), shutdown runs delete-node.service to
remove a node and prevent NotReady nodes from lingering
* Add the delete-node.service that wasn't carried over
from Container Linux and port it to use podman
2020-02-29 20:22:03 -08:00
6de5cf5a55 Update etcd from v3.4.3 to v3.4.4
* https://github.com/etcd-io/etcd/releases/tag/v3.4.4
2020-02-29 16:19:29 -08:00
3250994c95 Use a route table with separate (rather than inline) routes
* Allow users to extend the route table using a data reference
and adding route resources (e.g. unusual peering setups)
* Note: Internally connecting AWS clusters can reduce cross-cloud
flexibility and inhibits blue-green cluster patterns. It is not
recommended
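
A rough sketch of extending the route table from outside the module. The module name `tempest`, its `vpc_id` output, the `Name` tag, the CIDR, and the peering connection ID are all assumptions for illustration, not values confirmed by this commit.

```hcl
# Look up the cluster's route table via a data reference
# (assumes a vpc_id output and a Name tag on the table)
data "aws_route_table" "cluster" {
  vpc_id = module.tempest.vpc_id

  tags = {
    Name = "tempest"
  }
}

# Add a separate route for a peered network (hypothetical CIDR and ID)
resource "aws_route" "peering" {
  route_table_id            = data.aws_route_table.cluster.id
  destination_cidr_block    = "10.1.0.0/16"
  vpc_peering_connection_id = "pcx-0123456789abcdef0"
}
```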
2020-02-25 23:21:58 -08:00
f4d260645c Update node-exporter from v0.18.1 to v1.0.0-rc.0
* Update mdadm alert rule; node-exporter adds `state` label to
`node_md_disks` and removes `node_md_disks_active`
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0
2020-02-25 22:29:52 -08:00
d9219a6722 Update nginx-ingress from v0.29.0 to v0.30.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0
2020-02-25 22:11:59 -08:00
60c7eb85ee Update nginx-ingress from v0.28.0 to v0.29.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0
2020-02-22 15:57:59 -08:00
4c964b56a0 Update kube-state-metrics from v1.9.4 to v1.9.5
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5
2020-02-22 15:21:10 -08:00
1fbd6835f2 Update Grafana from v6.6.1 to v6.6.2
* https://github.com/grafana/grafana/releases/tag/v6.6.2
2020-02-22 15:19:24 -08:00
e4d977bfcd Fix worker_node_labels for initial Fedora CoreOS
* Add Terraform strip markers to consume beginning and
trailing whitespace in templated Kubelet arguments for
podman (Fedora CoreOS only)
* Fix initial `worker_node_labels` being quietly ignored
on Fedora CoreOS cloud platforms that offer the feature
* Close https://github.com/poseidon/typhoon/issues/650
2020-02-22 15:12:35 -08:00
947c2c1815 Update mkdocs-material from v4.6.2 to v4.6.3 2020-02-18 21:59:17 -08:00
4a38fb5927 Update CoreDNS from v1.6.6 to v1.6.7
* https://coredns.io/2020/01/28/coredns-1.6.7-release/
2020-02-18 21:46:19 -08:00
c4e64a9d1b Change Kubelet /var/lib/calico mount to read-only (#643)
* Kubelet only requires read access to /var/lib/calico

Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
2020-02-18 21:40:58 -08:00
7ca03e5219 Update Prometheus from v2.15.2 to v2.16.0
* https://github.com/prometheus/prometheus/releases/tag/v2.16.0
2020-02-14 12:10:56 -08:00
362b3fac5c Add guide for Typhoon with Flatcar Linux on DigitalOcean
* Add docs on manually uploading a Flatcar Linux DigitalOcean
bin image as a custom image and using a data reference
* Set status of Flatcar Linux on DigitalOcean to alpha
* IPv6 is not supported for DigitalOcean custom images
2020-02-14 12:08:58 -08:00
32db59b9eb Update CHANGELOG sections and links 2020-02-14 12:05:51 -08:00
0c53ad52e4 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-02-13 14:39:48 -08:00
008817b0aa Promote Fedora CoreOS AWS/bare-metal to beta
* Remove alpha warnings from docs headers
2020-02-13 14:25:22 -08:00
49d3b9e6b3 Set docker log driver to json-file on Fedora CoreOS
* Fix the last minor issue for Fedora CoreOS clusters to pass CNCF's
Kubernetes conformance tests
* Kubelet supports a seldom used feature `kubectl logs --limit-bytes=N`
to trim a log stream to a desired length. Kubelet handles this in the
CRI driver. The Kubelet docker shim only supports the limit bytes
feature when Docker is configured with the default `json-file` logging
driver
* CNCF conformance tests started requiring limit-bytes be supported,
indirectly forcing the log driver choice until either the Kubelet or
the conformance tests are fixed
* Fedora CoreOS defaults Docker to use `journald` (desired). For now,
as a workaround to offer conformant clusters, the log driver can
be set back to `json-file`. RHEL CoreOS likely won't have noticed the
non-conformance since it's using the CRI-O runtime
* https://github.com/kubernetes/kubernetes/issues/86367

Note: When upstream has a fix, the aim is to drop the docker config
override and use the journald default
2020-02-11 23:00:38 -08:00
1243f395d1 Update Kubernetes from v1.17.2 to v1.17.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173
2020-02-11 20:22:14 -08:00
846f11097f Update Fedora CoreOS kernel arguments to align with upstream
* Align bare-metal kernel arguments with upstream docs
* Add missing initrd argument which can cause issues if
not present. Fix #638
* Add tty0 and ttyS0 consoles (matches Container Linux)
* Remove unused coreos.inst=yes

Related: https://docs.fedoraproject.org/en-US/fedora-coreos/bare-metal/
2020-02-11 20:11:19 -08:00
ba84f86dc7 Add guide for Typhoon with Flatcar Linux on Google Cloud
* Add docs on manually uploading a Flatcar Linux GCE/GCP gzipped
tarball image as a Compute Engine image for use with the Typhoon
container-linux module
* Set status of Flatcar Linux on Google Cloud to alpha
2020-02-11 19:38:40 -08:00
b49a1d715d Update docs generation packages
* Update mkdocs-material from v4.6.0 to v4.6.2
2020-02-08 15:12:12 -08:00
34c3d7cc39 Update Grafana from v6.6.0 to v6.6.1
* https://github.com/grafana/grafana/releases/tag/v6.6.1
2020-02-08 14:50:33 -08:00
ca96a1335c Update Calico from v3.11.2 to v3.12.0
* https://docs.projectcalico.org/release-notes/#v3120
* Remove reverse packet filter override, since Calico no
longer relies on the setting
* https://github.com/coreos/fedora-coreos-tracker/issues/219
* https://github.com/projectcalico/felix/pull/2189
2020-02-06 00:43:33 -08:00
e339fbd2b6 Update kube-state-metrics from v1.9.3 to v1.9.4
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4
2020-02-04 21:33:34 -08:00
8cc303c9ac Add module for Fedora CoreOS on Google Cloud
* Add Typhoon Fedora CoreOS on Google Cloud as alpha
* Add docs on uploading the Fedora CoreOS GCP gzipped tarball to
Google Cloud storage to create a boot disk image
2020-02-01 15:21:40 -08:00
b19ba16afa Update nginx-ingress from v0.27.1 to v0.28.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0
2020-01-30 18:00:23 -08:00
d127a7345c Update Grafana from v6.5.3 to v6.6.0
* https://github.com/grafana/grafana/releases/tag/v6.6.0
2020-01-27 20:46:32 -08:00
02a470d2f2 Fix minor typo in announcement date 2020-01-23 08:57:01 -08:00
5643ad525f Promote Fedora CoreOS from preview to alpha in docs
* Add an announcement to the website as well
2020-01-23 08:47:18 -08:00
d5b7ce8f27 Update kube-state-metrics from v1.9.2 to v1.9.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3
2020-01-23 00:03:16 -08:00
1cda5bcd2a Update Kubernetes from v1.17.1 to v1.17.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172
2020-01-21 18:27:39 -08:00
bda73264f7 Update nginx-ingress from v0.26.1 to v0.27.1
* Change runAsUser from 33 to 101 for new alpine-based image
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1
2020-01-20 15:22:16 -08:00
dd930a2ff9 Update bare-metal Fedora CoreOS image location
* Use Fedora CoreOS production download streams (change)
* Use live PXE kernel and initramfs images
* https://getfedora.org/coreos/download/
* Update docs example to use public images (cache is still
recommended at large scale) and stable stream
2020-01-20 14:44:06 -08:00
03ff3a9cf3 Update kube-state-metrics from v1.9.1 to v1.9.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.2
2020-01-18 15:32:10 -08:00
48703f9906 Update Grafana from v6.5.2 to v6.5.3
* https://github.com/grafana/grafana/releases/tag/v6.5.3
2020-01-18 15:30:39 -08:00
7ddd3d096d Fix link in maintenance docs
* Also fix a version mention, since Terraform v0.12 was
added in Typhoon v1.15.0
2020-01-18 15:19:27 -08:00
7daabd28b5 Update Calico from v3.11.1 to v3.11.2
* https://docs.projectcalico.org/v3.11/release-notes/
2020-01-18 13:45:24 -08:00
b642e3b41b Update Kubernetes from v1.17.0 to v1.17.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171
2020-01-14 20:21:36 -08:00
ac786a2efc Update AWS Fedora CoreOS AMI filter for fedora-coreos-31
* Select the most recent fedora-coreos-31 AMI on AWS, instead
of the most recent fedora-coreos-30 AMI (Nov 27, 2019)
* Evaluated with fedora-coreos-31.20200108.2.0-hvm
2020-01-14 20:06:14 -08:00
073fcb7067 Fix bare-metal instruction for watching install to disk
* Original instructions were to watch install to disk by SSH'ing
via port 2222 following Typhoon v1.10.1. Restore that message,
since the version number in the instruction was incorrectly bumped
on each release
2020-01-12 14:16:00 -08:00
ce0569e03b Remove unneeded Kubelet /var/run mount on Fedora CoreOS
* /var/run symlinks to /run (already mounted)
2020-01-11 15:15:39 -08:00
0e2fc89f78 Update kube-state-metrics from v1.9.0 to v1.9.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.1
2020-01-11 14:15:55 -08:00
b1f521fc4a Allow terraform-provider-google v3.x plugin versions
* Typhoon Google Cloud is compatible with `terraform-provider-google`
v3.x releases
* No v3.x specific features are used, so v2.19+ provider versions are
still allowed, to ease migrations
2020-01-11 14:07:18 -08:00
73588cfad3 Update Prometheus from v2.15.1 to v2.15.2
* https://github.com/prometheus/prometheus/releases/tag/v2.15.2
2020-01-06 22:08:34 -08:00
0223b31e1a Ensure /etc/kubernetes exists following Kubelet inlining
* Inlining the Kubelet service removed the need for the
kubelet.env file declared in Ignition. However, on some
platforms, this removed the guarantee that /etc/kubernetes
exists. Bare-Metal and DigitalOcean distribute the kubelet
kubeconfig through Terraform file provisioner (scp) and
place it in (now missing) /etc/kubernetes
* https://github.com/poseidon/typhoon/pull/606
* Fix bare-metal and DigitalOcean Ignition to ensure the
desired directory exists following first boot from disk
* Cloud platforms with worker pools distribute the kubeconfig
through Ignition user data (no impact or need)
2020-01-06 21:38:20 -08:00
bb586b60da Reduce Prometheus addon's node-exporter tolerations
* Change node-exporter DaemonSet tolerations from tolerating
all possible NoSchedule taints to tolerating the master taint
and the not ready taint (we'd like metrics regardless)
* Users who add custom node taints must add their custom taints
to the addon node-exporter DaemonSet. As an addon, it's expected
users copy and manipulate manifests out-of-band in their own
systems
2020-01-06 21:24:24 -08:00
43e05b9131 Enable kube-proxy metrics and allow Prometheus scrapes
* Configure kube-proxy --metrics-bind-address=0.0.0.0 (default
127.0.0.1) to serve metrics on 0.0.0.0:10249
* Add firewall rules to allow Prometheus (resides on a worker) to
scrape kube-proxy service endpoints on controllers or workers
* Add a clusterIP: None service for kube-proxy endpoint discovery
2020-01-06 21:11:18 -08:00
b2eb3e05d0 Disable Kubelet 127.0.0.1:10248 healthz endpoint
* Kubelet runs a healthz server listening on 127.0.0.1:10248
by default. It's unused by Typhoon and can be disabled
* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
2019-12-29 11:23:25 -08:00
f1f4cd6fc0 Inline Container Linux kubelet.service, deprecate kubelet-wrapper
* Change kubelet.service on Container Linux nodes to ExecStart Kubelet
inline to replace the use of the host OS kubelet-wrapper script
* Express rkt run flags and volume mounts in a clear, uniform way to
make the Kubelet service easier to audit, manage, and understand
* Eliminate reliance on a Container Linux kubelet-wrapper script
* Typhoon for Fedora CoreOS developed a kubelet.service that similarly
uses an inline ExecStart (except with podman instead of rkt) and a
more minimal set of volume mounts. Adopt the volume improvements:
  * Change Kubelet /etc/kubernetes volume to read-only
  * Change Kubelet /etc/resolv.conf volume to read-only
  * Remove unneeded /var/lib/cni volume mount

Background:

* kubelet-wrapper was added in CoreOS around the time of Kubernetes v1.0
to simplify running a CoreOS-built hyperkube ACI image via rkt-fly. The
script defaults are no longer ideal (e.g. rkt's notion of trust dates
back to quay.io ACI image serving and signing, which informed the OCI
standard images we use today, though they still lack rkt's signing ideas).
* Shipping kubelet-wrapper was regretted at CoreOS, but remains in the
distro for compatibility. The script is not updated to track hyperkube
changes, but it is stable and kubelet.env overrides bridge most gaps
* Typhoon Container Linux nodes have used kubelet-wrapper to rkt/rkt-fly
run the Kubelet via the official k8s.gcr.io hyperkube image using overrides
(new image registry, new image format, restart handling, new mounts, new
entrypoint in v1.17).
* Observation: Most of what it takes to run a Kubelet container is defined
in Typhoon, not in kubelet-wrapper. The wrapper's value is now undermined
by having to work around its dated defaults. Typhoon may be better served
defining kubelet.service explicitly
* Typhoon for Fedora CoreOS developed a kubelet.service without the use
of a host OS kubelet-wrapper which is both clearer and eliminated some
volume mounts
2019-12-29 11:17:26 -08:00
50db3d0231 Rename CLC files and favor Terraform list index syntax
* Rename Container Linux Config (CLC) files to *.yaml to align
with Fedora CoreOS Config (FCC) files and for syntax highlighting
* Replace common uses of Terraform `element` (which wraps around)
with `list[index]` syntax to surface index errors
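
A small sketch of the difference, using a hypothetical `names` list. `element` wraps around when the index exceeds the list length, while index syntax fails instead of silently picking another item.

```hcl
locals {
  names = ["node1", "node2"]

  # element() wraps around: index 2 on a 2-item list yields "node1"
  wrapped = element(local.names, 2)

  # Index syntax does not wrap: local.names[2] would fail with an
  # index error, surfacing the mistake instead of hiding it
  first = local.names[0]
}
```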
2019-12-28 12:14:01 -08:00
11565ffa8a Update Calico from v3.10.2 to v3.11.1
* https://docs.projectcalico.org/v3.11/release-notes/
2019-12-28 11:08:03 -08:00
a4e843693f Update Prometheus from v2.15.0 to v2.15.1
* https://github.com/prometheus/prometheus/releases/tag/v2.15.1
2019-12-26 09:12:55 -05:00
f48e43c0b1 Update Prometheus from v2.14.0 to v2.15.0
* https://github.com/prometheus/prometheus/releases/tag/v2.15.0
2019-12-24 10:52:19 -05:00
daa8d9d9ec Update CoreDNS from v1.6.5 to v1.6.6
* https://coredns.io/2019/12/11/coredns-1.6.6-release/
2019-12-22 10:47:19 -05:00
52d11096dc Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-20 13:53:37 -08:00
00c431a9d2 Add Kubelet kubeconfig output for DigitalOcean
* Allow the raw kubelet kubeconfig to be consumed via
Terraform output
2019-12-18 23:20:55 -08:00
0ecb995890 Update kube-state-metrics from v1.8.0 to v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-14 17:20:49 -08:00
1b9fa2e688 Update Grafana from v6.5.1 to v6.5.2
* https://github.com/grafana/grafana/releases/tag/v6.5.2
2019-12-14 15:25:48 -08:00
2d8e367664 Update mkdocs-material from v4.5.1 to v4.6.0 2019-12-14 15:02:28 -08:00
c3e22f3d13 Fix minor example typo in README 2019-12-10 23:14:12 -08:00
f69dc2ea0f Update CHANGES and tutorial notes for release
* Update recommended Terraform and provider plugin versions
* Update the rough count of resources created per cluster
since it's not been refreshed in a while (will vary based
on cluster options)
2019-12-10 23:03:39 -08:00
c0ce04e1de Update Calico from v3.10.1 to v3.10.2
* https://docs.projectcalico.org/v3.10/release-notes/
2019-12-09 21:03:00 -08:00
ed3550dce1 Update systemd services for the v1.17.x hyperkube
* Binary asset locations within the upstream hyperkube image
changed https://github.com/kubernetes/kubernetes/pull/84662
* Fix Container Linux and Flatcar Linux kubelet.service
(rkt-fly with fairly dated CoreOS kubelet-wrapper)
* Fix Fedora CoreOS kubelet.service (podman)
* Fix Fedora CoreOS bootstrap.service
* Fix delete-node kubectl usage for workers where nodes may
delete themselves on shutdown (e.g. preemptible instances)
2019-12-09 18:39:17 -08:00
de36d99afc Update Kubernetes from v1.16.3 to v1.17.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170
2019-12-09 18:31:58 -08:00
4fce9485c8 Reduce kube-controller-manager pod eviction timeout from 5m to 1m
* Reduce time to delete pods on unready nodes from 5m to 1m
* Present since v1.13.3, but mistakenly removed in v1.16.0 static
pod control plane migration

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* https://github.com/poseidon/terraform-render-bootstrap/pull/164
2019-12-08 22:58:31 -08:00
178afe4a9b Reduce apiserver metrics cardinality and extraneous labels
* Stop mapping node labels to targets discovered via Kubernetes
nodes (e.g. etcd, kubelet, cadvisor). It is rarely useful to
store node labels (e.g. kubernetes.io/os=linux) on these metrics
* kube-apiserver's apiserver_request_duration_seconds_bucket metric
has a high cardinality that includes labels for the API group, verb,
scope, resource, and component for each object type, including for
each CRD. This one metric has ~10k time series in a typical cluster
(between 10-40% of total)
* Removing the apiserver request duration outright would make latency
alerts a NoOp and break a Grafana apiserver panel. Instead, drop series
that have a "group" label. Effectively, only request durations for
core Kubernetes APIs will be kept (e.g. cardinality won't grow with
each CRD added). This reduces the metric to ~2k unique series
2019-12-08 22:48:25 -08:00
d9c7a9e049 Add/update docs for asset_dir and kubeconfig usage
* Original tutorials favored including the platform (e.g.
google-cloud) in modules (e.g. google-cloud-yavin). Prefer
naming conventions where each module / cluster has a simple
name (e.g. yavin) since the platform is usually redundant
* Retain the example cluster naming themes per platform
2019-12-05 22:56:42 -08:00
2837275265 Introduce cluster creation without local writes to asset_dir
* Allow generated assets (TLS materials, manifests) to be
securely distributed to controller node(s) via file provisioner
(i.e. ssh-agent) as an assets bundle file, rather than relying
on assets being locally rendered to disk in an asset_dir and
then securely distributed
* Change `asset_dir` from required to optional. Left unset,
asset_dir defaults to "" and no assets will be written to
files on the machine that runs terraform apply
* Enhancement: Managed cluster assets are kept only in Terraform
state, which supports different backends (GCS, S3, etcd, etc) and
optional encryption. terraform apply accesses state, runs in-memory,
and distributes sensitive materials to controllers without making
use of local disk (simplifies use in CI systems)
* Enhancement: Improve asset unpack and layout process to position
etcd certificates and control plane certificates more cleanly,
without unneeded secret materials
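
A minimal sketch of what an optional `asset_dir` variable could look like. The empty-string default is from this commit; the description wording is an assumption.

```hcl
variable "asset_dir" {
  type        = string
  description = "Optional local directory for rendered assets (illustrative wording)"

  # Left unset, no assets are written on the machine running terraform apply
  default = ""
}
```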

Details:

* Terraform file provisioner support for distributing directories of
contents (with unknown structure) has been limited to reading from a
local directory, meaning local writes to asset_dir were required.
https://github.com/poseidon/typhoon/issues/585 discusses the problem
and newer or upcoming Terraform features that might help.
* Observation: Terraform provisioner support for single files works
well, but iteration isn't viable. We're also constrained to Terraform
language features on the apply side (no extra plugins, no shelling out)
and CoreOS / Fedora tools on the receive side.
* Take a map representation of the contents that would have been splayed
out in asset_dir and pack/encode them into a single file format devised
for easy unpacking. Use an awk one-liner on the receive side to unpack.
In practice, this has worked well and it's rather nice that a single
assets file is transferred by file provisioner (all or none)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162
2019-12-05 01:24:50 -08:00
5fa002f4f7 Update mkdocs-material from v4.5.0 to v4.5.1 2019-12-02 21:21:16 -08:00
aa275796cb Fix DigitalOcean controller and worker ipv4/ipv6 outputs (#594)
* Fix controller and worker ipv4/ipv6 outputs to be lists of strings
* With Terraform v0.11 syntax, an enclosing list was required to coerce the
output to be a list of strings
* With Terraform v0.12 syntax, the enclosing list shouldn't be needed
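
A sketch of the two output styles. The resource name `controllers` and the output name `controllers_ipv4` are assumed for illustration.

```hcl
# Terraform v0.11 style: an enclosing list coerced a list of strings
# value = ["${digitalocean_droplet.controllers.*.ipv4_address}"]

# Terraform v0.12 style: the splat expression is already a list of
# strings, so wrapping it again would yield a list of lists
output "controllers_ipv4" {
  value = digitalocean_droplet.controllers.*.ipv4_address
}
```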
2019-12-02 21:20:47 -08:00
26674083b6 Update Grafana from v6.5.0 to v6.5.1
* https://github.com/grafana/grafana/releases/tag/v6.5.1
2019-11-28 14:11:25 -08:00
030a4cec19 Update Grafana from v6.4.4 to v6.5.0
* https://grafana.com/docs/guides/whats-new-in-v6-5/
2019-11-25 22:45:58 -08:00
ddea7dc452 Use new resource dashboards in Grafana deployment
* kubernetes-mixin pod resource dashboards were split into
two ConfigMap parts because they provide richer networking
details
* New dashboards have been used by the author at the global
level, but were missing in the per-cluster Grafana tracked
here
2019-11-25 22:27:11 -08:00
4b485a9bf2 Fix recent deletion of bootstrap module pinned SHA
* Fix deletion of bootstrap module pinned SHA, which was
introduced recently through an automation mistake creating
https://github.com/poseidon/typhoon/pull/589
2019-11-21 22:34:09 -08:00
4704b494f0 Update mkdocs-material from v4.4.3 to v4.5.0
* Upgrade dependency packages as well
2019-11-18 23:05:29 -08:00
525ae23305 Add node-exporter alerts and Grafana dashboard
* Add Prometheus alerts from node-exporter
* Add Grafana dashboard nodes.json, from node-exporter
* Not adding recording rules, since those are only used
by some node-exporter USE dashboards not being included
2019-11-16 13:47:20 -08:00
8a9e8595ae Fix terraform fmt formatting 2019-11-13 23:44:02 -08:00
19ee57dc04 Use GCP region_instance_group_manager version block format
* terraform-provider-google v2.19.0 deprecates `instance_template`
within `google_compute_region_instance_group_manager` in order to
support a scheme with multiple version blocks. Adapt our single
version to the new format to resolve deprecation warnings.
* Fixes: Warning: "instance_template": [DEPRECATED] This field
will be replaced by `version.instance_template` in 3.0.0
* Require terraform-provider-google v2.19.0+ (action required)
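
A sketch of the single-version format. The resource names, region, and target size are placeholders, not the module's actual values.

```hcl
resource "google_compute_region_instance_group_manager" "workers" {
  name               = "example-worker-group"
  base_instance_name = "example-worker"
  region             = "us-central1"
  target_size        = 2

  # Before (deprecated in terraform-provider-google v2.19.0):
  # instance_template = google_compute_instance_template.worker.self_link

  # After: a single version block
  version {
    name              = "default"
    instance_template = google_compute_instance_template.worker.self_link
  }
}
```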
2019-11-13 17:41:13 -08:00
0e4ee5efc9 Add small CPU resource requests to static pods
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
below mechanisms were insufficient

Existing safeguards:

* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over say node-exporter)
and CPU is compressible too.

See: https://github.com/poseidon/terraform-render-bootstrap/pull/161
2019-11-13 17:18:45 -08:00
a271b9f340 Update CoreDNS from v1.6.2 to v1.6.5
* Add health `lameduck` option 5s. Before CoreDNS shuts down, it will
wait and report unhealthy for 5s to allow time for plugins to shutdown
cleanly
* Minor bug fixes over a few releases
* https://coredns.io/2019/08/31/coredns-1.6.3-release/
* https://coredns.io/2019/09/27/coredns-1.6.4-release/
* https://coredns.io/2019/11/05/coredns-1.6.5-release/
2019-11-13 16:47:44 -08:00
cb0598e275 Adopt Terraform v0.12 templatefile function
* Update terraform-render-bootstrap module to adopt the
Terraform v0.12 templatefile function feature to replace
the use of terraform-provider-template's `template_dir`
* Require Terraform v0.12.6+ which adds `for_each`

Background:

* `template_dir` was added to `terraform-provider-template`
to add support for template directory rendering in CoreOS
Tectonic Kubernetes distribution (~2017)
* Terraform v0.12 introduced a native `templatefile` function
and v0.12.6 introduced native `for_each` support (July 2019)
that makes it possible to replace `template_dir` usage
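
A rough sketch of how `templatefile` and `for_each` can together stand in for `template_dir`. The variable names, template paths, and the `local_file` resource are illustrative assumptions, not the actual terraform-render-bootstrap implementation.

```hcl
# Hypothetical inputs for illustration
variable "cluster_name" {
  type = string
}

variable "asset_dir" {
  type = string
}

locals {
  # Template paths relative to the module (illustrative names)
  templates = [
    "manifests/kube-apiserver.yaml",
    "manifests/kube-scheduler.yaml",
  ]

  # Render each template in memory with the native templatefile function
  rendered = {
    for name in local.templates :
    name => templatefile("${path.module}/resources/${name}", {
      cluster_name = var.cluster_name
    })
  }
}

# for_each (Terraform v0.12.6+) creates one file per rendered template
resource "local_file" "assets" {
  for_each = local.rendered

  filename = "${var.asset_dir}/${each.key}"
  content  = each.value
}
```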
2019-11-13 16:33:36 -08:00
ad117f4592 Update recommended Terraform provider versions
* Recommend provider plugin version tested against
2019-11-13 13:53:46 -08:00
42b6df89c8 Update Prometheus from v2.14.0-rc.0 to v2.14.0
* https://github.com/prometheus/prometheus/releases/tag/v2.14.0
2019-11-13 13:41:11 -08:00
d7061020ba Update Kubernetes from v1.16.2 to v1.16.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163
2019-11-13 13:05:15 -08:00
a8b7792338 Update Grafana from v6.4.3 to v6.4.4
* https://github.com/grafana/grafana/releases/tag/v6.4.4
2019-11-07 12:00:25 -08:00
a3807086d4 Update Prometheus from v2.13.1 to v2.14.0-rc.0
* Happy PromCon 2019!
* https://github.com/prometheus/prometheus/releases/tag/v2.14.0-rc.0
2019-11-07 11:48:23 -08:00
2c163503f1 Update etcd from v3.4.2 to v3.4.3
* etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9
and adds a few minor metrics fixes
* https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3
2019-11-07 11:41:01 -08:00
0034a15711 Update Calico from v3.10.0 to v3.10.1
* https://docs.projectcalico.org/v3.10/release-notes/
2019-11-07 11:38:32 -08:00
38957163cb Output resource_group_id in Azure (#577)
* Add an output variable `resource_group_id` to the azure module
2019-10-31 01:05:04 -07:00
d4573092b5 Improve Kubelet and Compute Resource dashboards
* Add cluster filter to Kubelet dashboard
* Add network details in resource dashboards
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/275
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/284
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/285
2019-10-28 02:22:15 -07:00
4775e9d0f7 Upgrade Calico v3.9.2 to v3.10.0
* Allow advertising Kubernetes service ClusterIPs to BGPPeer
routers via a BGPConfiguration
* Improve EdgeRouter docs about routes and BGP
* https://docs.projectcalico.org/v3.10/release-notes/
* https://docs.projectcalico.org/v3.10/networking/advertise-service-ips
2019-10-27 14:13:41 -07:00
d418045929 Switch kube-proxy from iptables mode to ipvs mode
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found #321
* Since then, major blockers seem to have been addressed
2019-10-27 00:37:41 -07:00
eb7b6d39f2 Improve minor aspects of CoreDNS and nginx-ingress dashboards
* Add default 10s refresh rate to custom dashboards to match
those from Kubernetes
* Show labels for "instance" as "pod" for clarity
* Add cluster filter for internal use
2019-10-20 23:16:55 -07:00
33d4c2fd68 Add explicit annotation for Prometheus port to scrape
* Without the prometheus.io/port annotation, Prometheus
service discovery can scrape other Prometheus ports that
may be available.
* For example, Prometheus sidecars (not included) may
be scraped and that may be unintended
2019-10-20 16:05:09 -07:00
de90cb9246 Remove kube-state-metrics addon-resizer
* addon-resizer is outdated and has been dropped from
kube-state-metrics examples. Those using it should look
to the cluster-proportional-vertical-autoscaler.
* Eliminate addon-resizer log spew
* Remove associated Role and RoleBinding
* Also fix kube-state-metrics readinessProbe port
2019-10-20 16:03:29 -07:00
68da420adc Refresh Prometheus rules/alerts and Grafana dashboards
* Update Prometheus rules/alerts and Grafana dashboards
* Remove dashboards that were moved to node-exporter, they
may be added back later if valuable
* Remove kube-prometheus based rules/alerts (ClockSkew alert)
2019-10-19 17:43:47 -07:00
130c97f8eb Update Prometheus from v2.13.0 to v2.13.1
* https://github.com/prometheus/prometheus/releases/tag/v2.13.1
2019-10-18 00:10:25 -07:00
271d2f6b52 Update Grafana from v6.4.2 to v6.4.3
* https://github.com/grafana/grafana/releases/tag/v6.4.3
2019-10-18 00:08:39 -07:00
0595915a19 Cleanup CHANGES notes 2019-10-15 23:25:45 -07:00
e6bc5143aa Default to Calico as the CNI provider on Azure/DigitalOcean
* Change `networking` default from flannel to calico on
Azure and DigitalOcean
* AWS, bare-metal, and Google Cloud continue to default
to Calico (as they have since v1.7.5)
* Typhoon now defaults to using Calico and supporting
NetworkPolicy on all platforms
2019-10-15 23:15:40 -07:00
e4ac1027c8 Update Grafana from v6.4.1 to v6.4.2
* https://github.com/grafana/grafana/releases/tag/v6.4.2
2019-10-15 22:58:43 -07:00
24fc440d83 Update Kubernetes from v1.16.1 to v1.16.2
* Update Calico from v3.9.1 to v3.9.2
2019-10-15 22:42:52 -07:00
a6702573a2 Update etcd from v3.4.1 to v3.4.2
* https://github.com/etcd-io/etcd/releases/tag/v3.4.2
2019-10-15 00:06:15 -07:00
69188af565 Rename CLUO label from "app" to "name"
* Match the labeling pattern in other addons
2019-10-15 00:05:02 -07:00
d874bdd17d Update bootstrap module control plane manifests and type constraints
* Remove unneeded control plane flags that correspond to defaults
* Adopt Terraform v0.12 type constraints in bootstrap module
2019-10-06 21:09:30 -07:00
5b9dab6659 Introduce list of detail objects for bare-metal machines
* Define bare-metal `controllers` and `workers` as a complex type
list(object({name=string, mac=string, domain=string})) to allow
clusters with many machines to be defined more cleanly
* Remove `controller_names` list variable
* Remove `controller_macs` list variable
* Remove `controller_domains` list variable
* Remove `worker_names` list variable
* Remove `worker_macs` list variable
* Remove `worker_domains` list variable
2019-10-06 20:22:45 -07:00
5196709fe0 Update docs, CHANGES, and mkdocs-material
* Update mkdocs-material from v4.4.2 to v4.4.3
* Update recommended Terraform provider versions
* Cleanup the changelog before release
2019-10-06 18:41:25 -07:00
ab72f1ab2d Update Prometheus from v2.12.0 to v2.13.0
* https://github.com/prometheus/prometheus/releases/tag/v2.13.0
2019-10-06 18:22:20 -07:00
5ef4155e08 Detect most recent Fedora CoreOS AMI in region
* Detect the most recent Fedora CoreOS AMI to allow usage
of Fedora CoreOS in supported regions (previously just
us-east-1)
* Unpin the Fedora CoreOS AMI image which was pinned to
images that had been checked. This does mean if Fedora
publishes a broken image, it will be selected
* Filter out "dev" images which have similar naming
2019-10-06 18:13:55 -07:00
15c4b793c3 Use new Fedora CoreOS kernel/initrd/raw asset names
* Fedora CoreOS changed the kernel, initramfs, and raw
image asset download paths and names in 30.20191002.0
2019-10-06 17:31:21 -07:00
36ed53924f Add stricter types for bare-metal modules
* Review variables available in bare-metal kubernetes modules
for Container Linux and Fedora CoreOS
* Deprecate cluster_domain_suffix variable
* Remove deprecated container_linux_oem variable
2019-10-06 17:18:50 -07:00
19de38b30d Fix Prometheus etcd metrics scraping
* Prometheus was configured to use kubernetes discovery
of etcd targets based on nodes matching the node label
node-role.kubernetes.io/controller=true
* Kubernetes v1.16 stopped permitting node role labels
node-role.kubernetes.io/* so Typhoon renamed these labels
(no longer any association with roles) to
node.kubernetes.io/controller=true
* As a result, Prometheus didn't discover etcd targets,
etcd metrics were missing, etcd alerts were ineffective,
and the etcd Grafana dashboard was empty
* Introduced: https://github.com/poseidon/typhoon/pull/543
2019-10-03 19:07:05 -07:00
995824fa6d Add stricter types for DigitalOcean module
* Review variables available in DigitalOcean kubernetes
module and sync with documentation
* Promote Calico for DigitalOcean and Azure beyond experimental
(it's the primary mode I've used since it was introduced)
2019-10-02 21:48:24 -07:00
1c5ed84fc2 Update Kubernetes from v1.16.0 to v1.16.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161
2019-10-02 21:31:55 -07:00
ca7d62720e Update Grafana from v6.3.6 to v6.4.1
* https://github.com/grafana/grafana/releases/tag/v6.4.1
2019-10-02 20:36:05 -07:00
26f8d76755 Update kube-state-metrics from v1.7.2 to v1.8.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.8.0
2019-10-01 20:50:33 -07:00
fdd6882a87 Add stricter types to Azure modules
* Review variables available in Azure kubernetes and workers
modules and sync with documentation
* Fix internal workers module default type to Standard_DS1_v2
2019-09-30 22:20:20 -07:00
f82266ac8c Add stricter types for GCP modules
* Review variables available in google-cloud kubernetes
and workers modules and in documentation
2019-09-30 22:04:35 -07:00
7bcf2d7831 Update nginx-ingress from v0.25.1 to v0.26.1
* Add lifecycle hook to allow draining connections for
up to 5 minutes
2019-09-30 22:01:07 -07:00
78bfff0afe Update Fedora CoreOS to testing 30.20190905.0
* Fix duplicated cluster_domain_suffix variable
2019-09-29 11:34:31 -07:00
a6de245d8a Rename bootkube.tf to bootstrap.tf
* Typhoon no longer uses the bootkube project
2019-09-29 11:30:49 -07:00
96afa6a531 Update Calico from v3.8.2 to v3.9.1
* https://docs.projectcalico.org/v3.9/release-notes/
2019-09-29 11:22:53 -07:00
a407ff72df Add stricter types for AWS modules and update docs
* Review variables available in AWS kubernetes and workers
modules and documentation
* Switching between spot and on-demand has worked since
Terraform v0.12
* Generally, there are too many knobs. Less useful ones
should be de-emphasized or removed
* Remove `cluster_domain_suffix` documentation
2019-09-29 11:19:38 -07:00
f453c54956 Update Grafana from v6.3.5 to v6.3.6
* https://github.com/grafana/grafana/releases/tag/v6.3.6
2019-09-28 15:13:46 -07:00
3e34fb075b Update etcd from v3.4.0 to v3.4.1
* https://github.com/etcd-io/etcd/releases/tag/v3.4.1
2019-09-28 15:09:57 -07:00
9bfb1c5faf Update docs and variable types for worker node_labels
* Document worker pools `node_labels` variable to set the
initial node labels for a homogeneous set of workers
* Document `worker_node_labels` convenience variable to
set the initial node labels for default worker nodes
2019-09-28 15:05:12 -07:00
99ab81f79c Add node_labels variable in workers modules to set initial node labels (#550)
* Also add `worker_node_labels` variable in `kubernetes` modules to set
initial node labels for the default workers
2019-09-28 14:59:24 -07:00
8703f2c3c5 Fix missing comma separator on bare-metal and DO
* Introduced in bare-metal and DigitalOcean in #544
while addressing possible ordering race, but after
the v1.16 upgrade validation
2019-09-23 11:05:26 -07:00
078f084220 Update CHANGES and docs for v1.16.0 release 2019-09-22 17:37:23 -07:00
81a1ae38e6 Update Terraform provider plugin versions
* Recommend provider plugin versions that Typhoon
authors use
2019-09-22 17:14:30 -07:00
5b06e0e869 Organize and cleanup Kubelet ExecStartPre
* Sort Kubelet ExecStartPre mkdir commands
* Remove unused inactive-manifests and checkpoint-secrets
directories (were used by bootkube self-hosting)
2019-09-19 00:38:34 -07:00
b951aca66f Create /etc/kubernetes/manifests before asset copy
* Fix issue (present since bootkube->bootstrap switch) where
controller asset copy could fail if /etc/kubernetes/manifests
wasn't created in time on platforms using path activation for
the Kubelet (observed on DigitalOcean, also possible on
bare-metal)
2019-09-19 00:30:53 -07:00
9da3725738 Update Kubernetes from v1.15.3 to v1.16.0
* Drop `node-role.kubernetes.io/master` and
`node-role.kubernetes.io/node` node labels
* Kubelet (v1.16) now rejects the node labels used
in the kubectl get nodes ROLES output
* https://github.com/kubernetes/kubernetes/issues/75457
2019-09-18 22:53:06 -07:00
fd12f3612b Rename CA organization from bootkube to typhoon
* Rename the organization in generated CA certificates from
bootkube to typhoon. Avoid confusion with the bootkube project
* https://github.com/poseidon/terraform-render-bootstrap/pull/149
2019-09-14 16:56:53 -07:00
96b646cf6d Rename bootkube modules to bootstrap
* Rename render module from bootkube to bootstrap. Avoid
confusion with the kubernetes-incubator/bootkube tool since
it is no longer used
* Use the poseidon/terraform-render-bootstrap Terraform module
(formerly poseidon/terraform-render-bootkube)
* https://github.com/poseidon/terraform-render-bootkube/pull/149
2019-09-14 16:24:32 -07:00
b15c60fa2f Update CHANGES for control plane static pod switch
* Remove old references to bootkube / self-hosted
2019-09-09 22:48:48 -07:00
db947537d1 Migrate GCP, DO, Azure to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
c933bdfc26 Migrate Container Linux AWS to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
21632c6674 Migrate Container Linux bare-metal to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
74780fb09f Migrate Fedora CoreOS bare-metal to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
b60a2ecdf7 Migrate Fedora CoreOS AWS to a static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
4a7083d94a Change Azure default controller_type and worker_type
* Change default controller_type to Standard_B2s. A B2s is cheaper
by $17/month and provides 2 vCPU, 4GB RAM (vs 1 vCPU, 3.5GB RAM)
* Change default worker_type to Standard_DS1_v2. F1 was the previous
generation. The DS1_v2 is newer, similar cost, more memory, and still
supports Low Priority mode, if desired
2019-09-09 22:34:28 -07:00
c20683067d Update etcd from v3.3.15 to v3.4.0
* https://github.com/etcd-io/etcd/releases/tag/v3.4.0
2019-09-08 15:32:49 -07:00
dc436b8fe9 Update Grafana from v6.3.4 to v6.3.5
* https://github.com/grafana/grafana/releases/tag/v6.3.5
2019-09-07 14:21:59 -07:00
efb9a2d09a Update Fedora CoreOS bare-metal docs for 30.20190801.0 2019-09-04 21:11:22 -07:00
e8d586f3b3 Enable QoS on Fedora CoreOS controllers
* Kubelet race should be fixed in Kubernetes v1.15.1
* https://github.com/kubernetes/kubernetes/issues/79046
* Reverts temporary mitigation https://github.com/poseidon/typhoon/pull/515
2019-09-04 21:09:45 -07:00
b74f470701 Recommend updating terraform-provider-ct from v0.3.2 to v0.4.0
* v0.4.0 adds a "strict" mode we'll start using in future and
also adds support for Fedora CoreOS
* https://github.com/poseidon/terraform-provider-ct/releases/tag/v0.4.0
2019-08-31 16:07:22 -07:00
45bc52d156 Update Grafana from v6.3.3 to v6.3.4
* https://github.com/grafana/grafana/releases/tag/v6.3.4
2019-08-31 15:59:13 -07:00
4d5f962d76 Update CoreDNS from v1.5.0 to v1.6.2
* https://coredns.io/2019/06/26/coredns-1.5.1-release/
* https://coredns.io/2019/07/03/coredns-1.5.2-release/
* https://coredns.io/2019/07/28/coredns-1.6.0-release/
* https://coredns.io/2019/08/02/coredns-1.6.1-release/
* https://coredns.io/2019/08/13/coredns-1.6.2-release/
2019-08-31 15:57:42 -07:00
e7d805d9a4 Sync recommended versions of Terraform providers for clouds
* Align Terraform provider plugin versions with those tested against
2019-08-27 22:00:08 -07:00
d95bf2d1ea Update mkdocs-material from v4.4.0 to v4.4.2 2019-08-27 21:57:20 -07:00
c42139beaa Update etcd from v3.3.14 to v3.3.15
* No functional changes, just changes to vendoring tools
(go modules -> glide). Still, update to v3.3.15 anyway
* https://github.com/etcd-io/etcd/compare/v3.3.14...v3.3.15
2019-08-19 15:05:21 -07:00
35c2763ab0 Update Kubernetes from v1.15.2 to v1.15.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md/#v1153
2019-08-19 14:49:24 -07:00
2067356ae9 Update Fedora CoreOS to testing 30.20190801.0 2019-08-18 21:46:59 -07:00
8f412e2f09 Update etcd from v3.3.13 to v3.3.14
* https://github.com/etcd-io/etcd/releases/tag/v3.3.14
2019-08-18 21:05:06 -07:00
4ef2eb7e6b Update Prometheus from v2.11.2 to v2.12.0
* https://github.com/prometheus/prometheus/releases/tag/v2.12.0
2019-08-18 20:59:44 -07:00
99990e3cbb Use stable IDs for etcd, CoreDNS, and Nginx dashboards
* Use unique dashboard ID so that multiple replicas of Grafana
serve dashboards with uniform paths
* Fix issue where refreshing a dashboard served by one replica
could show a 404 unless the request went to the same replica
2019-08-18 12:45:49 -07:00
3c3708d58e Update Calico from v3.8.1 to v3.8.2
* https://docs.projectcalico.org/v3.8/release-notes/
2019-08-16 15:38:23 -07:00
0c45cd0f06 Update Grafana from v6.3.2 to v6.3.3
* https://github.com/grafana/grafana/releases/tag/v6.3.3
2019-08-16 14:40:47 -07:00
976452825e Update Prometheus from v2.11.0 to v2.11.2
* https://github.com/prometheus/prometheus/releases/tag/v2.11.2
2019-08-14 21:26:46 -07:00
7bc5633c38 Update nginx-ingress from v0.25.0 to v0.25.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.1
2019-08-14 21:26:46 -07:00
09eb236519 Fix worker_preemptible spelling in GCP docs (#529) 2019-08-14 21:25:38 -07:00
6db11d5908 Enable AWS root block device encryption by default
* terraform-provider-aws v2.23.0 allows AWS root block devices
to enable encryption by default.
* Require updating terraform-provider-aws to v2.23.0 or higher
* Enable root EBS device encryption by default for controller
instances and worker instances in auto-scaling groups

For comparison:

* Google Cloud persistent disks have been encrypted by
default for years
* Azure managed disk encryption is not ready yet (#486)
2019-08-07 21:13:44 -07:00
cad12804c8 Refresh terraform provider versions used in docs
* Sync terraform provider versions with those tested against
2019-08-07 20:42:40 -07:00
eaea4d37a2 Update Grafana from v6.2.5 to v6.3.2
* https://github.com/grafana/grafana/releases/tag/v6.3.2
* https://github.com/grafana/grafana/releases/tag/v6.3.1
* https://github.com/grafana/grafana/releases/tag/v6.3.0
2019-08-07 20:01:18 -07:00
457ad18daa Update kube-state-metrics from v1.7.1 to v1.7.2
* Add a separate liveness and readiness probe
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.2
2019-08-07 20:00:24 -07:00
350 changed files with 44981 additions and 19051 deletions

1
.github/FUNDING.yml vendored Normal file

@ -0,0 +1 @@
github: [poseidon]


@ -1,33 +0,0 @@
<!-- Fill in either the 'Bug' or 'Feature Request' section -->
## Bug
### Environment
* Platform: aws, azure, bare-metal, google-cloud, digital-ocean
* OS: container-linux, flatcar-linux
* Release: Typhoon version or Git SHA (reporting latest is **not** helpful)
* Terraform: `terraform version` (reporting latest is **not** helpful)
* Plugins: Provider plugin versions (reporting latest is **not** helpful)
### Problem
Describe the problem.
### Desired Behavior
Describe the goal.
### Steps to Reproduce
Provide clear steps to reproduce the issue unless already covered.
## Feature Request
### Feature
Describe the feature and what problem it solves.
### Tradeoffs
What are the pros and cons of this feature? How will it be exercised and maintained?

39
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@ -0,0 +1,39 @@
---
name: Bug report
about: Report a bug to improve the project
title: ''
labels: ''
assignees: ''
---
<!-- READ: Issues are used to receive focused bug reports from users and to track planned future enhancements by the authors. Topics like cluster operation, support, debugging help, advice, and Kubernetes concepts are out of scope and should not use issues-->
**Description**
A clear and concise description of what the bug is.
**Steps to Reproduce**
Provide clear steps to reproduce the bug.
- [ ] Relevant error messages if appropriate (concise, not a dump of everything).
- [ ] Explored using a vanilla cluster from the [tutorials](https://typhoon.psdn.io/#documentation). Ruled out [customizations](https://typhoon.psdn.io/advanced/customization/).
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment**
* Platform: aws, azure, bare-metal, google-cloud, digital-ocean
* OS: fedora-coreos, flatcar-linux (include release version)
* Release: Typhoon version or Git SHA (reporting latest is **not** helpful)
* Terraform: `terraform version` (reporting latest is **not** helpful)
* Plugins: Provider plugin versions (reporting latest is **not** helpful)
**Possible Solution**
<!-- Most bug reports should have some inkling about solutions. Otherwise, your report may be less of a bug and more of a support request (see top).-->
Link to a PR or description.

5
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
- name: Security
url: https://typhoon.psdn.io/topics/security/
about: Report security vulnerabilities


@ -1,10 +0,0 @@
High level description of the change.
* Specific change
* Specific change
## Testing
Describe your work to validate the change works.
rel: issue number (if applicable)

6
.github/dependabot.yaml vendored Normal file

@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: pip
directory: "/"
schedule:
interval: weekly

15
.github/issue_template.md vendored Normal file

@ -0,0 +1,15 @@
<!-- READ: Issues are used to receive focused bug reports from users and to track planned future enhancements by the authors. Topics like cluster operation, support, debugging help, advice, and Kubernetes concepts are out of scope and should not use issues-->
## Enhancement
### Overview
One paragraph explanation of the enhancement.
### Motivation
Describe the motivation and what problem this solves.
### Tradeoffs
What are the pros and cons of this feature? How will it be exercised and maintained?

12
.github/release.yaml vendored Normal file

@ -0,0 +1,12 @@
changelog:
categories:
- title: Contributions
labels:
- '*'
exclude:
labels:
- dependencies
- no-release-note
- title: Dependencies
labels:
- dependencies

12
.github/workflows/publish.yaml vendored Normal file

@ -0,0 +1,12 @@
name: publish
on:
push:
branches:
- release-docs
jobs:
mkdocs:
name: mkdocs
uses: poseidon/matchbox/.github/workflows/mkdocs-pages.yaml@main
# Add content write for GitHub Pages
permissions:
contents: write

2
.gitignore vendored Normal file

@ -0,0 +1,2 @@
site/
venv/

1764
CHANGES.md

@ -4,6 +4,1752 @@ Notable changes between versions.
## Latest
## v1.31.3
* Kubernetes [v1.31.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1313)
* Update CoreDNS from v1.11.3 to v1.11.4
* Update Cilium from v1.16.3 to [v1.16.4](https://github.com/cilium/cilium/releases/tag/v1.16.4)
### Deprecations
* Plan to drop support for using Calico CNI, recommend everyone use the Cilium default
## v1.31.2
* Kubernetes [v1.31.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1312)
* Update Cilium from v1.16.1 to [v1.16.3](https://github.com/cilium/cilium/releases/tag/v1.16.3)
* Update flannel from v0.25.6 to [v0.26.0](https://github.com/flannel-io/flannel/releases/tag/v0.26.0)
## v1.31.1
* Kubernetes [v1.31.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1311)
* Update flannel from v0.25.5 to [v0.25.6](https://github.com/flannel-io/flannel/releases/tag/v0.25.6)
### Google
* Add `controller_disk_type` and `worker_disk_type` variables ([#1513](https://github.com/poseidon/typhoon/pull/1513))
* Add explicit `region` field to regional worker instance templates ([#1524](https://github.com/poseidon/typhoon/pull/1524))
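A minimal sketch of setting the new disk type variables, assuming a module named `cluster` as in other examples in this changelog. The values `pd-ssd` and `pd-balanced` are Google Compute Engine persistent disk type names and are an assumption about what the variables accept, not documented defaults:

```hcl
module "cluster" {
  # ... other arguments omitted

  # Assumed values: Compute Engine persistent disk type names.
  # Illustrative only, not Typhoon defaults.
  controller_disk_type = "pd-ssd"
  worker_disk_type     = "pd-balanced"
}
```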
## v1.31.0
* Kubernetes [v1.31.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1310)
* Use Cilium kube-proxy replacement mode when `cilium` networking is chosen ([#1501](https://github.com/poseidon/typhoon/pull/1501))
* Fix invalid flannel-cni container image for those using `flannel` networking ([#1497](https://github.com/poseidon/typhoon/pull/1497))
### AWS
* Use EC2 resource-based hostnames instead of IP-based hostnames ([#1499](https://github.com/poseidon/typhoon/pull/1499))
* The Amazon DNS server can resolve A and AAAA queries to IPv4 and IPv6 node addresses
* Tag controller node EBS volumes with a name based on the controller node name
### Google
* Use `google_compute_region_instance_template` instead of `google_compute_instance_template`
* Google's regional instance template metadata is kept in the associated region for greater resiliency. The "global" instance templates were kept in a single region
## v1.30.4
* Kubernetes [v1.30.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1304)
* Update Cilium from v1.15.7 to [v1.16.1](https://github.com/cilium/cilium/releases/tag/v1.16.1)
* Update CoreDNS from v1.11.1 to v1.11.3
* Remove `enable_aggregation` variable for Kubernetes Aggregation Layer, always set to true
* Remove `cluster_domain_suffix` variable, always use "cluster.local"
* Remove `enable_reporting` variable for analytics, always set to false
## v1.30.3
* Kubernetes [v1.30.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1303)
* Update Cilium from v1.15.6 to [v1.15.7](https://github.com/cilium/cilium/releases/tag/v1.15.7)
* Update flannel from v0.25.4 to [v0.25.5](https://github.com/flannel-io/flannel/releases/tag/v0.25.5)
### AWS
* Configure controller and worker disks ([#1482](https://github.com/poseidon/typhoon/pull/1482))
* Add `controller_disk_type`, `controller_disk_size`, and `controller_disk_iops` variables
* Add `worker_disk_type`, `worker_disk_size`, and `worker_disk_iops` variables
* Remove `disk_type`, `disk_size`, and `disk_iops` variables
* Fix propagating settings to worker disks, previously ignored
* Configure CPU pricing model for burstable instance types ([#1482](https://github.com/poseidon/typhoon/pull/1482))
* Add `controller_cpu_credits` and `worker_cpu_credits` variables (`standard` or `unlimited`)
* Configure controller or worker instance architecture ([#1485](https://github.com/poseidon/typhoon/pull/1485))
* Add `controller_arch` and `worker_arch` variables (`amd64` or `arm64`)
* Remove `arch` variable
```diff
module "cluster" {
...
- arch = "amd64"
- disk_type = "gp3"
- disk_size = 30
- disk_iops = 3000
+ controller_arch = "amd64"
+ controller_disk_size = 15
+ controller_cpu_credits = "standard"
+ worker_arch = "amd64"
+ worker_disk_size = 22
+ worker_cpu_credits = "unlimited"
}
```
### Azure
* Configure the virtual network and subnets with IPv6 private address space
* Change `host_cidr` variable (string) to a `network_cidr` object with `ipv4` and `ipv6` fields that list CIDR strings. Leave the variable unset to use the defaults. (**breaking**)
* Add support for dual-stack Kubernetes Ingress Load Balancing
* Add a public IPv6 frontend, 80/443 rules, and a worker-ipv6 backend pool
* Change the `controller_address_prefixes` output from a list of strings to an object with `ipv4` and `ipv6` fields. Most Azure resources can't accept a mix, so these are split out (**breaking**)
* Change the `worker_address_prefixes` output from a list of strings to an object with `ipv4` and `ipv6` fields. Most Azure resources can't accept a mix, so these are split out (**breaking**)
* Change the `backend_address_pool_id` output (and worker module input) from a string to an object with `ipv4` and `ipv6` fields that list ids (**breaking**)
* Configure nodes to have outbound IPv6 internet connectivity (analogous to IPv4 SNAT)
* Configure controller nodes to have a public IPv6 address
* Configure worker nodes to use outbound rules and the load balancer for SNAT
* Extend network security rules to allow IPv6 traffic, analogous to IPv4
* Rename `region` variable to `location` to align with Azure platform conventions ([#1469](https://github.com/poseidon/typhoon/pull/1469))
* Change worker pools from uniform to flexible orchestration mode ([#1473](https://github.com/poseidon/typhoon/pull/1473))
* Add options to allow worker nodes to use ephemeral local disks ([#1473](https://github.com/poseidon/typhoon/pull/1473))
* Add `controller_disk_type` and `controller_disk_size` variables
* Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables
* Reduce the number of public IPv4 addresses needed for the Azure load balancer ([#1470](https://github.com/poseidon/typhoon/pull/1470))
* Configure controller or worker instance architecture for Flatcar Linux ([#1485](https://github.com/poseidon/typhoon/pull/1485))
* Add `controller_arch` and `worker_arch` variables (`amd64` or `arm64`)
* Remove `arch` variable
```diff
module "cluster" {
...
- region = "centralus"
+ location = "centralus"
# optional
- host_cidr = "10.0.0.0/16"
+ network_cidr = {
+ ipv4 = ["10.0.0.0/16"]
+ }
# instances
+ controller_disk_type = "StandardSSD_LRS"
+ worker_ephemeral_disk = true
}
```
### Google Cloud
* Allow configuring controller and worker disks ([#1486](https://github.com/poseidon/typhoon/pull/1486))
* Add `controller_disk_size` and `worker_disk_size` variables
* Remove `disk_size` variable
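The Google Cloud change follows the same pattern as the AWS and Azure examples above. A minimal sketch, assuming a module named `cluster` as in those examples. The sizes are illustrative (assumed to be in GB) and are not defaults:

```hcl
module "cluster" {
  # ... other arguments omitted

  # `disk_size` was removed; set a size per role instead.
  # Values are illustrative only (assumed GB), not Typhoon defaults.
  controller_disk_size = 30
  worker_disk_size     = 50
}
```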
## v1.30.2
* Kubernetes [v1.30.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1302)
* Update CoreDNS from v1.9.4 to v1.11.1
* Update Cilium from v1.15.5 to [v1.15.6](https://github.com/cilium/cilium/releases/tag/v1.15.6)
* Update flannel from v0.25.1 to [v0.25.4](https://github.com/flannel-io/flannel/releases/tag/v0.25.4)
## v1.30.1
* Kubernetes [v1.30.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1301)
* Add firewall rules and security group rules for Cilium and Hubble metrics ([#1449](https://github.com/poseidon/typhoon/pull/1449))
* Update Cilium from v1.15.3 to [v1.15.5](https://github.com/cilium/cilium/releases/tag/v1.15.5)
* Update flannel from v0.24.4 to [v0.25.1](https://github.com/flannel-io/flannel/releases/tag/v0.25.1)
* Introduce `components` variable to enable/disable/configure pre-installed components ([#1453](https://github.com/poseidon/typhoon/pull/1453))
* Add Terraform modules for `coredns`, `cilium`, and `flannel` components
### Azure
* Add `controller_security_group_name` output for adding custom security rules ([#1450](https://github.com/poseidon/typhoon/pull/1450))
* Add `controller_address_prefixes` output for adding custom security rules ([#1450](https://github.com/poseidon/typhoon/pull/1450))
## v1.30.0
* Kubernetes [v1.30.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1300)
* Update etcd from v3.5.12 to [v3.5.13](https://github.com/etcd-io/etcd/releases/tag/v3.5.13)
* Update Cilium from v1.15.2 to [v1.15.3](https://github.com/cilium/cilium/releases/tag/v1.15.3)
* Update Calico from v3.27.2 to [v3.27.3](https://github.com/projectcalico/calico/releases/tag/v3.27.3)
## v1.29.3
* Kubernetes [v1.29.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1293)
* Update Cilium from v1.15.1 to [v1.15.2](https://github.com/cilium/cilium/releases/tag/v1.15.2)
* Update flannel from v0.24.2 to [v0.24.4](https://github.com/flannel-io/flannel/releases/tag/v0.24.4)
## v1.29.2
* Kubernetes [v1.29.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1292)
* Update etcd from v3.5.10 to [v3.5.12](https://github.com/etcd-io/etcd/releases/tag/v3.5.12)
* Update Cilium from v1.14.3 to [v1.15.1](https://github.com/cilium/cilium/releases/tag/v1.15.1)
* Update Calico from v3.26.3 to [v3.27.2](https://github.com/projectcalico/calico/releases/tag/v3.27.2)
* Fix upstream incompatibility with Fedora CoreOS ([calico#8372](https://github.com/projectcalico/calico/issues/8372))
* Update flannel from v0.22.2 to [v0.24.2](https://github.com/flannel-io/flannel/releases/tag/v0.24.2)
* Add an `install_container_networking` variable (default `true`) ([#1421](https://github.com/poseidon/typhoon/pull/1421))
* When `true`, the chosen container `networking` provider is installed during cluster bootstrap
* Set `false` to self-manage the container networking provider. This allows flannel, Calico, or Cilium
to be managed via Terraform (like any other Kubernetes resources). Nodes will be NotReady until you
apply the self-managed container networking provider. This may become the default in future.
* Continue to set `networking` to one of the three supported container networking providers. Most
require custom firewall / security policies be present across nodes so they have some infra tie-ins.
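As a sketch of the self-managed option, the fragment below keeps `networking` set while turning off the bootstrap-time install. The module name `cluster` is borrowed from other examples in this changelog, and the remaining arguments are omitted:

```hcl
module "cluster" {
  # ... other arguments omitted

  # Still choose a provider so the matching firewall / security rules exist.
  networking = "cilium"

  # Skip installing the provider during cluster bootstrap. Nodes stay
  # NotReady until a self-managed flannel, Calico, or Cilium install is applied.
  install_container_networking = false
}
```

With this setting, the container networking provider would be applied separately after the cluster module, for example as Kubernetes resources managed from Terraform.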
## v1.29.1
* Kubernetes [v1.29.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1291)
### AWS
* Continue to support AWS IMDSv1 ([#1412](https://github.com/poseidon/typhoon/pull/1412))
### Known Issues
* Calico and Fedora CoreOS cannot be used together currently ([calico#8372](https://github.com/projectcalico/calico/issues/8372))
## v1.29.0
* Kubernetes [v1.29.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1290)
### Known Issues
* Calico and Fedora CoreOS cannot be used together currently ([calico#8372](https://github.com/projectcalico/calico/issues/8372))
## v1.28.4
* Kubernetes [v1.28.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1284)
## v1.28.3
* Kubernetes [v1.28.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1283)
* Update etcd from v3.5.9 to [v3.5.10](https://github.com/etcd-io/etcd/releases/tag/v3.5.10)
* Update Cilium from v1.14.2 to [v1.14.3](https://github.com/cilium/cilium/releases/tag/v1.14.3)
* Workaround problems in Cilium v1.14's partial `kube-proxy` implementation ([#365](https://github.com/poseidon/terraform-render-bootstrap/pull/365))
* Update Calico from v3.26.1 to [v3.26.3](https://github.com/projectcalico/calico/releases/tag/v3.26.3)
### Google Cloud
* Allow upgrading Google Cloud Terraform provider to v5.x
## v1.28.2
* Kubernetes [v1.28.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1282)
* Update Cilium from v1.14.1 to [v1.14.2](https://github.com/cilium/cilium/releases/tag/v1.14.2)
### Azure
* Add optional `azure_authorized_key` variable
* Azure obtusely inspects public keys, requires RSA keys, and forbids more secure key formats (e.g. ed25519)
* Allow passing a dummy RSA key via `azure_authorized_key` (delete the private key) to satisfy Azure validations, then the usual `ssh_authorized_key` variable can use newer formats (e.g. ed25519)
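A minimal sketch of how the two key variables might be combined, assuming a module named `cluster` as in other examples in this changelog. Both key strings are placeholders, not real keys:

```hcl
module "cluster" {
  # ... other arguments omitted

  # Placeholder: a throwaway RSA public key used only to satisfy
  # Azure's validation (its private key can be deleted).
  azure_authorized_key = "ssh-rsa AAAA...placeholder"

  # Placeholder: the public key actually used for SSH access,
  # which may use a newer format such as ed25519.
  ssh_authorized_key = "ssh-ed25519 AAAA...placeholder"
}
```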
## v1.28.1
* Kubernetes [v1.28.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1281)
## v1.28.0
* Kubernetes [v1.28.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1280)
* Update Cilium from v1.13.4 to [v1.14.1](https://github.com/cilium/cilium/releases/tag/v1.14.1)
* Update flannel from v0.22.0 to [v0.22.2](https://github.com/flannel-io/flannel/releases/tag/v0.22.2)
## v1.27.4
* Kubernetes [v1.27.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1274)
## v1.27.3
* Kubernetes [v1.27.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1273)
* Update etcd from v3.5.7 to [v3.5.9](https://github.com/etcd-io/etcd/releases/tag/v3.5.9)
* Update Cilium from v1.13.2 to [v1.13.4](https://github.com/cilium/cilium/releases/tag/v1.13.4)
* Update Calico from v3.25.1 to [v3.26.1](https://github.com/projectcalico/calico/releases/tag/v3.26.1)
* Update flannel from v0.21.2 to [v0.22.0](https://github.com/flannel-io/flannel/releases/tag/v0.22.0)
### AWS
* Allow upgrading AWS Terraform provider to v5.x ([#1353](https://github.com/poseidon/typhoon/pull/1353))
### Azure
* Enable boot diagnostics for controller and worker VMs ([#1351](https://github.com/poseidon/typhoon/pull/1351))
## v1.27.2
* Kubernetes [v1.27.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1272)
### Fedora CoreOS
* Update Butane Config version from v1.4.0 to v1.5.0
* Require any custom Butane [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) update to v1.5.0
* Require Fedora CoreOS `37.20230303.3.0` or newer (with ignition v2.15)
* Require poseidon/ct v0.13+ (**action required**)
## v1.27.1
* Kubernetes [v1.27.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1271)
* Update etcd from v3.5.7 to [v3.5.8](https://github.com/etcd-io/etcd/releases/tag/v3.5.8)
* Update Cilium from v1.13.1 to [v1.13.2](https://github.com/cilium/cilium/releases/tag/v1.13.2)
* Update Calico from v3.25.0 to [v3.25.1](https://github.com/projectcalico/calico/releases/tag/v3.25.1)
## v1.26.3
* Kubernetes [v1.26.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1263)
* Update Cilium from v1.12.6 to [v1.13.1](https://github.com/cilium/cilium/releases/tag/v1.13.1)
### Bare-Metal
* Add `oem_type` variable for Flatcar Linux ([#1302](https://github.com/poseidon/typhoon/pull/1302))
## v1.26.2
* Kubernetes [v1.26.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1262)
* Update Cilium from v1.12.5 to [v1.12.6](https://github.com/cilium/cilium/releases/tag/v1.12.6)
* Update flannel from v0.20.2 to [v0.21.2](https://github.com/flannel-io/flannel/releases/tag/v0.21.2)
### Bare-Metal
* Add a `worker` module to allow customizing individual worker nodes ([#1295](https://github.com/poseidon/typhoon/pull/1295))
### Known Issues
* Fedora CoreOS [issue](https://github.com/coreos/fedora-coreos-tracker/issues/1423) fix is progressing through channels
## v1.26.1
* Kubernetes [v1.26.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1261)
* Update etcd from v3.5.6 to [v3.5.7](https://github.com/etcd-io/etcd/releases/tag/v3.5.7)
* Update Cilium from v1.12.4 to [v1.12.5](https://github.com/cilium/cilium/releases/tag/v1.12.5)
* Update Calico from v3.24.5 to [v3.25.0](https://github.com/projectcalico/calico/releases/tag/v3.25.0)
* Update CoreDNS from v1.9.3 to [v1.9.4](https://github.com/poseidon/terraform-render-bootstrap/pull/341)
## v1.26.0
* Kubernetes [v1.26.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260)
* Update etcd from v3.5.5 to [v3.5.6](https://github.com/etcd-io/etcd/releases/tag/v3.5.6)
* Update Cilium from v1.12.3 to [v1.12.4](https://github.com/cilium/cilium/releases/tag/v1.12.4)
* Update flannel from v0.15.1 to [v0.20.2](https://github.com/flannel-io/flannel/releases/tag/v0.20.2)
* Reminder: Modules are no longer published to the [Terraform Module Registry](https://registry.terraform.io/search/modules?q=poseidon) ([#1282](https://github.com/poseidon/typhoon/pull/1282))
* See [#1282](https://github.com/poseidon/typhoon/pull/1282) and [v1.25.4](https://github.com/poseidon/typhoon/releases/tag/v1.25.4) for details
### AWS
* Migrate AWS launch configurations to launch templates ([#1275](https://github.com/poseidon/typhoon/pull/1275))
* Starting Dec 31, 2022 AWS won't add new instance types/families to launch configurations
### Addons
* Update ingress-nginx from v1.3.1 to [v1.5.1](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.5.1)
* Update Prometheus from v2.40.1 to [v2.40.5](https://github.com/prometheus/prometheus/releases/tag/v2.40.5)
* Update node-exporter from v1.3.1 to [v1.5.0](https://github.com/prometheus/node_exporter/releases/tag/v1.5.0)
* Update kube-state-metrics from v2.6.0 to [v2.7.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.7.0)
* Update Grafana from v9.2.4 to [v9.3.1](https://github.com/grafana/grafana/releases/tag/v9.3.1)
## v1.25.4
* Kubernetes [v1.25.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1254)
* Update Calico from v3.24.1 to [v3.24.5](https://github.com/projectcalico/calico/releases/tag/v3.24.5)
* Allow Kubelet kubeconfig to drain nodes, if desired ([#330](https://github.com/poseidon/terraform-render-bootstrap/pull/330))
* Re-enable Kubelet Graceful Node Shutdown ([#1261](https://github.com/poseidon/typhoon/pull/1261))
* Introduce companion project [poseidon/scuttle](https://github.com/poseidon/scuttle)
* Link to new Mastodon account for release announcements
* [@typhoon@fosstodon.org](https://fosstodon.org/@typhoon)
* [@poseidon@fosstodon.org](https://fosstodon.org/@poseidon)
* Deprecate publishing to the [Terraform Module Registry](https://registry.terraform.io/search/modules?q=poseidon)
* Typhoon docs have always shown using Git-based module sources, not the Terraform Module Registry
* Module usage should be `source = "git::https://github.com/poseidon/typhoon/..."` not `source = "poseidon/kubernetes/..."`
* Terraform's Module Registry requires subtree mirroring typhoon to special terraform-platform-kubernetes repos, only supports release versions (no commit SHAs or forks), and only ever contained Flatcar Linux modules (not Fedora CoreOS) for historical reasons
* Note, this does not affect Terraform Providers like `poseidon/matchbox` or `poseidon/ct`, the registry works well for providers
### Fedora CoreOS
* Remove unused `Wants=network.target` from `etcd-member.service` ([#1254](https://github.com/poseidon/typhoon/pull/1254))
### Cloud
* Remove defunct `delete-node.service` from worker node configurations ([#1256](https://github.com/poseidon/typhoon/pull/1256))
### Addons
* Update Prometheus from v2.39.1 to [v2.40.1](https://github.com/prometheus/prometheus/releases/tag/v2.40.1)
* Update Grafana from v9.1.7 to [v9.2.4](https://github.com/grafana/grafana/releases/tag/v9.2.4)
## v1.25.3
* Kubernetes [v1.25.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253)
* Switch Kubernetes registry from `k8s.gcr.io` to `registry.k8s.io` for addons ([#1246](https://github.com/poseidon/typhoon/pull/1246))
* Update Cilium from v1.12.2 to [v1.12.3](https://github.com/cilium/cilium/releases/tag/v1.12.3) ([#1253](https://github.com/poseidon/typhoon/pull/1253))
### Azure
* Change default Azure `worker_type` from [`Standard_DS1_v2`](https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series) to [`Standard_D2as_v5`](https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series) ([#1248](https://github.com/poseidon/typhoon/pull/1248))
* Get 2 VCPU, 7 GiB, 12500 Mbps (vs 1 VCPU, 3.5 GiB, 750 Mbps)
* Small increase in pay-as-you-go price ($53.29 -> $62.78)
* Small increase in spot price ($5.64/mo -> $7.37/mo)
* Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`)
### Flatcar Linux
* Add Flatcar Linux ARM64 support on Azure ([docs](https://typhoon.psdn.io/advanced/arm64/), [#1251](https://github.com/poseidon/typhoon/pull/1251))
* Switch from Azure Hypervisor gen1 to gen2 (**action required**) ([#1248](https://github.com/poseidon/typhoon/pull/1248))
* Run `az vm image terms accept --publisher kinvolk --offer flatcar-container-linux-free --plan stable-gen2`
### Docs
* Remove old docs note about not supporting ARM64 with Calico
* Typhoon supports ARM64 with `cilium`, `calico`, and `flannel`
### Addons
* Update Prometheus from v2.38.0 to [v2.39.1](https://github.com/prometheus/prometheus/releases/tag/v2.39.1)
* Update Grafana from v9.1.6 to [v9.1.7](https://github.com/grafana/grafana/releases/tag/v9.1.7)
## v1.25.2
Kubernetes v1.25.2 was skipped since there were minimal changes upstream.
## v1.25.1
* Kubernetes [v1.25.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1251)
* Update etcd from v3.5.4 to [v3.5.5](https://github.com/etcd-io/etcd/releases/tag/v3.5.5)
* Update Cilium from v1.12.1 to [v1.12.2](https://github.com/cilium/cilium/releases/tag/v1.12.2)
* Update Calico from v3.23.3 to [v3.24.1](https://github.com/projectcalico/calico/releases/tag/v3.24.1)
* Revert Kubelet Graceful Node Shutdown on worker nodes ([#1227](https://github.com/poseidon/typhoon/pull/1227))
* Fix issue where non-critical pods are left in Error/Completed state on node shutdown
* Remove feature flag disable workaround for [kubernetes/kubernetes#112081](https://github.com/kubernetes/kubernetes/issues/112081)
* Kubernetes [reverted](https://github.com/kubernetes/kubernetes/pull/112078) `LocalStorageCapacityIsolationFSQuotaMonitoring` back to alpha
* Remove workaround for preventing `search .` propagation in [kubernetes/kubernetes#112135](https://github.com/kubernetes/kubernetes/issues/112135)
* Upstream Kubernetes [fix](https://github.com/kubernetes/kubernetes/pull/112157)
### Addons
* Update kube-state-metrics from v2.5.0 to [v2.6.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.6.0)
* Update ingress-nginx from v1.3.0 to [v1.3.1](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.3.1)
* Update Grafana from v9.1.0 to [v9.1.6](https://github.com/grafana/grafana/releases/tag/v9.1.6)
## v1.25.0
* Kubernetes [v1.25.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1250)
* Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature gate ([#1220](https://github.com/poseidon/typhoon/pull/1220), fixes [kubernetes#112081](https://github.com/kubernetes/kubernetes/issues/112081))
* Add workaround to revert adding "search ." to containers' `/etc/resolv.conf` ([#1224](https://github.com/poseidon/typhoon/pull/1224), fixes [kubernetes#112135](https://github.com/kubernetes/kubernetes/issues/112135))
* Migrate most Kubelet flags to KubeletConfiguration file ([#1219](https://github.com/poseidon/typhoon/pull/1219))
* Configure Kubelet Graceful Node Shutdown ([#1222](https://github.com/poseidon/typhoon/pull/1222))
* Allow up to 30s for critical pods to gracefully shutdown on node shutdown
* Allow up to 15s for regular pods to gracefully shutdown on node shutdown
* Mark node NotReady promptly on node shutdown
* Lengthen systemd inhibitor lock max delay from 5s to 45s
### Fedora CoreOS
* Change Podman `log-driver` from `journald` to `k8s-file` ([#1221](https://github.com/poseidon/typhoon/pull/1221))
* Fix `etcd-member` and Kubelet systemd service log lines appearing twice in journal logs
## v1.24.4
* Kubernetes [v1.24.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244)
* Update CoreDNS from v1.8.6 to [v1.9.3](https://github.com/poseidon/terraform-render-bootstrap/pull/318)
* Update Cilium from v1.11.7 to [v1.12.1](https://github.com/cilium/cilium/releases/tag/v1.12.1)
* Update Calico from v3.23.1 to [v3.23.3](https://github.com/projectcalico/calico/releases/tag/v3.23.3)
* Switch Kubernetes registry from `k8s.gcr.io` to `registry.k8s.io` ([#1206](https://github.com/poseidon/typhoon/pull/1206))
* Remove use of deprecated Terraform [template](https://registry.terraform.io/providers/hashicorp/template) provider ([#1194](https://github.com/poseidon/typhoon/pull/1194))
### Fedora CoreOS
* Remove ineffective `/etc/fedora-coreos/iptables-legacy.stamp` ([#1201](https://github.com/poseidon/typhoon/pull/1201))
* Typhoon already uses iptables v1.8.7 (nf_tables) since FCOS 36
* Staying on legacy iptables required a file in `/etc/coreos` instead
### Flatcar Linux
* Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0 ([#1196](https://github.com/poseidon/typhoon/pull/1196)) (**action required**)
* Flatcar Linux 3185.0.0+ [supports](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3) Ignition v3.x specs (which are rendered from Butane Configs, like Fedora CoreOS)
* `poseidon/ct` v0.11.0 [supports](https://github.com/poseidon/terraform-provider-ct/pull/131) the `flatcar` Butane Config variant
* Require poseidon/ct v0.11+ and Flatcar Linux 3185.0.0+
* Please modify any Flatcar Linux snippets to use the [Butane Config](https://coreos.github.io/butane/config-flatcar-v1_0/) format (**action required**)
```yaml
variant: flatcar
version: 1.0.0
...
```
### AWS
* [Refresh](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html) instances in autoscaling group when launch configuration changes ([#1208](https://github.com/poseidon/typhoon/pull/1208)) ([docs](https://typhoon.psdn.io/topics/maintenance/#node-configuration-updates), **important**)
* Worker launch configuration changes start an autoscaling group instance refresh to replace instances
* Instance refresh creates surge instances, waits for a warm-up period, then deletes old instances
* Changing `worker_type`, `disk_*`, `worker_price`, `worker_target_groups`, or Butane `worker_snippets` on existing worker nodes will replace instances
* New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or Flatcar Linux to keep themselves updated
* Previously, new launch configurations were made in the same way, but not applied to instances unless manually replaced
* Rename worker autoscaling group to `${cluster_name}-worker` ([#1202](https://github.com/poseidon/typhoon/pull/1202))
* Name launch configuration `${cluster_name}-worker` instead of a random id
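
For readers unfamiliar with instance refresh, the rolling replacement described above can be sketched with the AWS provider's `instance_refresh` block. This is an illustrative fragment, not Typhoon's actual module code: the resource names, sizes, warm-up, and health percentage are assumptions, and the referenced launch configuration and subnet variable are defined elsewhere.

```tf
# Hypothetical worker autoscaling group; names and values are illustrative only.
resource "aws_autoscaling_group" "workers" {
  name = "example-worker" # cluster name "example" plus the "-worker" suffix

  min_size            = 2
  max_size            = 2
  vpc_zone_identifier = var.subnet_ids # hypothetical variable, defined elsewhere

  # Changing the launch configuration starts an instance refresh.
  launch_configuration = aws_launch_configuration.worker.name # defined elsewhere

  instance_refresh {
    strategy = "Rolling"

    preferences {
      # Seconds to wait after a new instance launches before continuing (assumed value).
      instance_warmup = 120
      # Share of the group that must stay healthy during the refresh (assumed value).
      min_healthy_percentage = 90
    }
  }
}
```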
### Google
* [Roll](https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups) instance template changes to worker managed instance groups ([#1207](https://github.com/poseidon/typhoon/pull/1207)) ([docs](https://typhoon.psdn.io/topics/maintenance/#node-configuration-updates), **important**)
* Worker instance template changes roll out by gradually replacing instances
* Automatic rollouts create surge instances, wait for health checks, then delete old instances (0 unavailable instances)
* Changing `worker_type`, `disk_size`, `worker_preemptible`, or Butane `worker_snippets` on existing worker nodes will replace instances
* New compute images or changing `os_stream` will be ignored, to allow Fedora CoreOS or Flatcar Linux to keep themselves updated
* Previously, new instance templates were made in the same way, but not applied to instances unless manually replaced
* Add health checks to worker managed instance groups (i.e. "autohealing") ([#1207](https://github.com/poseidon/typhoon/pull/1207))
* Use health checks to probe kube-proxy every 30s
* Replace worker nodes that fail the health check 6 times (3min)
* Name `kube-apiserver` and `worker` health checks consistently ([#1207](https://github.com/poseidon/typhoon/pull/1207))
* Use names `${cluster_name}-apiserver-health` and `${cluster_name}-worker-health`
* Rename managed instance group from `${cluster_name}-worker-group` to `${cluster_name}-worker` ([#1207](https://github.com/poseidon/typhoon/pull/1207))
* Fix bug provisioning clusters with multiple controller nodes ([#1195](https://github.com/poseidon/typhoon/pull/1195))
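
The worker autohealing timing noted above (a probe every 30s, replacement after 6 failures, so about 3 minutes) could look roughly like the following health check. Only the interval and the unhealthy threshold come from the notes; the resource name, timeout, healthy threshold, port, and path are assumptions (10256 and `/healthz` are kube-proxy's usual health endpoint defaults), so treat this as a sketch rather than Typhoon's real definition.

```tf
# Hypothetical health check for worker nodes in a cluster named "example".
resource "google_compute_health_check" "worker" {
  name = "example-worker-health"

  check_interval_sec  = 30 # probe every 30s (from the notes above)
  unhealthy_threshold = 6  # 6 consecutive failures x 30s = 3min before replacement
  timeout_sec         = 5  # assumed value
  healthy_threshold   = 1  # assumed value

  http_health_check {
    port         = 10256      # assumed: kube-proxy default healthz port
    request_path = "/healthz" # assumed path
  }
}
```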
### Addons
* Update Prometheus from v2.37.0 to [v2.38.0](https://github.com/prometheus/prometheus/releases/tag/v2.38.0)
* Update Grafana from v9.0.3 to [v9.1.0](https://github.com/grafana/grafana/releases/tag/v9.1.0)
## v1.24.3
* Kubernetes [v1.24.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243)
* Update Cilium from v1.11.6 to [v1.11.7](https://github.com/cilium/cilium/releases/tag/v1.11.7)
### Addons
* Update ingress-nginx from v1.2.1 to [v1.3.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.3.0)
* Update Prometheus from v2.36.1 to [v2.37.0](https://github.com/prometheus/prometheus/releases/tag/v2.37.0)
* Update Grafana from v8.5.6 to [v9.0.3](https://github.com/grafana/grafana/releases/tag/v9.0.3)
### Notes
* Poseidon repos will soon change their default branch from `master` to `main`
## v1.24.2
* Kubernetes [v1.24.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242)
* Update Cilium from v1.11.5 to [v1.11.6](https://github.com/cilium/cilium/releases/tag/v1.11.6)
* Update Calico from v3.22.2 to [v3.23.1](https://github.com/projectcalico/calico/releases/tag/v3.23.1)
### Addons
* Update Prometheus from v2.36.0 to [v2.36.1](https://github.com/prometheus/prometheus/releases/tag/v2.36.1)
* Update Grafana from v8.5.3 to [v8.5.6](https://github.com/grafana/grafana/releases/tag/v8.5.6)
* Update kube-state-metrics from v2.4.2 to [v2.5.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.5.0)
### Known Issues
* Skip AWS Terraform provider v4.17.0 to v4.19.0, which had a regression affecting workers joining ([#1173](https://github.com/poseidon/typhoon/issues/1173))
## v1.24.1
* Kubernetes [v1.24.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241)
* Update Cilium from v1.11.4 to [v1.11.5](https://github.com/cilium/cilium/releases/tag/v1.11.5)
### Addons
* Update Prometheus from v2.35.0 to [v2.36.0](https://github.com/prometheus/prometheus/releases/tag/v2.36.0)
* Update Grafana from v8.5.1 to [v8.5.3](https://github.com/grafana/grafana/releases/tag/v8.5.3)
* Update nginx-ingress from v1.2.0 to [v1.2.1](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.1)
## v1.24.0
* Kubernetes [v1.24.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240)
* Update etcd from v3.5.2 to [v3.5.4](https://github.com/etcd-io/etcd/releases/tag/v3.5.4)
* Add Kubelet mounts to enable relabeling workload volumes ([#1152](https://github.com/poseidon/typhoon/pull/1152))
* StorageClasses no longer require explicit SELinux mount contexts
### Addons
* Update nginx-ingress from v1.1.3 to [v1.2.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.0)
* Update Prometheus from v2.34.0 to [v2.35.0](https://github.com/prometheus/prometheus/releases/tag/v2.35.0)
* Update Grafana from v8.4.5 to [v8.5.1](https://github.com/grafana/grafana/releases/tag/v8.5.1)
## v1.23.6
* Kubernetes [v1.23.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236)
* Update Cilium from v1.11.2 to [v1.11.4](https://github.com/cilium/cilium/releases/tag/v1.11.4)
* Rename Cilium DaemonSet from `cilium-agent` to `cilium` to match Cilium CLI tools ([#303](https://github.com/poseidon/terraform-render-bootstrap/pull/303))
* Update Calico from v3.22.1 to [v3.22.2](https://github.com/projectcalico/calico/releases/tag/v3.22.2)
* Mount /etc/machine-id from host into Kubelet ([#1143](https://github.com/poseidon/typhoon/pull/1143))
* Remove deprecated use of `key_algorithm` in `hashicorp/tls` resources
### Azure
* Allow upgrading Azure Terraform provider to v3.x ([#1144](https://github.com/poseidon/typhoon/pull/1144))
* Rename `worker_address_prefix` output to `worker_address_prefixes`
### Google Cloud
* Fix issue on Flatcar Linux with controller nodes not ignoring os image changes ([#1149](https://github.com/poseidon/typhoon/pull/1149))
* Nodes will auto-update, Terraform should not attempt to delete/recreate them
### Addons
* Update nginx-ingress from v1.1.2 to [v1.1.3](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.1.3)
* Update Prometheus from v2.33.5 to [v2.34.0](https://github.com/prometheus/prometheus/releases/tag/v2.34.0)
* Update Grafana from v8.4.4 to [v8.4.5](https://github.com/grafana/grafana/releases/tag/v8.4.5)
## v1.23.5
* Kubernetes [v1.23.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235)
* Update Cilium from v1.11.1 to [v1.11.2](https://github.com/cilium/cilium/releases/tag/v1.11.2)
* Update Calico from v3.21.2 to [v3.22.1](https://github.com/projectcalico/calico/releases/tag/v3.22.1)
* Fix [calico#5011](https://github.com/projectcalico/calico/issues/5011), broken since v1.23.0
### Addons
* Refresh Prometheus rules and Grafana dashboards ([#1136](https://github.com/poseidon/typhoon/pull/1136))
* Update nginx-ingress from v1.1.1 to [v1.1.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.1.2)
* Update Prometheus from v2.33.3 to [v2.33.5](https://github.com/prometheus/prometheus/releases/tag/v2.33.5)
* Update Grafana from v8.4.1 to [v8.4.3](https://github.com/grafana/grafana/releases/tag/v8.4.3)
* Update kube-state-metrics from v2.3.0 to [v2.4.2](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.4.2)
## v1.23.4
* Kubernetes [v1.23.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234)
* Update etcd from v3.5.1 to [v3.5.2](https://github.com/etcd-io/etcd/releases/tag/v3.5.2)
* Change default CNI `networking` provider from `calico` to `cilium` ([#1114](https://github.com/poseidon/typhoon/pull/1114))
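
Clusters that should stay on Calico after this default change can presumably pin the provider explicitly. A minimal sketch, in which the module name `example` and the AWS Fedora CoreOS module path are illustrative and other required variables are omitted:

```tf
module "example" {
  # Assumed module path for illustration; use the source for your platform and OS.
  source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.23.4"

  # Other required variables are omitted here.

  # Pin the CNI provider instead of relying on the new "cilium" default.
  networking = "calico"
}
```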
### AWS
* Allow upgrading AWS Terraform Provider to v4.x
### Addons
* Align nginx-ingress `--controller-class` with `IngressClass`
* Watch only `public` IngressClass objects, better [example](https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/)
* Update Prometheus from v2.32.1 to [v2.33.3](https://github.com/prometheus/prometheus/releases/tag/v2.33.3)
* Update Grafana from v8.3.6 to [v8.4.1](https://github.com/grafana/grafana/releases/tag/v8.4.1)
## v1.23.3
* Kubernetes [v1.23.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1233)
### Flatcar Linux
#### Google Cloud
* Switch to using official Kinvolk Flatcar Linux images
* Promote Typhoon on Flatcar Linux / Google Cloud to stable
* Change `os_image` to `flatcar-stable`, `flatcar-beta`, or `flatcar-alpha` (**action required**)
## v1.23.2
* Kubernetes [v1.23.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1232)
* Update Cilium from v1.11.0 to [v1.11.1](https://github.com/cilium/cilium/releases/tag/v1.11.1)
* Remove Kubelet flag `--network-plugin`. Unused since `docker-shim` isn't used ([#1106](https://github.com/poseidon/typhoon/pull/1106))
### Fedora CoreOS
* Switch Kubernetes Container Runtime from `docker` to `containerd` ([#1101](https://github.com/poseidon/typhoon/pull/1101))
* Mask `docker.service` to prevent it from being socket activated ([#1105](https://github.com/poseidon/typhoon/pull/1105))
### Flatcar Linux
#### AWS
* Add experimental Flatcar Linux ARM64 support ([docs](https://typhoon.psdn.io/advanced/arm64/), [#1102](https://github.com/poseidon/typhoon/pull/1102))
* Add `arch` variable to AWS `kubernetes` and `workers` modules
* Allow arm64 full-cluster or mixed/hybrid cluster with arm64 workers
* Requires `flannel` or `cilium` CNI provider
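
A full ARM64 cluster using the new `arch` variable might be declared along these lines. The module name `example` and the `t4g.small` instance types are illustrative assumptions (any ARM64 instance family should work), and other required variables are omitted.

```tf
module "example" {
  source = "git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes?ref=v1.23.2"

  # Other required variables are omitted here.

  arch       = "arm64"
  networking = "cilium" # "flannel" is the other supported choice on ARM64

  # Instance types must be ARM64-based; these are example values.
  controller_type = "t4g.small"
  worker_type     = "t4g.small"
}
```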
### DigitalOcean
* Upgrade DigitalOcean Terraform provider to [v2.x](https://registry.terraform.io/providers/digitalocean/digitalocean/latest/docs) ([#1109](https://github.com/poseidon/typhoon/pull/1109))
### Addons
* Update nginx-ingress from v1.1.0 to [v1.1.1](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.1.1)
* Update Grafana from v8.3.3 to [v8.3.4](https://github.com/grafana/grafana/releases/tag/v8.3.4)
## v1.23.1
* Kubernetes [v1.23.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1231)
* Workaround Terraform v1.1 regression in `file` provisioner ([#1093](https://github.com/poseidon/typhoon/pull/1093))
### Flatcar Linux
* Switch Kubernetes Container Runtime from `docker` to `containerd` ([#1087](https://github.com/poseidon/typhoon/pull/1087))
### Addons
* Configure Prometheus to allow a custom scrape query parameter ([#1095](https://github.com/poseidon/typhoon/pull/1095))
* Configure Prometheus to probe Kubernetes Ingress via `blackbox-exporter` ([#1096](https://github.com/poseidon/typhoon/pull/1096))
* Fix Prometheus Service probes to use `blackbox-exporter`, not `blackbox` ([#1096](https://github.com/poseidon/typhoon/pull/1096))
## v1.23.0
* Kubernetes [v1.23.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1230)
* Normalize CA cert mounts in static Pods and kube-proxy ([#1078](https://github.com/poseidon/typhoon/pull/1078))
* Set Kubelet resolver config to `/run/systemd/resolve/resolv.conf` ([#1082](https://github.com/poseidon/typhoon/pull/1082))
* Update Cilium from v1.10.5 to [v1.11.0](https://github.com/cilium/cilium/releases/tag/v1.11.0) ([#1083](https://github.com/poseidon/typhoon/pull/1083))
* With Calico, add missing `caliconodestatuses` CRD ([#289](https://github.com/poseidon/terraform-render-bootstrap/pull/289))
* Change `enable_aggregation` default to true ([#279](https://github.com/poseidon/terraform-render-bootstrap/pull/279))
* Remove deprecated `--port` from `kube-scheduler` ([#1078](https://github.com/poseidon/typhoon/pull/1078))
### AWS
* Change controller node default `disk_iops` to 3000 ([#1073](https://github.com/poseidon/typhoon/pull/1073))
### Azure
* Fix warning about deprecated `backend_address_pool_id` ([#1086](https://github.com/poseidon/typhoon/pull/1086))
### Fedora CoreOS
* Fix Fedora CoreOS ARM64 workers to use official Fedora CoreOS AMIs ([#1072](https://github.com/poseidon/typhoon/pull/1072))
* Should have been changed alongside controller AMIs in [#1038](https://github.com/poseidon/typhoon/pull/1038)
* Old Poseidon-built ARM64 AMIs have been deleted
### Addons
* Update nginx-ingress from v1.0.5 to [v1.1.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.1.0)
* Update Prometheus from v2.31.1 to [v2.32.0](https://github.com/prometheus/prometheus/releases/tag/v2.32.0)
* Update kube-state-metrics from v2.2.4 to [v2.3.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.3.0)
* Update node-exporter from v1.3.0 to [v1.3.1](https://github.com/prometheus/node_exporter/releases/tag/v1.3.1)
* Update Grafana from v8.2.4 to [v8.3.3](https://github.com/grafana/grafana/releases/tag/v8.3.3)
### Known Issues
* Calico does not yet support Kubernetes v1.23.0, use `flannel` or `cilium` ([calico#5011](https://github.com/projectcalico/calico/issues/5011))
## v1.22.4
* Kubernetes [v1.22.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1224)
* Update CoreDNS from v1.8.4 to [v1.8.6](https://github.com/poseidon/terraform-render-bootstrap/pull/284)
* Update Calico from v3.20.2 to [v3.21.0](https://github.com/projectcalico/calico/releases/tag/v3.21.0)
* Update flannel from v0.14.0 to [v0.15.1](https://github.com/flannel-io/flannel/releases/tag/v0.15.1)
### Google
* Allow use of Terraform provider `google` [v4.0+](https://github.com/hashicorp/terraform-provider-google/releases/tag/v4.0.0)
### Flatcar Linux
* Change Kubelet mounts for cgroups v2 ([#1064](https://github.com/poseidon/typhoon/pull/1064))
* Update cgroup driver from cgroupfs to systemd (Flatcar Linux changed default) ([#1064](https://github.com/poseidon/typhoon/pull/1064))
### Addons
* Update Prometheus from v2.30.3 to [v2.31.1](https://github.com/prometheus/prometheus/releases/tag/v2.31.1)
* Update node-exporter from v1.2.2 to [v1.3.0](https://github.com/prometheus/node_exporter/releases/tag/v1.3.0)
* Update kube-state-metrics from v2.2.3 to [v2.2.4](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.2.4)
* Update Grafana from v8.2.1 to [v8.2.4](https://github.com/grafana/grafana/releases/tag/v8.2.4)
* Update nginx-ingress from v1.0.4 to [v1.0.5](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.0.5)
## v1.22.3
* Kubernetes [v1.22.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1223)
* Update etcd from v3.5.0 to [v3.5.1](https://github.com/etcd-io/etcd/releases/tag/v3.5.1)
* Update Cilium from v1.10.4 to [v1.10.5](https://github.com/cilium/cilium/releases/tag/v1.10.5)
* Update Calico from v3.20.1 to [v3.20.2](https://github.com/projectcalico/calico/releases/tag/v3.20.2)
* Use Calico's iptables legacy vs nft auto-detection
* Update flannel from v0.13.0 to v0.14.0
### Bare-Metal
* Require Terraform provider `poseidon/matchbox` v0.5+ ([#1048](https://github.com/poseidon/typhoon/pull/1048))
### Addons
* Update nginx-ingress from v1.0.0 to [v1.0.4](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.0.4)
* Update Prometheus from v2.29.2 to [v2.30.3](https://github.com/prometheus/prometheus/releases/tag/v2.30.3)
* Update kube-state-metrics from v2.2.0 to [v2.2.3](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.2.3)
* Update Grafana from v8.1.2 to [v8.2.1](https://github.com/grafana/grafana/releases/tag/v8.2.1)
## v1.22.2
* Kubernetes [v1.22.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1222)
* Update Cilium from v1.10.3 to [v1.10.4](https://github.com/cilium/cilium/releases/tag/v1.10.4)
* Update Calico from v3.20.0 to [v3.20.1](https://github.com/projectcalico/calico/releases/tag/v3.20.1)
* Fix access to ClusterIP services with Cilium ([#276](https://github.com/poseidon/terraform-render-bootstrap/pull/276))
### Fedora CoreOS
* Use Fedora CoreOS ARM64 AMIs ([#1038](https://github.com/poseidon/typhoon/pull/1038))
### Addons
* Update Prometheus from v2.29.1 to [v2.29.2](https://github.com/prometheus/prometheus/releases/tag/v2.29.2)
* Update kube-state-metrics from v2.1.1 to [v2.2.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.2.0)
## v1.22.1
* Kubernetes [v1.22.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1221)
* Update Calico from v3.19.1 to [v3.20.0](https://github.com/projectcalico/calico/releases/tag/v3.20.0)
### Addons
* Update nginx-ingress from v1.0.0-beta.1 to [v1.0.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.0.0)
* Update Prometheus from v2.28.1 to [v2.29.1](https://github.com/prometheus/prometheus/releases/tag/v2.29.1)
* Update Grafana from v8.1.1 to [v8.1.2](https://github.com/grafana/grafana/releases/tag/v8.1.2)
## v1.22.0
* Kubernetes [v1.22.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1220)
* Update etcd from v3.4.16 to [v3.5.0](https://github.com/etcd-io/etcd/releases/tag/v3.5.0)
* Switch `kube-controller-manager` and `kube-scheduler` to use secure port only
* Update Prometheus config to discover endpoints and use a bearer token to scrape
### Fedora CoreOS
* Add Cilium cgroups v2 support on Fedora CoreOS
* Update Butane Config version from v1.2.0 to v1.4.0
* Rename Fedora CoreOS Config to Butane Config
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customizations to update to v1.4.0
### Addons
* Update nginx-ingress from v0.47.0 to [v1.0.0-beta.1](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.0.0-beta.1)
* Update node-exporter from v1.2.0 to [v1.2.2](https://github.com/prometheus/node_exporter/releases/tag/v1.2.2)
* Update kube-state-metrics from v2.1.0 to [v2.1.1](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.1.1)
* Update Grafana from v8.0.6 to [v8.1.1](https://github.com/grafana/grafana/releases/tag/v8.1.1)
## v1.21.3
* Kubernetes [v1.21.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1213)
* Update Cilium from v1.10.1 to [v1.10.3](https://github.com/cilium/cilium/releases/tag/v1.10.3)
* Require [poseidon/ct](https://github.com/poseidon/terraform-provider-ct) Terraform provider v0.9+ ([notes](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct))
### AWS
* Change default disk type from `gp2` to `gp3` ([#1012](https://github.com/poseidon/typhoon/pull/1012))
### Addons
* Update Prometheus from v2.28.0 to [v2.28.1](https://github.com/prometheus/prometheus/releases/tag/v2.28.1)
* Update node-exporter from v1.1.2 to [v1.2.0](https://github.com/prometheus/node_exporter/releases/tag/v1.2.0)
* Update Grafana from v8.0.3 to [v8.0.6](https://github.com/grafana/grafana/releases/tag/v8.0.6)
### Known Issues
* Cilium with recent Fedora CoreOS will have networking issues ([fedora-coreos#881](https://github.com/coreos/fedora-coreos-tracker/issues/881)) (fixed in v1.21.4)
## v1.21.2
* Kubernetes [v1.21.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1212)
* Add Terraform v1.0.x support ([#974](https://github.com/poseidon/typhoon/pull/974))
* Continue to support Terraform v0.13.x, v0.14.4+, and v0.15.x
* Update CoreDNS from v1.8.0 to v1.8.4 ([#1006](https://github.com/poseidon/typhoon/pull/1006))
* Update Cilium from v1.9.6 to [v1.10.1](https://github.com/cilium/cilium/releases/tag/v1.10.1)
* Update Calico from v3.19.0 to [v3.19.1](https://github.com/projectcalico/calico/releases/tag/v3.19.1)
### Addons
* Update kube-state-metrics from v2.0.0 to [v2.1.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.1.0)
* Update Prometheus from v2.27.0 to [v2.28.0](https://github.com/prometheus/prometheus/releases/tag/v2.28.0)
* Update Grafana from v7.5.6 to [v8.0.3](https://github.com/grafana/grafana/releases/tag/v8.0.3)
* Update nginx-ingress from v0.46.0 to [v0.47.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.47.0)
### Fedora CoreOS
#### AWS
* Extend experimental Fedora CoreOS arm64 support with Cilium
* CNI provider may now be `flannel` or `cilium` (new)
#### Bare-Metal
* Workaround systemd path unit issue [fedora-coreos-tracker/#861](https://github.com/coreos/fedora-coreos-tracker/issues/861)
#### DigitalOcean
* Workaround systemd path unit issue [fedora-coreos-tracker/#861](https://github.com/coreos/fedora-coreos-tracker/issues/861)
### Known Issues
* Cilium with recent Fedora CoreOS will have networking issues ([fedora-coreos#881](https://github.com/coreos/fedora-coreos-tracker/issues/881)) (fixed in v1.21.4)
## v1.21.1
* Kubernetes [v1.21.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1211)
* Add Terraform v0.15.x support ([#974](https://github.com/poseidon/typhoon/pull/974))
* Continue to support Terraform v0.13.x and v0.14.4+
* Update etcd from v3.4.15 to [v3.4.16](https://github.com/etcd-io/etcd/releases/tag/v3.4.16)
* Update Cilium from v1.9.5 to [v1.9.6](https://github.com/cilium/cilium/releases/tag/v1.9.6)
* Update Calico from v3.18.1 to [v3.19.0](https://github.com/projectcalico/calico/releases/tag/v3.19.0)
### AWS
* Reduce the default `disk_size` from 40GB to 30GB ([#983](https://github.com/poseidon/typhoon/pull/983))
### Azure
* Reduce the default `disk_size` from 40GB to 30GB ([#983](https://github.com/poseidon/typhoon/pull/983))
### Google Cloud
* Reduce the default `disk_size` from 40GB to 30GB ([#983](https://github.com/poseidon/typhoon/pull/983))
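
Clusters that should keep the previous size can set `disk_size` explicitly. A minimal sketch, assuming a Google Cloud Fedora CoreOS cluster module; the module name `yavin` and the `source` ref are illustrative, and other required inputs are omitted:

```hcl
module "yavin" {
  # Assumed module path and ref; substitute the platform/OS module in use.
  source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.21.1"

  # ... required inputs omitted ...

  # Pin the disk size (in GB) instead of inheriting the new 30GB default.
  disk_size = 40
}
```

The same variable is named in the AWS and Azure entries above.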
### Fedora CoreOS
* Update Kubelet mounts for cgroups v2 ([#978](https://github.com/poseidon/typhoon/pull/978))
### Addons
* Update kube-state-metrics from v2.0.0-rc.1 to [v2.0.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0)
* Update Prometheus from v2.25.2 to [v2.27.0](https://github.com/prometheus/prometheus/releases/tag/v2.27.0)
* Update Grafana from v7.5.3 to [v7.5.6](https://github.com/grafana/grafana/releases/tag/v7.5.6)
* Update nginx-ingress from v0.45.0 to [v0.46.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.46.0)
## v1.21.0
* Kubernetes [v1.21.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1210)
* Enable `tokencleaner` controller ([#969](https://github.com/poseidon/typhoon/pull/969))
* Enable `kube-scheduler` and `kube-controller-manager` separate authn/z kubeconfig
* Change CNI config location from /etc/kubernetes/cni/net.d to /etc/cni/net.d ([#965](https://github.com/poseidon/typhoon/pull/965))
* Change `kube-controller-manager` to mount `/var/lib/kubelet/volumeplugins` directly
* Remove unused `cloud-provider` flags
* Update Fedora CoreOS Config version from v1.1.0 to v1.2.0 ([#970](https://github.com/poseidon/typhoon/pull/970))
* Require [poseidon/ct](https://github.com/poseidon/terraform-provider-ct) Terraform provider v0.8+ ([notes](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct))
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customizations to update to v1.2.0
### AWS
* Allow setting custom initial node taints on worker pools ([#968](https://github.com/poseidon/typhoon/pull/968))
* Add `node_taints` variable to internal `workers` pool module to set initial node taints
* Add `daemonset_tolerations` so `kube-system` DaemonSets can tolerate custom taints
### Azure
* Allow setting custom initial node taints on worker pools ([#968](https://github.com/poseidon/typhoon/pull/968))
* Add `node_taints` variable to internal `workers` pool module to set initial node taints
* Add `daemonset_tolerations` so `kube-system` DaemonSets can tolerate custom taints
* Remove deprecated `azurerm_lb_backend_address_pool` field `resource_group_name` ([#972](https://github.com/poseidon/typhoon/pull/972))
### Google Cloud
* Allow setting custom initial node taints on worker pools ([#968](https://github.com/poseidon/typhoon/pull/968))
* Add `node_taints` variable to internal `workers` pool module to set initial node taints
* Add `daemonset_tolerations` so `kube-system` DaemonSets can tolerate custom taints
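
The worker pool taint options above might be combined as in the following sketch. It assumes a Google Cloud cluster with one extra worker pool. The module names, `source` refs, and the `role=gpu:NoSchedule` taint are illustrative, and the value formats (taints as `key=value:effect` strings, tolerations as a list of taint keys) are assumptions to verify against the modules' variable definitions:

```hcl
module "yavin" {
  # Assumed cluster module path and ref.
  source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.21.0"

  # ... required inputs omitted ...

  # Let kube-system DaemonSets tolerate the custom taint key (assumed format).
  daemonset_tolerations = ["role"]
}

module "yavin-gpu" {
  # Assumed internal workers module path and ref.
  source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes/workers?ref=v1.21.0"

  # ... required inputs omitted ...

  # Initial taint applied to nodes in this pool (assumed format).
  node_taints = ["role=gpu:NoSchedule"]
}
```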
### Addons
* Update nginx-ingress from v0.44.0 to [v0.45.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.45.0)
* Update kube-state-metrics from v2.0.0-rc.0 to [v2.0.0-rc.1](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-rc.1)
* Update Grafana from v7.4.5 to [v7.5.3](https://github.com/grafana/grafana/releases/tag/v7.5.3)
## v1.20.5
* Kubernetes [v1.20.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1205)
* Update etcd from v3.4.14 to [v3.4.15](https://github.com/etcd-io/etcd/releases/tag/v3.4.15)
* Update Cilium from v1.9.4 to [v1.9.5](https://github.com/cilium/cilium/releases/tag/v1.9.5)
* Update Calico from v3.17.3 to [v3.18.1](https://github.com/projectcalico/calico/releases/tag/v3.18.1)
* Update CoreDNS from v1.7.0 to [v1.8.0](https://coredns.io/2020/10/22/coredns-1.8.0-release/)
* Mark bootstrap token as sensitive in Terraform plans ([#949](https://github.com/poseidon/typhoon/pull/949))
### Fedora CoreOS
* Set Kubelet `provider-id` ([#951](https://github.com/poseidon/typhoon/pull/951))
### Flatcar Linux
#### AWS
* Set Kubelet `provider-id` ([#951](https://github.com/poseidon/typhoon/pull/951))
* Remove `os_image` option `flatcar-edge` ([#943](https://github.com/poseidon/typhoon/pull/943))
#### Azure
* Remove `os_image` option `flatcar-edge` ([#943](https://github.com/poseidon/typhoon/pull/943))
#### Bare-Metal
* Remove `os_channel` option `flatcar-edge` ([#943](https://github.com/poseidon/typhoon/pull/943))
### Addons
* Update Prometheus from v2.25.0 to [v2.25.2](https://github.com/prometheus/prometheus/releases/tag/v2.25.2)
* Update kube-state-metrics from v2.0.0-alpha.3 to [v2.0.0-rc.0](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-rc.0)
* Switch image from `quay.io` to `k8s.gcr.io` ([#946](https://github.com/poseidon/typhoon/pull/946))
* Update node-exporter from v1.1.1 to [v1.1.2](https://github.com/prometheus/node_exporter/releases/tag/v1.1.2)
* Update Grafana from v7.4.2 to [v7.4.5](https://github.com/grafana/grafana/releases/tag/v7.4.5)
## v1.20.4
* Kubernetes [v1.20.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1204)
* Update Cilium from v1.9.1 to [v1.9.4](https://github.com/cilium/cilium/releases/tag/v1.9.4)
* Update Calico from v3.17.1 to [v3.17.3](https://github.com/projectcalico/calico/releases/tag/v3.17.3)
* Update flannel-cni from v0.4.1 to [v0.4.2](https://github.com/poseidon/flannel-cni/releases/tag/v0.4.2)
### Addons
* Update nginx-ingress from v0.43.0 to [v0.44.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.44.0)
* Update Prometheus from v2.24.0 to [v2.25.0](https://github.com/prometheus/prometheus/releases/tag/v2.25.0)
* Update node-exporter from v1.0.1 to [v1.1.1](https://github.com/prometheus/node_exporter/releases/tag/v1.1.1)
* Update Grafana from v7.3.7 to [v7.4.2](https://github.com/grafana/grafana/releases/tag/v7.4.2)
## v1.20.2
* Kubernetes [v1.20.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1202)
* Support Terraform v0.13.x and v0.14.4+ ([#924](https://github.com/poseidon/typhoon/pull/923))
### Addons
* Update nginx-ingress from v0.41.2 to [v0.43.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.43.0)
* Update Prometheus from v2.23.0 to [v2.24.0](https://github.com/prometheus/prometheus/releases/tag/v2.24.0)
* Update Grafana from v7.3.6 to [v7.3.7](https://github.com/grafana/grafana/releases/tag/v7.3.7)
## v1.20.1
* Kubernetes [v1.20.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1201)
### Fedora CoreOS
* Fedora CoreOS 33 has stronger crypto defaults ([**notice**](https://docs.fedoraproject.org/en-US/fedora-coreos/faq/#_why_does_ssh_stop_working_after_upgrading_to_fedora_33), [#915](https://github.com/poseidon/typhoon/issues/915))
* Use a non-RSA SSH key or add the workaround provided in upstream [Fedora docs](https://docs.fedoraproject.org/en-US/fedora-coreos/faq/#_why_does_ssh_stop_working_after_upgrading_to_fedora_33) as a [snippet](https://typhoon.psdn.io/advanced/customization/#fedora-coreos) (**action required**)
### Addons
* Update Grafana from v7.3.5 to [v7.3.6](https://github.com/grafana/grafana/releases/tag/v7.3.6)
## v1.20.0
* Kubernetes [v1.20.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200)
* Add input variable validations ([#880](https://github.com/poseidon/typhoon/pull/880))
* Require Terraform v0.13+ ([migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-versions))
* Mark some outputs as sensitive to suppress console display ([#885](https://github.com/poseidon/typhoon/pull/885))
* Add service account token [volume projection](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection) ([#897](https://github.com/poseidon/typhoon/pull/897))
* Scope kube-scheduler and kube-controller-manager permissions ([#898](https://github.com/poseidon/typhoon/pull/898))
* Update etcd from v3.4.12 to [v3.4.14](https://github.com/etcd-io/etcd/releases/tag/v3.4.14)
* Update Calico from v3.16.5 to v3.17.1 ([#890](https://github.com/poseidon/typhoon/pull/890))
* Enable Calico MTU auto-detection
* Remove [workaround](https://github.com/poseidon/typhoon/pull/724) to Calico cni-plugin [issue](https://github.com/projectcalico/cni-plugin/issues/874)
* Update Cilium from v1.9.0 to [v1.9.1](https://github.com/cilium/cilium/releases/tag/v1.9.1)
* Relax `terraform-provider-ct` version constraint to v0.6+ ([#893](https://github.com/poseidon/typhoon/pull/893))
* Allow upgrading `terraform-provider-ct` to v0.7.x ([warn](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct))
### AWS
* Enable Network Load Balancer (NLB) dualstack ([#883](https://github.com/poseidon/typhoon/pull/883))
* NLB subnets assigned both IPv4 and IPv6 addresses
* NLB DNS name has both A and AAAA records
* NLB to target node traffic is IPv4 (no change)
### Bare-Metal
* Remove iSCSI `/etc/iscsi` and `iscsiadm` mounts from Kubelet ([#912](https://github.com/poseidon/typhoon/pull/912))
### Fedora CoreOS
#### AWS
* Fix AMI query which could fail in some regions ([#887](https://github.com/poseidon/typhoon/pull/887))
#### Bare-Metal
* Promote Fedora CoreOS to stable
* Use initramfs and rootfs images as initrds ([#889](https://github.com/poseidon/typhoon/pull/889))
* Requires a Fedora CoreOS version with rootfs images (e.g. 32.20200923.3.0+)
### Addons
* Update Prometheus from v2.22.2 to [v2.23.0](https://github.com/prometheus/prometheus/releases/tag/v2.23.0)
* Update kube-state-metrics from v2.0.0-alpha.2 to [v2.0.0-alpha.3](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.3)
* Update Grafana from v7.3.2 to [v7.3.5](https://github.com/grafana/grafana/releases/tag/v7.3.5)
## v1.19.4
* Kubernetes [v1.19.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194)
* Update Cilium from v1.8.4 to [v1.9.0](https://github.com/cilium/cilium/releases/tag/v1.9.0)
* Update Calico from v3.16.3 to [v3.16.5](https://github.com/projectcalico/calico/releases/tag/v3.16.5)
* Remove `asset_dir` variable (defaulted off in [v1.17.0](https://github.com/poseidon/typhoon/pull/595), deprecated in [v1.18.0](https://github.com/poseidon/typhoon/pull/678))
### Fedora CoreOS
* Improve `etcd-member.service` systemd unit ([#868](https://github.com/poseidon/typhoon/pull/868))
* Allow a snippet with a systemd dropin to set an alternate image (e.g. mirror)
* Fix local node delete oneshot on node shutdown ([#856](https://github.com/poseidon/typhoon/pull/855))
#### AWS
* Add experimental Fedora CoreOS arm64 support ([docs](https://typhoon.psdn.io/advanced/arm64/), [#875](https://github.com/poseidon/typhoon/pull/875))
* Allow arm64 full-cluster or mixed/hybrid cluster with worker pools
* Add `arch` variable to cluster module
* Add `daemonset_tolerations` variable to cluster module
* Add `node_taints` variable to workers module
* Requires flannel CNI provider and use of experimental AMI (see docs)
### Flatcar Linux
* Rename `container-linux` modules to `flatcar-linux` ([#858](https://github.com/poseidon/typhoon/issues/858)) (**action required**)
* Change on-host system containers from rkt to docker
* Change `etcd-member.service` container runner from rkt to docker ([#867](https://github.com/poseidon/typhoon/pull/867))
* Change `kubelet.service` container runner from rkt-fly to docker ([#855](https://github.com/poseidon/typhoon/pull/855))
* Change `bootstrap.service` container runner from rkt to docker ([#873](https://github.com/poseidon/typhoon/pull/873))
* Change `delete-node.service` to use docker and an inline ExecStart ([#855](https://github.com/poseidon/typhoon/pull/855))
* Fix local node delete oneshot on node shutdown ([#855](https://github.com/poseidon/typhoon/pull/855))
* Remove CoreOS Container Linux Matchbox profiles ([#859](https://github.com/poseidon/typhoon/pull/858))
### Addons
* Update nginx-ingress from v0.40.2 to [v0.41.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2)
* Update Prometheus from v2.22.0 to [v2.22.1](https://github.com/prometheus/prometheus/releases/tag/v2.22.1)
* Update kube-state-metrics from v2.0.0-alpha.1 to [v2.0.0-alpha.2](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2)
* Update Grafana from v7.2.1 to [v7.3.2](https://github.com/grafana/grafana/releases/tag/v7.3.2)
## v1.19.3
* Kubernetes [v1.19.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193)
* Update Cilium from v1.8.3 to [v1.8.4](https://github.com/cilium/cilium/releases/tag/v1.8.4)
* Update Calico from v3.15.3 to [v3.16.3](https://github.com/projectcalico/calico/releases/tag/v3.16.3) ([#851](https://github.com/poseidon/typhoon/pull/851))
* Update flannel from v0.13.0-rc2 to v0.13.0 ([#219](https://github.com/poseidon/terraform-render-bootstrap/pull/219))
### Flatcar Linux
* Remove references to CoreOS Container Linux ([#839](https://github.com/poseidon/typhoon/pull/839))
* Fix error querying for coreos AMI on AWS ([#838](https://github.com/poseidon/typhoon/issues/838))
### Addons
* Update nginx-ingress from v0.35.0 to [v0.40.2](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2)
* Update Grafana from v7.1.5 to [v7.2.1](https://github.com/grafana/grafana/releases/tag/v7.2.1)
* Update Prometheus from v2.21.0 to [v2.22.0](https://github.com/prometheus/prometheus/releases/tag/v2.22.0)
* Update kube-state-metrics from v1.9.7 to [v2.0.0-alpha.1](https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1)
## v1.19.2
* Kubernetes [v1.19.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1192)
* Update flannel from v0.12.0 to v0.13.0-rc2 ([#216](https://github.com/poseidon/terraform-render-bootstrap/pull/216))
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
### Addons
* Refresh Prometheus rules/alerts and Grafana dashboards ([#831](https://github.com/poseidon/typhoon/pull/831))
* Reduce apiserver metrics cardinality for non-core APIs ([#830](https://github.com/poseidon/typhoon/pull/830))
## v1.19.1
* Kubernetes [v1.19.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191)
* Change control plane seccomp annotations to GA `seccompProfile` ([#822](https://github.com/poseidon/typhoon/pull/822))
* Update Cilium from v1.8.2 to [v1.8.3](https://github.com/cilium/cilium/releases/tag/v1.8.3)
* Promote Cilium from experimental to general availability ([#827](https://github.com/poseidon/typhoon/pull/827))
* Update Calico from v3.15.2 to [v3.15.3](https://github.com/projectcalico/calico/releases/tag/v3.15.3)
### Fedora CoreOS
* Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customizations to update to v1.1.0
### Addons
* Update IngressClass resources to `networking.k8s.io/v1` ([#824](https://github.com/poseidon/typhoon/pull/824))
* Update Prometheus from v2.20.0 to [v2.21.0](https://github.com/prometheus/prometheus/releases/tag/v2.21.0)
* Remove Kubernetes node name labelmap `relabel_config` from etcd, Kubelet, and CAdvisor scrape config ([#828](https://github.com/poseidon/typhoon/pull/828))
## v1.19.0
* Kubernetes [v1.19.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1190)
* Update etcd from v3.4.10 to [v3.4.12](https://github.com/etcd-io/etcd/releases/tag/v3.4.12)
* Update Calico from v3.15.1 to [v3.15.2](https://docs.projectcalico.org/v3.15/release-notes/)
### Fedora CoreOS
* Fix race condition during bootstrap of multi-controller clusters ([#808](https://github.com/poseidon/typhoon/pull/808))
* Fix SELinux label of bootstrap-secrets on non-bootstrap controllers
### Addons
* Introduce [fleetlock](https://github.com/poseidon/fleetlock) for Fedora CoreOS reboot coordination ([#814](https://github.com/poseidon/typhoon/pull/814))
* Update nginx-ingress from v0.34.1 to [v0.35.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.35.0)
* Repository changed to `k8s.gcr.io/ingress-nginx/controller`
* Update Grafana from v7.1.3 to [v7.1.5](https://github.com/grafana/grafana/releases/tag/v7.1.5)
## v1.18.8
* Kubernetes [v1.18.8](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188)
* Migrate from Terraform v0.12.x to v0.13.x ([#804](https://github.com/poseidon/typhoon/pull/804)) (**action required**)
* Recommend Terraform v0.13.x ([migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-versions))
* Support automatic install of poseidon's provider plugins ([poseidon/ct](https://registry.terraform.io/providers/poseidon/ct/latest), [poseidon/matchbox](https://registry.terraform.io/providers/poseidon/matchbox/latest))
* Require Terraform v0.12.26+ (migration compatibility)
* Require `terraform-provider-ct` v0.6.1
* Require `terraform-provider-matchbox` v0.4.1
* Update etcd from v3.4.9 to [v3.4.10](https://github.com/etcd-io/etcd/releases/tag/v3.4.10)
* Update CoreDNS from v1.6.7 to [v1.7.0](https://coredns.io/2020/06/15/coredns-1.7.0-release/)
* Update Cilium from v1.8.1 to [v1.8.2](https://github.com/cilium/cilium/releases/tag/v1.8.2)
* Update [coreos/flannel-cni](https://github.com/coreos/flannel-cni) to [poseidon/flannel-cni](https://github.com/poseidon/flannel-cni) ([#798](https://github.com/poseidon/typhoon/pull/798))
* Update CNI plugins and fix CVEs with Flannel CNI (non-default)
* Transition to a poseidon maintained container image
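
With Terraform v0.13.x, the provider requirements listed above can be declared so that the plugins install automatically from the registry. A minimal sketch, assuming exact version pins are wanted; the pins and the `required_version` constraint are illustrative choices rather than the project's recommended configuration:

```hcl
terraform {
  # Assumed constraint for a configuration already migrated to v0.13.x.
  required_version = ">= 0.13.0"

  required_providers {
    # Registry source addresses as named in the notes above.
    ct = {
      source  = "poseidon/ct"
      version = "0.6.1"
    }
    matchbox = {
      source  = "poseidon/matchbox"
      version = "0.4.1"
    }
  }
}
```

The `matchbox` entry only matters for bare-metal clusters.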
### AWS
* Allow `terraform-provider-aws` v3.0+ ([#803](https://github.com/poseidon/typhoon/pull/803))
* Recommend updating `terraform-provider-aws` to v3.0+
* Continue to allow v2.23+, no v3.x specific features are used
### DigitalOcean
* Require `terraform-provider-digitalocean` v1.21+ for Terraform v0.13.x (unenforced)
* Require `terraform-provider-digitalocean` v1.20+ for Terraform v0.12.x
### Fedora CoreOS
* Fix support for Flannel with Fedora CoreOS ([#795](https://github.com/poseidon/typhoon/pull/795))
* Configure `flannel.1` link to select its own MAC address to solve flannel pod-to-pod traffic drops starting with default link changes in Fedora CoreOS 32.20200629.3.0 ([details](https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296))
### Addons
* Update Prometheus from v2.19.2 to [v2.20.0](https://github.com/prometheus/prometheus/releases/tag/v2.20.0)
* Update Grafana from v7.0.6 to [v7.1.3](https://github.com/grafana/grafana/releases/tag/v7.1.3)
## v1.18.6
* Kubernetes [v1.18.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186)
* Update Calico from v3.15.0 to [v3.15.1](https://docs.projectcalico.org/v3.15/release-notes/)
* Update Cilium from v1.8.0 to [v1.8.1](https://github.com/cilium/cilium/releases/tag/v1.8.1)
### Addons
* Update nginx-ingress from v0.33.0 to [v0.34.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.34.1)
* [ingress-nginx](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0) will publish images only to gcr.io
* Update Prometheus from v2.19.1 to [v2.19.2](https://github.com/prometheus/prometheus/releases/tag/v2.19.2)
* Update Grafana from v7.0.4 to [v7.0.6](https://github.com/grafana/grafana/releases/tag/v7.0.6)
## v1.18.5
* Kubernetes [v1.18.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185)
* Add Cilium v1.8.0 as an (experimental) CNI provider option ([#760](https://github.com/poseidon/typhoon/pull/760))
* Set `networking` to "cilium" to enable
* Update Calico from v3.14.1 to [v3.15.0](https://docs.projectcalico.org/v3.15/release-notes/)
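
Opting in to the experimental Cilium provider is a single input on the cluster module. A minimal sketch, assuming an AWS Fedora CoreOS cluster; the module name `tempest` and the `source` ref are illustrative, and other required inputs are omitted:

```hcl
module "tempest" {
  # Assumed module path and ref; substitute the platform/OS module in use.
  source = "git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.18.5"

  # ... required inputs omitted ...

  # Select Cilium as the CNI provider (experimental in this release).
  networking = "cilium"
}
```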
### DigitalOcean
* Isolate each cluster in an independent DigitalOcean VPC ([#776](https://github.com/poseidon/typhoon/pull/776))
* Create droplets in a VPC per cluster (matches Typhoon AWS, Azure, and GCP)
* Require `terraform-provider-digitalocean` v1.16.0+ (action required)
* Output `vpc_id` for use with an attached DigitalOcean [loadbalancer](https://github.com/poseidon/typhoon/blob/v1.18.5/docs/architecture/digitalocean.md#custom-load-balancer)
### Fedora CoreOS
#### Google Cloud
* Promote Fedora CoreOS to stable
* Remove `os_image` variable deprecated in v1.18.3 ([#777](https://github.com/poseidon/typhoon/pull/777))
* Use `os_stream` to select a Fedora CoreOS image stream
### Flatcar Linux
#### Azure
* Allow using Flatcar Linux Edge by setting `os_image` to "flatcar-edge" ([#778](https://github.com/poseidon/typhoon/pull/778))
### Addons
* Update Prometheus from v2.19.0 to [v2.19.1](https://github.com/prometheus/prometheus/releases/tag/v2.19.1)
* Update Grafana from v7.0.3 to [v7.0.4](https://github.com/grafana/grafana/releases/tag/v7.0.4)
## v1.18.4
* Kubernetes [v1.18.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184)
* Update Kubelet image publishing ([#749](https://github.com/poseidon/typhoon/pull/749))
* Build Kubelet images internally and publish to Quay and Dockerhub
* [quay.io/poseidon/kubelet](https://quay.io/repository/poseidon/kubelet) (official)
* [docker.io/psdn/kubelet](https://hub.docker.com/r/psdn/kubelet) (fallback)
* Continue offering automated image builds with an alternate tag strategy (see [docs](https://typhoon.psdn.io/topics/security/#container-images))
* [Document](https://typhoon.psdn.io/advanced/customization/#kubelet) use of alternate Kubelet images during registry incidents
* Update Calico from v3.14.0 to [v3.14.1](https://docs.projectcalico.org/v3.14/release-notes/)
* Fix [CVE-2020-13597](https://github.com/kubernetes/kubernetes/issues/91507)
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to `node-role.kubernetes.io/controller` ([#764](https://github.com/poseidon/typhoon/pull/764))
* Tolerate the new taint name for workloads that may run on controller nodes
* Remove node label `node.kubernetes.io/master` from controller nodes ([#764](https://github.com/poseidon/typhoon/pull/764))
* Use `node.kubernetes.io/controller` (present since v1.9.5, [#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Remove unused Kubelet `--lock-file` and `--exit-on-lock-contention` flags ([#758](https://github.com/poseidon/typhoon/pull/758))
### Fedora CoreOS
#### Azure
* Use `strict` Fedora CoreOS Config (FCC) snippet parsing ([#755](https://github.com/poseidon/typhoon/pull/755))
* Reduce Calico vxlan interface MTU to maintain performance ([#767](https://github.com/poseidon/typhoon/pull/766))
#### AWS
* Fix Kubelet service race with hostname update ([#766](https://github.com/poseidon/typhoon/pull/766))
* Wait for a hostname to avoid Kubelet trying to register as `localhost`
### Flatcar Linux
* Use `strict` Container Linux Config (CLC) snippet parsing ([#755](https://github.com/poseidon/typhoon/pull/755))
* Require `terraform-provider-ct` v0.4+, recommend v0.5+ (**action required**)
### Addons
* Update nginx-ingress from v0.32.0 to [v0.33.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.33.0)
* Update Prometheus from v2.18.1 to [v2.19.0](https://github.com/prometheus/prometheus/releases/tag/v2.19.0)
* Update node-exporter from v1.0.0-rc.1 to [v1.0.1](https://github.com/prometheus/node_exporter/releases/tag/v1.0.1)
* Update kube-state-metrics from v1.9.6 to v1.9.7
* Update Grafana from v7.0.0 to v7.0.3
## v1.18.3
* Kubernetes [v1.18.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1183)
* Use Kubelet [TLS bootstrap](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/) with bootstrap token authentication ([#713](https://github.com/poseidon/typhoon/pull/713))
* Enable Node [Authorization](https://kubernetes.io/docs/reference/access-authn-authz/node/) and [NodeRestriction](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction) to reduce authorization scope
* Renew Kubelet certificates every 72 hours
* Update etcd from v3.4.7 to [v3.4.9](https://github.com/etcd-io/etcd/releases/tag/v3.4.9)
* Update Calico from v3.13.1 to [v3.14.0](https://docs.projectcalico.org/v3.14/release-notes/)
* Add CoreDNS node affinity preference for controller nodes ([#188](https://github.com/poseidon/terraform-render-bootstrap/pull/188))
* Deprecate CoreOS Container Linux support (no OS [updates](https://coreos.com/os/eol/) after May 2020)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module for Flatcar Linux
### AWS
* Fix Terraform plan error when `controller_count` exceeds AWS zones (e.g. 5 controllers) ([#714](https://github.com/poseidon/typhoon/pull/714))
* Regressed in v1.17.1 ([#605](https://github.com/poseidon/typhoon/pull/605))
### Azure
* Update Azure subnets to set `address_prefixes` list ([#730](https://github.com/poseidon/typhoon/pull/730))
* Fix warning that `address_prefix` is deprecated
* Require `terraform-provider-azurerm` v2.8.0+ (action required)
### DigitalOcean
* Promote DigitalOcean to beta on both Fedora CoreOS and Flatcar Linux
### Fedora CoreOS
* Fix Calico `install-cni` crashloop on Pod restarts ([#724](https://github.com/poseidon/typhoon/pull/724))
* SELinux enforcement requires consistent file context MCS level
* Previously, restarting the node was the workaround
#### AWS
* Support Fedora CoreOS [image streams](https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/) ([#727](https://github.com/poseidon/typhoon/pull/727))
* Add `os_stream` variable to set the stream to `stable` (default), `testing`, or `next`
* Remove unused `os_image` variable
#### Google
* Support Fedora CoreOS [image streams](https://docs.fedoraproject.org/en-US/fedora-coreos/update-streams/) ([#723](https://github.com/poseidon/typhoon/pull/723))
* Add `os_stream` variable to set the stream to `stable` (default), `testing`, or `next`
* Deprecate `os_image` variable. Manual image uploads are no longer needed
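
Selecting a stream replaces the manual image upload. A minimal sketch, assuming a Google Cloud Fedora CoreOS cluster; the module name `yavin` and the `source` ref are illustrative, and other required inputs are omitted:

```hcl
module "yavin" {
  # Assumed module path and ref.
  source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.18.3"

  # ... required inputs omitted ...

  # One of "stable" (default), "testing", or "next".
  os_stream = "testing"
}
```

Leaving `os_stream` unset keeps the `stable` default.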
### Flatcar Linux
#### Azure
* Use the Flatcar Linux Azure Marketplace image
* Restore [#664](https://github.com/poseidon/typhoon/pull/664) (reverted in [#707](https://github.com/poseidon/typhoon/pull/707)) but use the new free Flatcar Linux offer (not BYOL)
* Change `os_image` to use a `flatcar-stable` default
#### Google
* Promote Flatcar Linux to beta
### Addons
* Update nginx-ingress from v0.30.0 to [v0.32.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.32.0)
* Add support for [IngressClass](https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class)
* Update Prometheus from v2.17.1 to v2.18.1
* Update kube-state-metrics from v1.9.5 to [v1.9.6](https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.6)
* Update node-exporter from v1.0.0-rc.0 to [v1.0.0-rc.1](https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.1)
* Update Grafana from v6.7.2 to [v7.0.0](https://grafana.com/docs/grafana/latest/guides/whats-new-in-v7-0/)
## v1.18.2
* Kubernetes [v1.18.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1182)
* Choose Fedora CoreOS or Flatcar Linux (**action required**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module for Flatcar Linux
* Change Container Linux modules' defaults from CoreOS Container Linux to [Flatcar Container Linux](https://typhoon.psdn.io/architecture/operating-systems/) ([#702](https://github.com/poseidon/typhoon/pull/702))
* CoreOS Container Linux [won't receive updates](https://coreos.com/os/eol/) after May 2020
### Fedora CoreOS
* Fix bootstrap race condition from SELinux unshared content label ([#708](https://github.com/poseidon/typhoon/pull/708))
#### Azure
* Add support for Fedora CoreOS ([#704](https://github.com/poseidon/typhoon/pull/704))
#### DigitalOcean
* Fix race condition creating firewall allow rules ([#709](https://github.com/poseidon/typhoon/pull/709))
### Flatcar Linux
#### AWS
* Change `os_image` default from `coreos-stable` to `flatcar-stable` ([#702](https://github.com/poseidon/typhoon/pull/702))
#### Azure
* Change `os_image` to be required. Recommend uploading a Flatcar Linux image (**action required**) ([#702](https://github.com/poseidon/typhoon/pull/702))
* Disable Flatcar Linux Azure Marketplace image [support](https://github.com/poseidon/typhoon/pull/664) (**breaking**, [#707](https://github.com/poseidon/typhoon/pull/707))
* Revert to manual uploading until marketplace issue is closed ([#703](https://github.com/poseidon/typhoon/issues/703))
#### Bare-Metal
* Recommend changing [os_channel](https://typhoon.psdn.io/cl/bare-metal/#required) from `coreos-stable` to `flatcar-stable`
#### Google
* Change `os_image` to be required. Recommend uploading a Flatcar Linux image (**action required**) ([#702](https://github.com/poseidon/typhoon/pull/702))
#### DigitalOcean
* Change `os_image` to be required. Recommend uploading a Flatcar Linux image (**action required**) ([#702](https://github.com/poseidon/typhoon/pull/702))
* Fix race condition creating firewall allow rules ([#709](https://github.com/poseidon/typhoon/pull/709))
## v1.18.1
* Kubernetes [v1.18.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181)
* Choose Fedora CoreOS or Flatcar Linux (**action recommended**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module with OS set to Flatcar Linux
* Update etcd from v3.4.5 to [v3.4.7](https://github.com/etcd-io/etcd/releases/tag/v3.4.7)
* Change `kube-proxy` and `calico` or `flannel` to tolerate specific taints ([#682](https://github.com/poseidon/typhoon/pull/682))
* Tolerate master and not-ready taints, rather than tolerating all taints
* Update flannel from v0.11.0 to v0.12.0 ([#690](https://github.com/poseidon/typhoon/pull/690))
* Fix bootstrap when `networking` mode `flannel` (non-default) is chosen ([#689](https://github.com/poseidon/typhoon/pull/689))
* Regressed in v1.18.0 changes for Calico ([#675](https://github.com/poseidon/typhoon/pull/675))
* Rename Container Linux `controller_clc_snippets` to `controller_snippets` for consistency ([#688](https://github.com/poseidon/typhoon/pull/688))
* Rename Container Linux `worker_clc_snippets` to `worker_snippets` for consistency
* Rename Container Linux `clc_snippets` (bare-metal) to `snippets` for consistency
* Drop support for [gitRepo](https://kubernetes.io/docs/concepts/storage/volumes/#gitrepo) volumes ([kubelet#3](https://github.com/poseidon/kubelet/pull/3))
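The snippet variable renames above only change the variable names. A hedged before/after sketch, assuming a module named `tempest` and snippet files under `./snippets/` (both hypothetical), with the remaining required variables omitted:

```tf
module "tempest" {
  # source and other required variables omitted

  # Before v1.18.1:
  # controller_clc_snippets = [file("./snippets/controller.yaml")]
  # worker_clc_snippets     = [file("./snippets/worker.yaml")]

  # From v1.18.1 (same values, new names):
  controller_snippets = [file("./snippets/controller.yaml")]
  worker_snippets     = [file("./snippets/worker.yaml")]
}
```

On bare-metal, the corresponding rename is `clc_snippets` to `snippets`.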
#### Azure
* Fix Azure worker UDP outbound connections ([#691](https://github.com/poseidon/typhoon/pull/691))
* Fix Azure worker clock sync timeouts
#### DigitalOcean
* Add support for Fedora CoreOS ([#699](https://github.com/poseidon/typhoon/pull/699))
#### Addons
* Refresh Prometheus rules/alerts and Grafana dashboards ([#692](https://github.com/poseidon/typhoon/pull/692))
* Update Grafana from v6.7.1 to v6.7.2
## v1.18.0
* Kubernetes [v1.18.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1180)
* Update etcd from v3.4.4 to [v3.4.5](https://github.com/etcd-io/etcd/releases/tag/v3.4.5)
* Switch from upstream hyperkube image to individual images ([#669](https://github.com/poseidon/typhoon/pull/669))
* Use upstream k8s.gcr.io `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`, and `kube-proxy` container images
* Use [poseidon/kubelet](https://github.com/poseidon/kubelet) to package the upstream Kubelet binary and dependencies as a container image (checksummed, automated build)
* Add [quay.io/poseidon/kubelet](https://quay.io/repository/poseidon/kubelet) as a Typhoon distributed artifact in the security policy
* Update base images from debian 9 to debian 10
* Background: Kubernetes will [stop releasing](https://github.com/kubernetes/kubernetes/pull/88676) the hyperkube container image and provide the Kubelet as a binary for packaging
* Choose Fedora CoreOS or Flatcar Linux (**action recommended**)
* Use a `fedora-coreos` module for Fedora CoreOS
* Use a `container-linux` module with OS set for Flatcar Linux (varies, see docs)
* CoreOS Container Linux [won't receive updates](https://coreos.com/os/eol/) after May 2020
* Add support for Fedora CoreOS snippets (`terraform-provider-ct` v0.5+) ([#686](https://github.com/poseidon/typhoon/pull/686))
* Recommend updating `terraform-provider-ct` plugin from v0.4.0 to [v0.5.0](https://github.com/poseidon/terraform-provider-ct/releases/tag/v0.5.0)
* Set Fedora CoreOS log driver back to the default `journald` ([#681](https://github.com/poseidon/typhoon/pull/681))
* Deprecate `asset_dir` variable and remove docs ([#678](https://github.com/poseidon/typhoon/pull/678))
* Deprecate support for [gitRepo](https://kubernetes.io/docs/concepts/storage/volumes/#gitrepo) volumes. A future release will drop support.
#### AWS
* Fix Fedora CoreOS AMI to filter for stable images ([#685](https://github.com/poseidon/typhoon/pull/685))
* Latest Fedora CoreOS `testing` or `bodhi-update` images could be chosen depending on the region
#### Bare-Metal
* Update Fedora CoreOS default `os_stream` from testing to stable
#### Google Cloud
* Known: Use of stale Fedora CoreOS image may require terraform re-apply during bootstrap ([#687](https://github.com/poseidon/typhoon/pull/687))
#### DigitalOcean
* Rename `image` variable to `os_image` for consistency ([#677](https://github.com/poseidon/typhoon/pull/677)) (action required)
#### Addons
* Update Prometheus from v2.16.0 to [v2.17.1](https://github.com/prometheus/prometheus/releases/tag/v2.17.1)
* Update Grafana from v6.6.2 to [v6.7.1](https://github.com/grafana/grafana/releases/tag/v6.7.1)
## v1.17.4
* Kubernetes [v1.17.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174)
* Update etcd from v3.4.3 to [v3.4.4](https://github.com/etcd-io/etcd/releases/tag/v3.4.4)
* On Container Linux, fetch using the docker transport format ([#659](https://github.com/poseidon/typhoon/pull/659))
* Update CoreDNS from v1.6.6 to v1.6.7 ([#648](https://github.com/poseidon/typhoon/pull/648))
* Update Calico from v3.12.0 to [v3.13.1](https://docs.projectcalico.org/v3.13/release-notes/)
#### AWS
* Promote Fedora CoreOS to stable ([#668](https://github.com/poseidon/typhoon/pull/668))
* Allow VPC route table extension via reference ([#654](https://github.com/poseidon/typhoon/pull/654))
* Fix `worker_node_labels` on Fedora CoreOS ([#651](https://github.com/poseidon/typhoon/pull/651))
* Fix automatic worker node delete on shutdown on Fedora CoreOS ([#657](https://github.com/poseidon/typhoon/pull/657))
#### Azure
* Upgrade to `terraform-provider-azurerm` [v2.0+](https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html) (action required)
* Change `worker_priority` from `Low` to `Spot` if used (action required)
* Switch to Azure's new Linux VM and Linux VM Scale Set resources
* Set controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups (aesthetic)
* Add support for Flatcar Container Linux ([#664](https://github.com/poseidon/typhoon/pull/664))
* Requires accepting Flatcar Linux Azure Marketplace terms
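For clusters that set `worker_priority`, the required change is a one-line edit. A sketch, assuming an Azure cluster module named `ramius` (a hypothetical name) with other variables omitted:

```tf
module "ramius" {
  # source and other required variables omitted

  # Previously: worker_priority = "Low"
  worker_priority = "Spot"
}
```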
#### Bare-Metal
* Add `worker_node_labels` map variable for per-worker node labels ([#663](https://github.com/poseidon/typhoon/pull/663))
* Add `worker_node_taints` map variable for per-worker node taints ([#663](https://github.com/poseidon/typhoon/pull/663))
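As an illustration of the two map variables, the sketch below assumes each map is keyed by worker machine name and holds a list of label or taint strings. The machine name `node2` and the `role=special` values are placeholders, and the exact value shape should be checked against the module's variable definitions:

```tf
module "mercury" {
  # source and other required variables omitted

  # Labels applied only to the worker named "node2" (assumed key = machine name)
  worker_node_labels = {
    "node2" = ["role=special"]
  }

  # Taints applied only to the worker named "node2"
  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"]
  }
}
```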
#### DigitalOcean
* Add support for Flatcar Container Linux ([#644](https://github.com/poseidon/typhoon/pull/644))
#### Google Cloud
* Promote Fedora CoreOS to beta ([#668](https://github.com/poseidon/typhoon/pull/668))
* Fix `worker_node_labels` on Fedora CoreOS ([#651](https://github.com/poseidon/typhoon/pull/651))
* Fix automatic worker node delete on shutdown on Fedora CoreOS ([#657](https://github.com/poseidon/typhoon/pull/657))
#### Addons
* Update nginx-ingress from v0.28.0 to [v0.30.0](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0)
* Update Prometheus from v2.15.2 to [v2.16.0](https://github.com/prometheus/prometheus/releases/tag/v2.16.0)
* Refresh Prometheus rules and alerts
* Add a BlackboxProbeFailure alert
* Update kube-state-metrics from v1.9.4 to v1.9.5
* Update node-exporter from v0.18.1 to [v1.0.0-rc.0](https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0)
* Update Grafana from v6.6.1 to v6.6.2
* Refresh Grafana dashboards
* Remove Container Linux Update Operator (CLUO) addon example ([#667](https://github.com/poseidon/typhoon/pull/667))
* CLUO hasn't been in active use in our clusters and won't be relevant beyond Container Linux. Requires patches for use on Kubernetes v1.16+
## v1.17.3
* Kubernetes [v1.17.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173)
* Update Calico from v3.11.2 to v3.12.0
* Allow Fedora CoreOS clusters to pass CNCF conformance suite
* Set Docker log driver to `json-file` as a workaround
* Try Fedora CoreOS or Flatcar Linux alongside CoreOS [Container Linux](https://coreos.com/os/eol/) clusters (recommended)
#### AWS
* Promote Fedora CoreOS to beta ([#645](https://github.com/poseidon/typhoon/pull/645))
#### Bare-Metal
* Promote Fedora CoreOS to beta ([#645](https://github.com/poseidon/typhoon/pull/645))
* Add Fedora CoreOS kernel arguments initrd and console ([#640](https://github.com/poseidon/typhoon/pull/640))
#### Google Cloud
* Add Terraform module for Fedora CoreOS ([#632](https://github.com/poseidon/typhoon/pull/632))
* Add support for Flatcar Container Linux ([#639](https://github.com/poseidon/typhoon/pull/639))
#### Addons
* Update nginx-ingress from v0.27.1 to v0.28.0
* Update kube-state-metrics from v1.9.3 to v1.9.4
* Update Grafana from v6.5.3 to v6.6.1
## v1.17.2
* Kubernetes [v1.17.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172)
#### AWS
* Promote Fedora CoreOS from preview to alpha
#### Bare-Metal
* Promote Fedora CoreOS from preview to alpha
* Update Fedora CoreOS images location
* Use Fedora CoreOS production [download](https://getfedora.org/coreos/download/) streams
* Use live PXE kernel and initramfs images
#### Addons
* Update nginx-ingress from v0.26.1 to [v0.27.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1) ([#625](https://github.com/poseidon/typhoon/pull/625))
* Change runAsUser from 33 to 101 for alpine-based image
* Update kube-state-metrics from v1.9.2 to v1.9.3
## v1.17.1
* Kubernetes [v1.17.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171)
* Update CoreDNS from v1.6.5 to [v1.6.6](https://coredns.io/2019/12/11/coredns-1.6.6-release/) ([#602](https://github.com/poseidon/typhoon/pull/602))
* Update Calico from v3.10.2 to v3.11.2 ([#604](https://github.com/poseidon/typhoon/pull/604))
* Inline Kubelet service on Container Linux nodes ([#606](https://github.com/poseidon/typhoon/pull/606))
* Disable unused Kubelet `127.0.0.1:10248` healthz listener ([#607](https://github.com/poseidon/typhoon/pull/607))
* Enable kube-proxy metrics and allow Prometheus scrapes
* Allow TCP/10249 traffic with worker node sources
#### AWS
* Update Fedora CoreOS AMI filter for fedora-coreos-31 ([#620](https://github.com/poseidon/typhoon/pull/620))
#### Google
* Allow `terraform-provider-google` v3.0+ ([#617](https://github.com/poseidon/typhoon/pull/617))
* Only enforce `v2.19+` to ease migration, as no v3.x features are used
#### Addons
* Update Prometheus from v2.14.0 to [v2.15.2](https://github.com/prometheus/prometheus/releases/tag/v2.15.2)
* Add discovery for kube-proxy service endpoints
* Update kube-state-metrics from v1.8.0 to v1.9.2
* Reduce node-exporter DaemonSet tolerations ([#614](https://github.com/poseidon/typhoon/pull/614))
* Update Grafana from v6.5.1 to v6.5.3
## v1.17.0
* Kubernetes [v1.17.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1170)
* Manage clusters without using a local `asset_dir` ([#595](https://github.com/poseidon/typhoon/pull/595))
* Change `asset_dir` to be optional. Remove the variable to skip writing assets locally (**action recommended**)
* Allow keeping cluster assets only in Terraform state ([pluggable](https://www.terraform.io/docs/backends/types/remote.html), encryption) and allow `terraform apply` from stateless automation systems
* Improve asset unpacking on controllers
* Obtain kubeconfig from Terraform module outputs
* Replace usage of `template_dir` with `templatefile` function ([#587](https://github.com/poseidon/typhoon/pull/587))
* Require Terraform version v0.12.6+ (**action required**)
* Update CoreDNS from v1.6.2 to v1.6.5 ([#588](https://github.com/poseidon/typhoon/pull/588))
* Add health `lameduck` option to wait before shutdown
* Update Calico from v3.10.1 to v3.10.2 ([#599](https://github.com/poseidon/typhoon/pull/599))
* Reduce pod eviction timeout for deleting pods on unready nodes from 5m to 1m ([#597](https://github.com/poseidon/typhoon/pull/597))
* Present since [v1.13.3](#v1133), but mistakenly removed in v1.16.0
* Add CPU requests for control plane static pods ([#589](https://github.com/poseidon/typhoon/pull/589))
* May provide slight edge case benefits and aligns with upstream
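The `asset_dir` and Terraform version notes above can be sketched together. This assumes a cluster module named `yavin` that exposes a `kubeconfig-admin` output and a hypothetical local path; it requires the `local` provider:

```tf
terraform {
  # Typhoon v1.17.0 requires Terraform v0.12.6 or newer
  required_version = ">= 0.12.6"
}

# Write the kubeconfig from the module output rather than reading it from asset_dir
resource "local_file" "kubeconfig-yavin" {
  content  = module.yavin.kubeconfig-admin
  filename = "/home/user/.kube/configs/yavin-config"
}
```

With `asset_dir` removed from the module block, cluster assets stay only in Terraform state, so the state backend should be treated as sensitive.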
#### Google
* Use new `google_compute_region_instance_group_manager` version block format
* Fixes warning that `instance_template` is deprecated
* Require `terraform-provider-google` v2.19.0+ (**action required**)
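One way to enforce the provider requirement is a version constraint in the root configuration. A sketch using the string form of `required_providers`, which Terraform v0.12 accepts (newer Terraform versions prefer the object form with an explicit `source`):

```tf
terraform {
  required_providers {
    # Fail early if an older google provider plugin would be used
    google = ">= 2.19.0"
  }
}
```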
#### Addons
* Update Grafana from v6.4.4 to [v6.5.1](https://grafana.com/docs/guides/whats-new-in-v6-5/)
* Add pod networking details in dashboards ([#593](https://github.com/poseidon/typhoon/pull/593))
* Add node alerts and Grafana dashboard from node-exporter ([#591](https://github.com/poseidon/typhoon/pull/591))
* Reduce Prometheus high cardinality time series ([#596](https://github.com/poseidon/typhoon/pull/596))
## v1.16.3
* Kubernetes [v1.16.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163)
* Update etcd from v3.4.2 to v3.4.3 ([#582](https://github.com/poseidon/typhoon/pull/582))
* Upgrade Calico from v3.9.2 to [v3.10.1](https://docs.projectcalico.org/v3.10/release-notes/)
* Allow advertising service ClusterIPs to peer routers via a [BGPConfiguration](https://docs.projectcalico.org/v3.10/networking/advertise-service-ips)
* Switch `kube-proxy` from iptables to ipvs mode ([#574](https://github.com/poseidon/typhoon/pull/574))
#### Addons
* Update Prometheus from v2.13.0 to [v2.14.0](https://github.com/prometheus/prometheus/releases/tag/v2.14.0)
* Refresh rules, alerts, and dashboards from upstreams
* Remove addon-resizer from kube-state-metrics ([#575](https://github.com/poseidon/typhoon/pull/575))
* Update Grafana from v6.4.2 to v6.4.4
## v1.16.2
* Kubernetes [v1.16.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1162)
* Update etcd from v3.4.1 to v3.4.2 ([#570](https://github.com/poseidon/typhoon/pull/570))
* Update Calico from v3.9.1 to [v3.9.2](https://docs.projectcalico.org/v3.9/release-notes/)
* Default to using Calico and supporting NetworkPolicy on all platforms
#### Azure
* Change default networking provider from "flannel" to "calico" ([#573](https://github.com/poseidon/typhoon/pull/573))
#### Bare-Metal
* Add `controllers` and `workers` as typed lists of machine detail objects ([#566](https://github.com/poseidon/typhoon/pull/566))
* Define clusters' machines cleanly and with Terraform v0.12 type constraints (**action required**, see PR example)
* Remove `controller_names`, `controller_macs`, and `controller_domains` variables
* Remove `worker_names`, `worker_macs`, and `worker_domains` variables
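A sketch of the migration, assuming each machine object carries `name`, `mac`, and `domain` fields that mirror the removed variables. The hostnames and MAC addresses below are placeholders, and the linked PR has the authoritative example:

```tf
module "mercury" {
  # source and other required variables omitted

  # Replaces controller_names, controller_macs, and controller_domains
  controllers = [{
    name   = "node1"
    mac    = "52:54:00:a1:9c:ae"
    domain = "node1.example.com"
  }]

  # Replaces worker_names, worker_macs, and worker_domains
  workers = [
    {
      name   = "node2"
      mac    = "52:54:00:b2:2f:86"
      domain = "node2.example.com"
    },
    {
      name   = "node3"
      mac    = "52:54:00:c3:61:77"
      domain = "node3.example.com"
    },
  ]
}
```

Grouping each machine's details into one object keeps a name, MAC address, and domain from drifting out of alignment across three parallel lists.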
#### DigitalOcean
* Change default networking provider from "flannel" to "calico" ([#573](https://github.com/poseidon/typhoon/pull/573))
#### Addons
* Update Grafana from v6.4.1 to [v6.4.2](https://github.com/grafana/grafana/releases/tag/v6.4.2)
* Change CLUO label from "app" to "name"
## v1.16.1
* Kubernetes [v1.16.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161)
* Update etcd from v3.4.0 to [v3.4.1](https://github.com/etcd-io/etcd/releases/tag/v3.4.1)
* Update Calico from v3.8.2 to [v3.9.1](https://docs.projectcalico.org/v3.9/release-notes/)
* Add Terraform v0.12 variables types ([#553](https://github.com/poseidon/typhoon/pull/553), [#557](https://github.com/poseidon/typhoon/pull/557), [#560](https://github.com/poseidon/typhoon/pull/560), [#556](https://github.com/poseidon/typhoon/pull/556), [#562](https://github.com/poseidon/typhoon/pull/562))
* Deprecate `cluster_domain_suffix` variable
#### AWS
* Add `worker_node_labels` variable to set initial worker node labels ([#550](https://github.com/poseidon/typhoon/pull/550))
* Add `node_labels` variable to internal `workers` pool module ([#550](https://github.com/poseidon/typhoon/pull/550))
* For Fedora CoreOS, detect most recent AMI in the region
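A sketch of setting initial worker labels, assuming `worker_node_labels` takes a list of `key=value` strings. The module name `tempest` and the label value are hypothetical:

```tf
module "tempest" {
  # source and other required variables omitted

  # Labels applied to worker nodes when they first register
  worker_node_labels = ["role=ingress"]
}
```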
#### Azure
* Promote `networking` provider Calico VXLAN out of experimental (set `networking = "calico"`)
* Add `worker_node_labels` variable to set initial worker node labels ([#550](https://github.com/poseidon/typhoon/pull/550))
* Add `node_labels` variable to internal `workers` pool module ([#550](https://github.com/poseidon/typhoon/pull/550))
* Change `workers` module default `vm_type` to `Standard_DS1_v2` (followup to [#539](https://github.com/poseidon/typhoon/pull/539))
#### Bare-Metal
* For Fedora CoreOS, use new kernel, initrd, and raw paths ([#563](https://github.com/poseidon/typhoon/pull/563))
* Fix Terraform missing comma error ([#549](https://github.com/poseidon/typhoon/pull/549))
* Remove deprecated `container_linux_oem` variable ([#562](https://github.com/poseidon/typhoon/pull/562))
#### DigitalOcean
* Promote `networking` provider Calico VXLAN out of experimental (set `networking = "calico"`)
* Fix Terraform missing comma error ([#549](https://github.com/poseidon/typhoon/pull/549))
#### Google Cloud
* Add `worker_node_labels` variable to set initial worker node labels ([#550](https://github.com/poseidon/typhoon/pull/550))
* Add `node_labels` variable to internal `workers` module ([#550](https://github.com/poseidon/typhoon/pull/550))
#### Addons
* Update Prometheus from v2.12.0 to [v2.13.0](https://github.com/prometheus/prometheus/releases/tag/v2.13.0)
* Fix Prometheus etcd target discovery and scraping ([#561](https://github.com/poseidon/typhoon/pull/561), regressed with Kubernetes v1.16.0)
* Update kube-state-metrics from v1.7.2 to v1.8.0
* Update nginx-ingress from v0.25.1 to [v0.26.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.26.1) ([#555](https://github.com/poseidon/typhoon/pull/555))
* Add lifecycle hook to allow draining for up to 5 minutes
* Update Grafana from v6.3.5 to [v6.4.1](https://github.com/grafana/grafana/releases/tag/v6.4.1)
## v1.16.0
* Kubernetes [v1.16.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1160) ([#543](https://github.com/poseidon/typhoon/pull/543))
* Read about several Kubernetes API [deprecations](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#deprecations-and-removals)!
* Remove legacy node role labels (no longer shown in `kubectl get nodes`)
* Rename node labels to `node.kubernetes.io/master` and `node.kubernetes.io/node` (migratory)
* Migrate control plane from self-hosted to static pods ([#536](https://github.com/poseidon/typhoon/pull/536))
* Run `kube-apiserver`, `kube-scheduler`, and `kube-controller-manager` as static pods on each controller
* `kubectl` edits to `kube-apiserver`, `kube-scheduler`, and `kube-controller-manager` are no longer possible (change)
* Remove bootkube, self-hosted pivot, and `pod-checkpointer`
* Update CoreDNS from v1.5.0 to v1.6.2 ([#535](https://github.com/poseidon/typhoon/pull/535))
* Update etcd from v3.3.15 to [v3.4.0](https://github.com/etcd-io/etcd/releases/tag/v3.4.0)
* Recommend updating `terraform-provider-ct` plugin from v0.3.2 to [v0.4.0](https://github.com/poseidon/terraform-provider-ct/releases/tag/v0.4.0)
#### Azure
* Change default `controller_type` to `Standard_B2s` ([#539](https://github.com/poseidon/typhoon/pull/539))
* `B2s` is cheaper by $17/month and provides 2 vCPU, 4GB RAM
* Change default `worker_type` to `Standard_DS1_v2` ([#539](https://github.com/poseidon/typhoon/pull/539))
* `F1` is previous generation. `DS1_v2` is newer, similar cost, and supports Low Priority mode
#### Addons
* Update Grafana from v6.3.3 to v6.3.5
## v1.15.3
* Kubernetes [v1.15.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1153)
* Update etcd from v3.3.13 to [v3.3.15](https://github.com/etcd-io/etcd/releases/tag/v3.3.15)
* Update Calico from v3.8.1 to [v3.8.2](https://docs.projectcalico.org/v3.8/release-notes/)
#### AWS
* Enable root block device encryption by default ([#527](https://github.com/poseidon/typhoon/pull/527))
* Require `terraform-provider-aws` v2.23+ (**action required**)
#### Addons
* Update Prometheus from v2.11.0 to [v2.12.0](https://github.com/prometheus/prometheus/releases/tag/v2.12.0)
* Update kube-state-metrics from v1.7.1 to v1.7.2
* Update Grafana from v6.2.5 to v6.3.3
* Use stable IDs for etcd, CoreDNS, and Nginx Ingress dashboards ([#530](https://github.com/poseidon/typhoon/pull/530))
* Update nginx-ingress from v0.25.0 to [v0.25.1](https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.1)
* Fix Nginx security advisories
## v1.15.2
* Kubernetes [v1.15.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1152)
@ -62,7 +1808,7 @@ Notable changes between versions.
* Require `terraform-provider-azurerm` v1.27+ to support Terraform v0.12 (action required)
* Avoid unneeded rotations of Regular priority virtual machine scale sets
* Azure only allows `eviction_policy` to be set for Low priority VMs. Supporting Low priority VMs meant that when Regular VMs were used, each `terraform apply` rolled workers in order to set `eviction_policy` to null.
* Terraform v0.12 nullable variables fix the issue so plan does not produce a diff.
#### Bare-Metal
@ -117,7 +1863,7 @@ Notable changes between versions.
* Update Grafana from v6.1.6 to v6.2.1
## v1.14.2
* Kubernetes [v1.14.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1142)
* Update etcd from v3.3.12 to [v3.3.13](https://github.com/etcd-io/etcd/releases/tag/v3.3.13)
* Upgrade Calico from v3.6.1 to [v3.7.2](https://docs.projectcalico.org/v3.7/release-notes/)
@ -188,7 +1934,7 @@ Notable changes between versions.
* Add ability to load balance TCP/UDP applications ([#442](https://github.com/poseidon/typhoon/pull/442))
* Add worker instances to a target pool, output as `worker_target_pool`
* Health check for workers with Ingress controllers. Forward rules don't support differing internal/external ports, but some Ingress controllers support TCP/UDP proxy as a workaround
* Remove Haswell minimum CPU platform requirement ([#439](https://github.com/poseidon/typhoon/pull/439))
* Google Cloud API implements `min_cpu_platform` to mean "use exactly this CPU". Revert [#405](https://github.com/poseidon/typhoon/pull/405) added in v1.13.4.
* Fix error creating clusters in new regions without Haswell (e.g. europe-west2) ([#438](https://github.com/poseidon/typhoon/issues/438))
@ -373,7 +2119,7 @@ Notable changes between versions.
* Update Calico from v3.3.0 to [v3.3.1](https://docs.projectcalico.org/v3.3/releases/)
* Disable Felix usage reporting by default ([#345](https://github.com/poseidon/typhoon/pull/345))
* Improve flannel manifests
* [Rename](https://github.com/poseidon/terraform-render-bootkube/commit/d045a8e6b8eccfbb9d69bb51953b5a93d23f67f7) `kube-flannel` DaemonSet to `flannel` and `kube-flannel-cfg` ConfigMap to `flannel-config`
* [Drop](https://github.com/poseidon/terraform-render-bootkube/commit/39f9afb3360ec642e5b98457c8bd07eda35b6c96) unused mounts and add a CPU resource request
* Update CoreDNS from v1.2.4 to [v1.2.6](https://coredns.io/2018/11/05/coredns-1.2.6-release/)
* Enable CoreDNS `loop` and `loadbalance` plugins ([#340](https://github.com/poseidon/typhoon/pull/340))
@ -535,7 +2281,7 @@ Notable changes between versions.
* Force apiserver to stop listening on `127.0.0.1:8080`
* Replace `kube-dns` with [CoreDNS](https://coredns.io/) ([#261](https://github.com/poseidon/typhoon/pull/261))
* Edit the `coredns` ConfigMap to [customize](https://coredns.io/plugins/)
* CoreDNS doesn't use a resizer. For large clusters, scaling may be required.
#### AWS
@ -580,7 +2326,7 @@ Notable changes between versions.
* Switch `kube-apiserver` port from 443 to 6443 ([#248](https://github.com/poseidon/typhoon/pull/248))
* Users who exposed kube-apiserver on a WAN via their router/load-balancer will need to adjust its configuration (e.g. DNAT 6443). Most apiservers are on a LAN (internal, VPN-only, etc) so if you didn't specially configure network gear for 443, no change is needed. (possible action required)
* Fix possible deadlock when provisioning clusters larger than 10 nodes ([#244](https://github.com/poseidon/typhoon/pull/244))
#### DigitalOcean
@ -648,7 +2394,7 @@ Notable changes between versions.
* Please change the values `stable`, `beta`, or `alpha` to `coreos-stable`, `coreos-beta`, or `coreos-alpha` (**action required!**)
* Replace `container_linux_version` variable with `os_version`
* Add `network_ip_autodetection_method` variable for Calico host IPv4 address detection
* Use Calico's default "first-found" to support single NIC and bonded NIC nodes
* Allow [alternative](https://docs.projectcalico.org/v3.1/reference/node/configuration#ip-autodetection-methods) methods for multi NIC nodes, like can-reach=IP or interface=REGEX
* Deprecate `container_linux_oem` variable
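For nodes with several NICs, the autodetection method can be set on the cluster module. A sketch, assuming a module named `mercury` and a gateway address of `10.0.0.1` (both hypothetical); the `can-reach=IP` form is one of the alternatives named above:

```tf
module "mercury" {
  # source and other required variables omitted

  # Pick the host address whose interface can reach the given IP,
  # instead of Calico's default "first-found"
  network_ip_autodetection_method = "can-reach=10.0.0.1"
}
```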
@ -681,7 +2427,7 @@ Notable changes between versions.
#### Google Cloud
* Add support for multi-controller clusters (i.e. multi-master) ([#54](https://github.com/poseidon/typhoon/issues/54), [#190](https://github.com/poseidon/typhoon/pull/190))
* Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a [bug](https://issuetracker.google.com/issues/67366622) in Google network load balancers that limited clusters to only bootstrapping one controller node.
* Add TCP health check for apiserver pods on controllers. Replace kubelet check approximation.
#### Addons
@ -912,7 +2658,7 @@ Notable changes between versions.
* Container Linux stable, beta, and alpha now provide Docker 17.09 (instead of 1.12)
* Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
* Add optional `cluster_domain_suffix` variable (#74)
* Use kubernetes-incubator/bootkube v0.9.1

README.md

@ -1,4 +1,11 @@
# Typhoon [![IRC](https://img.shields.io/badge/freenode-%23typhoon-0099ef.svg)]() <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
# Typhoon
[![Release](https://img.shields.io/github/v/release/poseidon/typhoon?style=flat-square)](https://github.com/poseidon/typhoon/releases)
[![Stars](https://img.shields.io/github/stars/poseidon/typhoon?style=flat-square)](https://github.com/poseidon/typhoon/stargazers)
[![Sponsors](https://img.shields.io/github/sponsors/poseidon?logo=github&style=flat-square)](https://github.com/sponsors/poseidon)
[![Mastodon](https://img.shields.io/badge/follow-news-6364ff?logo=mastodon&style=flat-square)](https://fosstodon.org/@typhoon)
<img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
@ -11,44 +18,67 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.15.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
* Kubernetes v1.31.3 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/flatcar-linux/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, CSI, or other [addons](https://typhoon.psdn.io/addons/overview/)
## Modules
Typhoon provides a Terraform Module for each supported operating system and platform.
Typhoon provides a Terraform Module for defining a Kubernetes cluster on each supported operating system and platform.
Typhoon is available for [Fedora CoreOS](https://getfedora.org/coreos/).
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Container Linux / Flatcar Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
| Azure | Container Linux | [azure/container-linux/kubernetes](azure/container-linux/kubernetes) | alpha |
| Bare-Metal | Container Linux / Flatcar Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
| Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
| Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | stable |
A preview of Typhoon for [Fedora CoreOS](https://getfedora.org/coreos/) is available for testing.
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | stable |
| Azure | Fedora CoreOS | [azure/fedora-coreos/kubernetes](azure/fedora-coreos/kubernetes) | alpha |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | stable |
| DigitalOcean | Fedora CoreOS | [digital-ocean/fedora-coreos/kubernetes](digital-ocean/fedora-coreos/kubernetes) | beta |
| Google Cloud | Fedora CoreOS | [google-cloud/fedora-coreos/kubernetes](google-cloud/fedora-coreos/kubernetes) | stable |
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | preview |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | preview |
| AWS | Fedora CoreOS (ARM64) | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | alpha |
Typhoon is available for [Flatcar Linux](https://www.flatcar-linux.org/releases/).
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Flatcar Linux | [aws/flatcar-linux/kubernetes](aws/flatcar-linux/kubernetes) | stable |
| Azure | Flatcar Linux | [azure/flatcar-linux/kubernetes](azure/flatcar-linux/kubernetes) | alpha |
| Bare-Metal | Flatcar Linux | [bare-metal/flatcar-linux/kubernetes](bare-metal/flatcar-linux/kubernetes) | stable |
| DigitalOcean | Flatcar Linux | [digital-ocean/flatcar-linux/kubernetes](digital-ocean/flatcar-linux/kubernetes) | beta |
| Google Cloud | Flatcar Linux | [google-cloud/flatcar-linux/kubernetes](google-cloud/flatcar-linux/kubernetes) | stable |
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Flatcar Linux (ARM64) | [aws/flatcar-linux/kubernetes](aws/flatcar-linux/kubernetes) | alpha |
| Azure | Flatcar Linux (ARM64) | [azure/flatcar-linux/kubernetes](azure/flatcar-linux/kubernetes) | alpha |
Typhoon also provides Terraform Modules for optionally managing individual components applied onto clusters.
| Name | Terraform Module | Status |
|---------|------------------|--------|
| CoreDNS | [addons/coredns](addons/coredns) | beta |
| Cilium | [addons/cilium](addons/cilium) | beta |
| flannel | [addons/flannel](addons/flannel) | beta |
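As a rough sketch of how one of these component modules might be applied, the block below configures the `hashicorp/kubernetes` provider and calls `addons/cilium`. The variable names (`pod_cidr`, `daemonset_tolerations`, `enable_hubble`) come from `addons/cilium/variables.tf` in this change. The kubeconfig path, the `custom-role` taint key, and the `ref` tag are assumptions for illustration, and the sketch assumes the cluster module is not already deploying Cilium itself.

```tf
# Assumption: a kubeconfig for the target cluster exists at this path.
provider "kubernetes" {
  config_path = "/home/user/.kube/configs/yavin-config"
}

module "cilium" {
  # Assumption: the ref is illustrative; pin it to the release your cluster uses.
  source = "git::https://github.com/poseidon/typhoon//addons/cilium?ref=v1.31.3"

  # Variables declared in addons/cilium/variables.tf
  pod_cidr              = "10.2.0.0/16"   # matches the module default
  daemonset_tolerations = ["custom-role"] # hypothetical extra taint key
  enable_hubble         = true
}
```

The Cilium DaemonSet and operator in this change read the API server host and port from an `in-cluster` ConfigMap, so that ConfigMap presumably needs to exist before the module is applied.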
## Documentation
* [Docs](https://typhoon.psdn.io)
* Architecture [concepts](https://typhoon.psdn.io/architecture/concepts/) and [operating systems](https://typhoon.psdn.io/architecture/operating-systems/)
-* Tutorials for [AWS](docs/cl/aws.md), [Azure](docs/cl/azure.md), [Bare-Metal](docs/cl/bare-metal.md), [Digital Ocean](docs/cl/digital-ocean.md), and [Google-Cloud](docs/cl/google-cloud.md)
+* Fedora CoreOS tutorials for [AWS](docs/fedora-coreos/aws.md), [Azure](docs/fedora-coreos/azure.md), [Bare-Metal](docs/fedora-coreos/bare-metal.md), [DigitalOcean](docs/fedora-coreos/digitalocean.md), and [Google Cloud](docs/fedora-coreos/google-cloud.md)
+* Flatcar Linux tutorials for [AWS](docs/flatcar-linux/aws.md), [Azure](docs/flatcar-linux/azure.md), [Bare-Metal](docs/flatcar-linux/bare-metal.md), [DigitalOcean](docs/flatcar-linux/digitalocean.md), and [Google Cloud](docs/flatcar-linux/google-cloud.md)
## Usage
Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. Here's a minimal example:
```tf
-module "google-cloud-yavin" {
-source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.15.2"
+module "yavin" {
+source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.31.3"
# Google Cloud
cluster_name = "yavin"
@ -57,13 +87,19 @@ module "google-cloud-yavin" {
dns_zone_name = "example-zone"
# configuration
-ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
-asset_dir = "/home/user/.secrets/clusters/yavin"
+ssh_authorized_key = "ssh-ed25519 AAAAB3Nz..."
# optional
worker_count = 2
worker_preemptible = true
}
+# Obtain cluster kubeconfig
+resource "local_file" "kubeconfig-yavin" {
+content = module.yavin.kubeconfig-admin
+filename = "/home/user/.kube/configs/yavin-config"
+file_permission = "0600"
+}
```
Initialize modules, plan the changes to be made, and apply the changes.
@ -71,20 +107,20 @@ Initialize modules, plan the changes to be made, and apply the changes.
```sh
$ terraform init
$ terraform plan
-Plan: 64 to add, 0 to change, 0 to destroy.
+Plan: 62 to add, 0 to change, 0 to destroy.
$ terraform apply
-Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
+Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
```
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
```sh
-$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
-NAME ROLES STATUS AGE VERSION
-yavin-controller-0.c.example-com.internal controller,master Ready 6m v1.15.2
-yavin-worker-jrbf.c.example-com.internal node Ready 5m v1.15.2
-yavin-worker-mzdm.c.example-com.internal node Ready 5m v1.15.2
+NAME ROLES STATUS AGE VERSION
+yavin-controller-0.c.example-com.internal <none> Ready 6m v1.31.3
+yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.31.3
+yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.31.3
```
List the pods.
@ -92,21 +128,18 @@ List the pods.
```
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
-kube-system calico-node-1cs8z 2/2 Running 0 6m
-kube-system calico-node-d1l5b 2/2 Running 0 6m
-kube-system calico-node-sp9ps 2/2 Running 0 6m
+kube-system cilium-1cs8z 1/1 Running 0 6m
+kube-system cilium-d1l5b 1/1 Running 0 6m
+kube-system cilium-sp9ps 1/1 Running 0 6m
+kube-system cilium-operator-68d778b448-g744f 1/1 Running 0 6m
kube-system coredns-1187388186-zj5dl 1/1 Running 0 6m
kube-system coredns-1187388186-dkh3o 1/1 Running 0 6m
-kube-system kube-apiserver-zppls 1/1 Running 0 6m
-kube-system kube-controller-manager-3271970485-gh9kt 1/1 Running 0 6m
-kube-system kube-controller-manager-3271970485-h90v8 1/1 Running 1 6m
+kube-system kube-apiserver-controller-0 1/1 Running 0 6m
+kube-system kube-controller-manager-controller-0 1/1 Running 0 6m
kube-system kube-proxy-117v6 1/1 Running 0 6m
kube-system kube-proxy-9886n 1/1 Running 0 6m
kube-system kube-proxy-njn47 1/1 Running 0 6m
-kube-system kube-scheduler-3895335239-5x87r 1/1 Running 0 6m
-kube-system kube-scheduler-3895335239-bzrrt 1/1 Running 1 6m
-kube-system pod-checkpointer-l6lrt 1/1 Running 0 6m
-kube-system pod-checkpointer-l6lrt-controller-0 1/1 Running 0 6m
+kube-system kube-scheduler-controller-0 1/1 Running 0 6m
```
## Non-Goals
@ -119,7 +152,7 @@ Typhoon is strict about minimalism, maturity, and scope. These are not in scope:
## Help
-Ask questions on the IRC #typhoon channel on [freenode.net](http://freenode.net/).
+Schedule a meeting via [GitHub Sponsors](https://github.com/sponsors/poseidon?frequency=one-time) to discuss your use case.
## Motivation
@ -129,12 +162,24 @@ Typhoon addresses real world needs, which you may share. It is honest about limi
## Social Contract
-Typhoon is not a product, trial, or free-tier. It is not run by a company, does not offer support or services, and does not accept or make any money. It is not associated with any operating system or platform vendor.
+Typhoon is not a product, trial, or free-tier. Typhoon does not offer support, services, or charge money. And Typhoon is independent of operating system or platform vendors.
Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.
-## Donations
+## Sponsors
-Typhoon does not accept money donations. Instead, we encourage you to donate to one of [these organizations](https://github.com/poseidon/typhoon/wiki/Donations) to show your appreciation.
+Poseidon's GitHub [Sponsors](https://github.com/sponsors/poseidon) support the infrastructure and operational costs of providing Typhoon.
* [DigitalOcean](https://www.digitalocean.com/) kindly provides credits to support Typhoon test clusters.
<a href="https://www.digitalocean.com/">
<img src="https://opensource.nyc3.cdn.digitaloceanspaces.com/attribution/assets/SVG/DO_Logo_horizontal_blue.svg" width="201px">
</a>
<br>
<br>
<a href="https://deploy.equinix.com/">
<img src="https://storage.googleapis.com/poseidon/equinix.png" width="201px">
</a>
<br>
<br>
If you'd like your company here, please contact dghubble at psdn.io.


@ -0,0 +1,36 @@
resource "kubernetes_cluster_role_binding" "operator" {
metadata {
name = "cilium-operator"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "cilium-operator"
}
subject {
kind = "ServiceAccount"
name = "cilium-operator"
namespace = "kube-system"
}
}
resource "kubernetes_cluster_role_binding" "agent" {
metadata {
name = "cilium-agent"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "cilium-agent"
}
subject {
kind = "ServiceAccount"
name = "cilium-agent"
namespace = "kube-system"
}
}


@ -0,0 +1,112 @@
resource "kubernetes_cluster_role" "operator" {
metadata {
name = "cilium-operator"
}
# detect and restart [core|kube]dns pods on startup
rule {
verbs = ["get", "list", "watch", "delete"]
api_groups = [""]
resources = ["pods"]
}
rule {
verbs = ["list", "watch"]
api_groups = [""]
resources = ["nodes"]
}
rule {
verbs = ["patch"]
api_groups = [""]
resources = ["nodes", "nodes/status"]
}
rule {
verbs = ["get", "list", "watch"]
api_groups = ["discovery.k8s.io"]
resources = ["endpointslices"]
}
rule {
verbs = ["get", "list", "watch"]
api_groups = [""]
resources = ["services"]
}
# Perform LB IP allocation for BGP
rule {
verbs = ["update"]
api_groups = [""]
resources = ["services/status"]
}
# Perform the translation of a CNP that contains `ToGroup` to its endpoints
rule {
verbs = ["get", "list", "watch"]
api_groups = [""]
resources = ["services", "endpoints", "namespaces"]
}
rule {
verbs = ["*"]
api_groups = ["cilium.io"]
resources = ["ciliumnetworkpolicies", "ciliumnetworkpolicies/status", "ciliumnetworkpolicies/finalizers", "ciliumclusterwidenetworkpolicies", "ciliumclusterwidenetworkpolicies/status", "ciliumclusterwidenetworkpolicies/finalizers", "ciliumendpoints", "ciliumendpoints/status", "ciliumendpoints/finalizers", "ciliumnodes", "ciliumnodes/status", "ciliumnodes/finalizers", "ciliumidentities", "ciliumidentities/status", "ciliumidentities/finalizers", "ciliumlocalredirectpolicies", "ciliumlocalredirectpolicies/status", "ciliumlocalredirectpolicies/finalizers", "ciliumendpointslices", "ciliumloadbalancerippools", "ciliumloadbalancerippools/status", "ciliumcidrgroups", "ciliuml2announcementpolicies", "ciliuml2announcementpolicies/status", "ciliumpodippools"]
}
rule {
verbs = ["create", "get", "list", "update", "watch"]
api_groups = ["apiextensions.k8s.io"]
resources = ["customresourcedefinitions"]
}
# Cilium elects a leader among multiple operator replicas
rule {
verbs = ["create", "get", "update"]
api_groups = ["coordination.k8s.io"]
resources = ["leases"]
}
}
resource "kubernetes_cluster_role" "agent" {
metadata {
name = "cilium-agent"
}
rule {
verbs = ["get", "list", "watch"]
api_groups = ["networking.k8s.io"]
resources = ["networkpolicies"]
}
rule {
verbs = ["get", "list", "watch"]
api_groups = ["discovery.k8s.io"]
resources = ["endpointslices"]
}
rule {
verbs = ["get", "list", "watch"]
api_groups = [""]
resources = ["namespaces", "services", "pods", "endpoints", "nodes"]
}
rule {
verbs = ["patch"]
api_groups = [""]
resources = ["nodes/status"]
}
rule {
verbs = ["create", "get", "list", "watch", "update"]
api_groups = ["apiextensions.k8s.io"]
resources = ["customresourcedefinitions"]
}
rule {
verbs = ["*"]
api_groups = ["cilium.io"]
resources = ["ciliumnetworkpolicies", "ciliumnetworkpolicies/status", "ciliumclusterwidenetworkpolicies", "ciliumclusterwidenetworkpolicies/status", "ciliumendpoints", "ciliumendpoints/status", "ciliumnodes", "ciliumnodes/status", "ciliumidentities", "ciliumidentities/status", "ciliumlocalredirectpolicies", "ciliumlocalredirectpolicies/status", "ciliumegressnatpolicies", "ciliumendpointslices", "ciliumcidrgroups", "ciliuml2announcementpolicies", "ciliuml2announcementpolicies/status", "ciliumpodippools"]
}
}

addons/cilium/config.tf Normal file

@ -0,0 +1,196 @@
resource "kubernetes_config_map" "cilium" {
metadata {
name = "cilium"
namespace = "kube-system"
}
data = {
# Identity allocation mode selects how identities are shared between cilium
# nodes by setting how they are stored. The options are "crd" or "kvstore".
# - "crd" stores identities in kubernetes as CRDs (custom resource definition).
# These can be queried with:
# kubectl get ciliumid
# - "kvstore" stores identities in a kvstore, etcd or consul, that is
# configured below. Cilium versions before 1.6 supported only the kvstore
# backend. Upgrades from these older cilium versions should continue using
# the kvstore by commenting out the identity-allocation-mode below, or
# setting it to "kvstore".
identity-allocation-mode = "crd"
cilium-endpoint-gc-interval = "5m0s"
nodes-gc-interval = "5m0s"
# If you want to run cilium in debug mode change this value to true
debug = "false"
# The agent can be put into the following three policy enforcement modes
# default, always and never.
# https://docs.cilium.io/en/latest/policy/intro/#policy-enforcement-modes
enable-policy = "default"
# Prometheus
enable-metrics = "true"
prometheus-serve-addr = ":9962"
operator-prometheus-serve-addr = ":9963"
proxy-prometheus-port = "9964" # envoy
# Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
# address.
enable-ipv4 = "true"
# Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
# address.
enable-ipv6 = "false"
# Enable probing for a more efficient clock source for the BPF datapath
enable-bpf-clock-probe = "true"
# Enable use of transparent proxying mechanisms (Linux 5.7+)
enable-bpf-tproxy = "false"
# If you want cilium monitor to aggregate tracing for packets, set this level
# to "low", "medium", or "maximum". The higher the level, the fewer packets
# that will be seen in monitor output.
monitor-aggregation = "medium"
# The monitor aggregation interval governs the typical time between monitor
# notification events for each allowed connection.
#
# Only effective when monitor aggregation is set to "medium" or higher.
monitor-aggregation-interval = "5s"
# The monitor aggregation flags determine which TCP flags, upon the
# first observation, cause monitor notifications to be generated.
#
# Only effective when monitor aggregation is set to "medium" or higher.
monitor-aggregation-flags = "all"
# Specifies the ratio (0.0-1.0) of total system memory to use for dynamic
# sizing of the TCP CT, non-TCP CT, NAT and policy BPF maps.
bpf-map-dynamic-size-ratio = "0.0025"
# bpf-policy-map-max specifies the maximum number of entries in endpoint
# policy map (per endpoint)
bpf-policy-map-max = "16384"
# bpf-lb-map-max specifies the maximum number of entries in bpf lb service,
# backend and affinity maps.
bpf-lb-map-max = "65536"
# Pre-allocation of map entries allows per-packet latency to be reduced, at
# the expense of up-front memory allocation for the entries in the maps. The
# default value below will minimize memory usage in the default installation;
# users who are sensitive to latency may consider setting this to "true".
#
# This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
# this option and behave as though it is set to "true".
#
# If this value is modified, then during the next Cilium startup the restore
# of existing endpoints and tracking of ongoing connections may be disrupted.
# As a result, reply packets may be dropped and the load-balancing decisions
# for established connections may change.
#
# If this option is set to "false" during an upgrade from 1.3 or earlier to
# 1.4 or later, then it may cause one-time disruptions during the upgrade.
preallocate-bpf-maps = "false"
# Name of the cluster. Only relevant when building a mesh of clusters.
cluster-name = "default"
# Unique ID of the cluster. Must be unique across all connected clusters and
# in the range of 1 to 255. Only relevant when building a mesh of clusters.
cluster-id = "0"
# Encapsulation mode for communication between nodes
# Possible values:
# - disabled
# - vxlan (default)
# - geneve
routing-mode = "tunnel"
tunnel = "vxlan"
# Enables L7 proxy for L7 policy enforcement and visibility
enable-l7-proxy = "true"
auto-direct-node-routes = "false"
# enableXTSocketFallback enables the fallback compatibility solution
# when the xt_socket kernel module is missing and it is needed for
# the datapath L7 redirection to work properly. See documentation
# for details on when this can be disabled:
# http://docs.cilium.io/en/latest/install/system_requirements/#admin-kernel-version.
enable-xt-socket-fallback = "true"
# installIptablesRules enables installation of iptables rules to allow for
# TPROXY (L7 proxy injection), iptables-based masquerading and compatibility
# with kube-proxy. See documentation for details on when this can be
# disabled.
install-iptables-rules = "true"
# masquerade traffic leaving the node destined for outside
enable-ipv4-masquerade = "true"
enable-ipv6-masquerade = "false"
# bpfMasquerade enables masquerading with BPF instead of iptables
enable-bpf-masquerade = "true"
# kube-proxy
kube-proxy-replacement = "true"
kube-proxy-replacement-healthz-bind-address = ":10256"
enable-session-affinity = "true"
# ClusterIPs from host namespace
bpf-lb-sock = "true"
# ClusterIPs from external nodes
bpf-lb-external-clusterip = "true"
# NodePort
enable-node-port = "true"
enable-health-check-nodeport = "false"
# ExternalIPs
enable-external-ips = "true"
# HostPort
enable-host-port = "true"
# IPAM
ipam = "cluster-pool"
disable-cnp-status-updates = "true"
cluster-pool-ipv4-cidr = "${var.pod_cidr}"
cluster-pool-ipv4-mask-size = "24"
# Health
agent-health-port = "9876"
enable-health-checking = "true"
enable-endpoint-health-checking = "true"
# Identity
enable-well-known-identities = "false"
enable-remote-node-identity = "true"
# Hubble server
enable-hubble = var.enable_hubble
hubble-disable-tls = "false"
hubble-listen-address = ":4244"
hubble-socket-path = "/var/run/cilium/hubble.sock"
hubble-tls-client-ca-files = "/var/lib/cilium/tls/hubble/client-ca.crt"
hubble-tls-cert-file = "/var/lib/cilium/tls/hubble/server.crt"
hubble-tls-key-file = "/var/lib/cilium/tls/hubble/server.key"
hubble-export-file-max-backups = "5"
hubble-export-file-max-size-mb = "10"
# Hubble metrics
hubble-metrics-server = ":9965"
hubble-metrics = "dns drop tcp flow port-distribution icmp httpV2"
enable-hubble-open-metrics = "false"
# Misc
enable-bandwidth-manager = "false"
enable-local-redirect-policy = "false"
policy-audit-mode = "false"
operator-api-serve-addr = "127.0.0.1:9234"
enable-l2-neigh-discovery = "true"
enable-k8s-terminating-endpoint = "true"
enable-k8s-networkpolicy = "true"
external-envoy-proxy = "false"
write-cni-conf-when-ready = "/host/etc/cni/net.d/05-cilium.conflist"
cni-exclusive = "true"
cni-log-file = "/var/run/cilium/cilium-cni.log"
}
}

addons/cilium/daemonset.tf Normal file

@ -0,0 +1,379 @@
resource "kubernetes_daemonset" "cilium" {
wait_for_rollout = false
metadata {
name = "cilium"
namespace = "kube-system"
labels = {
k8s-app = "cilium"
}
}
spec {
strategy {
type = "RollingUpdate"
rolling_update {
max_unavailable = "1"
}
}
selector {
match_labels = {
k8s-app = "cilium-agent"
}
}
template {
metadata {
labels = {
k8s-app = "cilium-agent"
}
annotations = {
"prometheus.io/port" = "9962"
"prometheus.io/scrape" = "true"
}
}
spec {
host_network = true
priority_class_name = "system-node-critical"
service_account_name = "cilium-agent"
security_context {
seccomp_profile {
type = "RuntimeDefault"
}
}
toleration {
key = "node-role.kubernetes.io/controller"
operator = "Exists"
}
toleration {
key = "node.kubernetes.io/not-ready"
operator = "Exists"
}
dynamic "toleration" {
for_each = var.daemonset_tolerations
content {
key = toleration.value
operator = "Exists"
}
}
automount_service_account_token = true
enable_service_links = false
# Cilium v1.13.1 starts installing CNI plugins in yet another init container
# https://github.com/cilium/cilium/pull/24075
init_container {
name = "install-cni"
image = "quay.io/cilium/cilium:v1.16.4"
command = ["/install-plugin.sh"]
security_context {
allow_privilege_escalation = true
privileged = true
capabilities {
drop = ["ALL"]
}
}
volume_mount {
name = "cni-bin-dir"
mount_path = "/host/opt/cni/bin"
}
}
# Required to mount cgroup2 filesystem on the underlying Kubernetes node.
# We use nsenter command with host's cgroup and mount namespaces enabled.
init_container {
name = "mount-cgroup"
image = "quay.io/cilium/cilium:v1.16.4"
command = [
"sh",
"-ec",
# The statically linked Go program binary is invoked to avoid any
# dependency on utilities like sh and mount that can be missing on certain
# distros installed on the underlying host. Copy the binary to the
# same directory where we install cilium cni plugin so that exec permissions
# are available.
"cp /usr/bin/cilium-mount /hostbin/cilium-mount && nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt \"$${BIN_PATH}/cilium-mount\" $CGROUP_ROOT; rm /hostbin/cilium-mount"
]
env {
name = "CGROUP_ROOT"
value = "/run/cilium/cgroupv2"
}
env {
name = "BIN_PATH"
value = "/opt/cni/bin"
}
security_context {
allow_privilege_escalation = true
privileged = true
}
volume_mount {
name = "hostproc"
mount_path = "/hostproc"
}
volume_mount {
name = "cni-bin-dir"
mount_path = "/hostbin"
}
}
init_container {
name = "clean-cilium-state"
image = "quay.io/cilium/cilium:v1.16.4"
command = ["/init-container.sh"]
security_context {
allow_privilege_escalation = true
privileged = true
}
volume_mount {
name = "sys-fs-bpf"
mount_path = "/sys/fs/bpf"
}
volume_mount {
name = "var-run-cilium"
mount_path = "/var/run/cilium"
}
# Required to mount cgroup filesystem from the host to cilium agent pod
volume_mount {
name = "cilium-cgroup"
mount_path = "/run/cilium/cgroupv2"
mount_propagation = "HostToContainer"
}
}
container {
name = "cilium-agent"
image = "quay.io/cilium/cilium:v1.16.4"
command = ["cilium-agent"]
args = [
"--config-dir=/tmp/cilium/config-map"
]
env {
name = "K8S_NODE_NAME"
value_from {
field_ref {
api_version = "v1"
field_path = "spec.nodeName"
}
}
}
env {
name = "CILIUM_K8S_NAMESPACE"
value_from {
field_ref {
api_version = "v1"
field_path = "metadata.namespace"
}
}
}
env {
name = "KUBERNETES_SERVICE_HOST"
value_from {
config_map_key_ref {
name = "in-cluster"
key = "apiserver-host"
}
}
}
env {
name = "KUBERNETES_SERVICE_PORT"
value_from {
config_map_key_ref {
name = "in-cluster"
key = "apiserver-port"
}
}
}
port {
name = "peer-service"
protocol = "TCP"
container_port = 4244
}
# Metrics
port {
name = "metrics"
protocol = "TCP"
container_port = 9962
}
port {
name = "envoy-metrics"
protocol = "TCP"
container_port = 9964
}
port {
name = "hubble-metrics"
protocol = "TCP"
container_port = 9965
}
# Not yet used, prefer exec's
port {
name = "health"
protocol = "TCP"
container_port = 9876
}
lifecycle {
pre_stop {
exec {
command = ["/cni-uninstall.sh"]
}
}
}
security_context {
allow_privilege_escalation = true
privileged = true
}
liveness_probe {
exec {
command = ["cilium", "status", "--brief"]
}
initial_delay_seconds = 120
timeout_seconds = 5
period_seconds = 30
success_threshold = 1
failure_threshold = 10
}
readiness_probe {
exec {
command = ["cilium", "status", "--brief"]
}
initial_delay_seconds = 5
timeout_seconds = 5
period_seconds = 20
success_threshold = 1
failure_threshold = 3
}
# Load kernel modules
volume_mount {
name = "lib-modules"
read_only = true
mount_path = "/lib/modules"
}
# Access iptables concurrently
volume_mount {
name = "xtables-lock"
mount_path = "/run/xtables.lock"
}
# Keep state between restarts
volume_mount {
name = "var-run-cilium"
mount_path = "/var/run/cilium"
}
volume_mount {
name = "sys-fs-bpf"
mount_path = "/sys/fs/bpf"
mount_propagation = "Bidirectional"
}
# Configuration
volume_mount {
name = "config"
read_only = true
mount_path = "/tmp/cilium/config-map"
}
# Install config on host
volume_mount {
name = "cni-conf-dir"
mount_path = "/host/etc/cni/net.d"
}
# Hubble
volume_mount {
name = "hubble-tls"
mount_path = "/var/lib/cilium/tls/hubble"
read_only = true
}
}
termination_grace_period_seconds = 1
# Load kernel modules
volume {
name = "lib-modules"
host_path {
path = "/lib/modules"
}
}
# Access iptables concurrently with other processes (e.g. kube-proxy)
volume {
name = "xtables-lock"
host_path {
path = "/run/xtables.lock"
type = "FileOrCreate"
}
}
# Keep state between restarts
volume {
name = "var-run-cilium"
host_path {
path = "/var/run/cilium"
type = "DirectoryOrCreate"
}
}
# Keep state for bpf maps between restarts
volume {
name = "sys-fs-bpf"
host_path {
path = "/sys/fs/bpf"
type = "DirectoryOrCreate"
}
}
# Mount host cgroup2 filesystem
volume {
name = "hostproc"
host_path {
path = "/proc"
type = "Directory"
}
}
volume {
name = "cilium-cgroup"
host_path {
path = "/run/cilium/cgroupv2"
type = "DirectoryOrCreate"
}
}
# Read configuration
volume {
name = "config"
config_map {
name = "cilium"
}
}
# Install CNI plugin and config on host
volume {
name = "cni-bin-dir"
host_path {
path = "/opt/cni/bin"
type = "DirectoryOrCreate"
}
}
volume {
name = "cni-conf-dir"
host_path {
path = "/etc/cni/net.d"
type = "DirectoryOrCreate"
}
}
# Hubble TLS (optional)
volume {
name = "hubble-tls"
projected {
default_mode = "0400"
sources {
secret {
name = "hubble-server-certs"
optional = true
items {
key = "ca.crt"
path = "client-ca.crt"
}
items {
key = "tls.crt"
path = "server.crt"
}
items {
key = "tls.key"
path = "server.key"
}
}
}
}
}
}
}
}
}

addons/cilium/deployment.tf Normal file

@ -0,0 +1,163 @@
resource "kubernetes_deployment" "operator" {
wait_for_rollout = false
metadata {
name = "cilium-operator"
namespace = "kube-system"
}
spec {
replicas = 1
strategy {
type = "RollingUpdate"
rolling_update {
max_unavailable = "1"
}
}
selector {
match_labels = {
name = "cilium-operator"
}
}
template {
metadata {
labels = {
name = "cilium-operator"
}
annotations = {
"prometheus.io/scrape" = "true"
"prometheus.io/port" = "9963"
}
}
spec {
host_network = true
priority_class_name = "system-cluster-critical"
service_account_name = "cilium-operator"
security_context {
seccomp_profile {
type = "RuntimeDefault"
}
}
toleration {
key = "node-role.kubernetes.io/controller"
operator = "Exists"
}
toleration {
key = "node.kubernetes.io/not-ready"
operator = "Exists"
}
topology_spread_constraint {
max_skew = 1
topology_key = "kubernetes.io/hostname"
when_unsatisfiable = "DoNotSchedule"
label_selector {
match_labels = {
name = "cilium-operator"
}
}
}
automount_service_account_token = true
enable_service_links = false
container {
name = "cilium-operator"
image = "quay.io/cilium/operator-generic:v1.16.4"
command = ["cilium-operator-generic"]
args = [
"--config-dir=/tmp/cilium/config-map",
"--debug=$(CILIUM_DEBUG)"
]
env {
name = "K8S_NODE_NAME"
value_from {
field_ref {
api_version = "v1"
field_path = "spec.nodeName"
}
}
}
env {
name = "CILIUM_K8S_NAMESPACE"
value_from {
field_ref {
api_version = "v1"
field_path = "metadata.namespace"
}
}
}
env {
name = "KUBERNETES_SERVICE_HOST"
value_from {
config_map_key_ref {
name = "in-cluster"
key = "apiserver-host"
}
}
}
env {
name = "KUBERNETES_SERVICE_PORT"
value_from {
config_map_key_ref {
name = "in-cluster"
key = "apiserver-port"
}
}
}
env {
name = "CILIUM_DEBUG"
value_from {
config_map_key_ref {
name = "cilium"
key = "debug"
optional = true
}
}
}
port {
name = "metrics"
protocol = "TCP"
host_port = 9963
container_port = 9963
}
port {
name = "health"
container_port = 9234
protocol = "TCP"
}
liveness_probe {
http_get {
scheme = "HTTP"
host = "127.0.0.1"
port = "9234"
path = "/healthz"
}
initial_delay_seconds = 60
timeout_seconds = 3
period_seconds = 10
}
readiness_probe {
http_get {
scheme = "HTTP"
host = "127.0.0.1"
port = "9234"
path = "/healthz"
}
timeout_seconds = 3
period_seconds = 15
failure_threshold = 5
}
volume_mount {
name = "config"
read_only = true
mount_path = "/tmp/cilium/config-map"
}
}
volume {
name = "config"
config_map {
name = "cilium"
}
}
}
}
}
}


@ -0,0 +1,15 @@
resource "kubernetes_service_account" "operator" {
metadata {
name = "cilium-operator"
namespace = "kube-system"
}
automount_service_account_token = false
}
resource "kubernetes_service_account" "agent" {
metadata {
name = "cilium-agent"
namespace = "kube-system"
}
automount_service_account_token = false
}


@ -0,0 +1,17 @@
variable "pod_cidr" {
type = string
description = "CIDR IP range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "daemonset_tolerations" {
type = list(string)
description = "List of additional taint keys kube-system DaemonSets should tolerate (e.g. ['custom-role', 'gpu-role'])"
default = []
}
variable "enable_hubble" {
type = bool
description = "Run the embedded Hubble Server and mount hubble-server-certs Secret"
default = true
}


@ -0,0 +1,8 @@
terraform {
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.8"
}
}
}


@ -1,4 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
name: reboot-coordinator


@ -1,12 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: reboot-coordinator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: reboot-coordinator
subjects:
- kind: ServiceAccount
namespace: reboot-coordinator
name: default


@ -1,45 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: reboot-coordinator
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- get
- update
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- watch
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- delete
- apiGroups:
- "extensions"
resources:
- daemonsets
verbs:
- get


@ -1,68 +0,0 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: container-linux-update-agent
namespace: reboot-coordinator
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
app: container-linux-update-agent
template:
metadata:
labels:
app: container-linux-update-agent
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: update-agent
image: quay.io/coreos/container-linux-update-operator:v0.7.0
command:
- "/bin/update-agent"
env:
# read by update-agent as the node name to manage reboots for
- name: UPDATE_AGENT_NODE
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
cpu: 10m
memory: 20Mi
limits:
cpu: 20m
memory: 40Mi
volumeMounts:
- mountPath: /var/run/dbus
name: var-run-dbus
- mountPath: /etc/coreos
name: etc-coreos
- mountPath: /usr/share/coreos
name: usr-share-coreos
- mountPath: /etc/os-release
name: etc-os-release
volumes:
- name: var-run-dbus
hostPath:
path: /var/run/dbus
- name: etc-coreos
hostPath:
path: /etc/coreos
- name: usr-share-coreos
hostPath:
path: /usr/share/coreos
- name: etc-os-release
hostPath:
path: /etc/os-release


@ -1,39 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: container-linux-update-operator
namespace: reboot-coordinator
spec:
replicas: 1
selector:
matchLabels:
app: container-linux-update-operator
template:
metadata:
labels:
app: container-linux-update-operator
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- name: update-operator
image: quay.io/coreos/container-linux-update-operator:v0.7.0
command:
- "/bin/update-operator"
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
cpu: 10m
memory: 20Mi
limits:
cpu: 20m
memory: 40Mi


@ -0,0 +1,37 @@
resource "kubernetes_cluster_role" "coredns" {
metadata {
name = "system:coredns"
}
rule {
api_groups = [""]
resources = [
"endpoints",
"services",
"pods",
"namespaces",
]
verbs = [
"list",
"watch",
]
}
rule {
api_groups = [""]
resources = [
"nodes",
]
verbs = [
"get",
]
}
rule {
api_groups = ["discovery.k8s.io"]
resources = [
"endpointslices",
]
verbs = [
"list",
"watch",
]
}
}

addons/coredns/config.tf Normal file

@ -0,0 +1,30 @@
resource "kubernetes_config_map" "coredns" {
metadata {
name = "coredns"
namespace = "kube-system"
}
data = {
"Corefile" = <<-EOF
.:53 {
errors
health {
lameduck 5s
}
ready
log . {
class error
}
kubernetes ${var.cluster_domain_suffix} in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
EOF
}
}


@ -0,0 +1,151 @@
resource "kubernetes_deployment" "coredns" {
wait_for_rollout = false
metadata {
name = "coredns"
namespace = "kube-system"
labels = {
k8s-app = "coredns"
"kubernetes.io/name" = "CoreDNS"
}
}
spec {
replicas = var.replicas
strategy {
type = "RollingUpdate"
rolling_update {
max_unavailable = "1"
}
}
selector {
match_labels = {
k8s-app = "coredns"
tier = "control-plane"
}
}
template {
metadata {
labels = {
k8s-app = "coredns"
tier = "control-plane"
}
}
spec {
affinity {
node_affinity {
preferred_during_scheduling_ignored_during_execution {
weight = 100
preference {
match_expressions {
key = "node.kubernetes.io/controller"
operator = "Exists"
}
}
}
}
pod_anti_affinity {
preferred_during_scheduling_ignored_during_execution {
weight = 100
pod_affinity_term {
label_selector {
match_expressions {
key = "tier"
operator = "In"
values = ["control-plane"]
}
match_expressions {
key = "k8s-app"
operator = "In"
values = ["coredns"]
}
}
topology_key = "kubernetes.io/hostname"
}
}
}
}
dns_policy = "Default"
priority_class_name = "system-cluster-critical"
security_context {
seccomp_profile {
type = "RuntimeDefault"
}
}
service_account_name = "coredns"
toleration {
key = "node-role.kubernetes.io/controller"
effect = "NoSchedule"
}
container {
name = "coredns"
image = "registry.k8s.io/coredns/coredns:v1.12.0"
args = ["-conf", "/etc/coredns/Corefile"]
port {
name = "dns"
container_port = 53
protocol = "UDP"
}
port {
name = "dns-tcp"
container_port = 53
protocol = "TCP"
}
port {
name = "metrics"
container_port = 9153
protocol = "TCP"
}
resources {
requests = {
cpu = "100m"
memory = "70Mi"
}
limits = {
memory = "170Mi"
}
}
security_context {
capabilities {
add = ["NET_BIND_SERVICE"]
drop = ["all"]
}
read_only_root_filesystem = true
}
liveness_probe {
http_get {
path = "/health"
port = "8080"
scheme = "HTTP"
}
initial_delay_seconds = 60
timeout_seconds = 5
success_threshold = 1
failure_threshold = 5
}
readiness_probe {
http_get {
path = "/ready"
port = "8181"
scheme = "HTTP"
}
}
volume_mount {
name = "config"
mount_path = "/etc/coredns"
read_only = true
}
}
volume {
name = "config"
config_map {
name = "coredns"
items {
key = "Corefile"
path = "Corefile"
}
}
}
}
}
}
}


@ -0,0 +1,24 @@
resource "kubernetes_service_account" "coredns" {
metadata {
name = "coredns"
namespace = "kube-system"
}
automount_service_account_token = false
}
resource "kubernetes_cluster_role_binding" "coredns" {
metadata {
name = "system:coredns"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "system:coredns"
}
subject {
kind = "ServiceAccount"
name = "coredns"
namespace = "kube-system"
}
}

31
addons/coredns/service.tf Normal file

@ -0,0 +1,31 @@
resource "kubernetes_service" "coredns" {
metadata {
name = "coredns"
namespace = "kube-system"
labels = {
"k8s-app" = "coredns"
"kubernetes.io/name" = "CoreDNS"
}
annotations = {
"prometheus.io/scrape" = "true"
"prometheus.io/port" = "9153"
}
}
spec {
type = "ClusterIP"
cluster_ip = var.cluster_dns_service_ip
selector = {
k8s-app = "coredns"
}
port {
name = "dns"
protocol = "UDP"
port = 53
}
port {
name = "dns-tcp"
protocol = "TCP"
port = 53
}
}
}


@ -0,0 +1,15 @@
variable "replicas" {
type = number
description = "CoreDNS replica count"
default = 2
}
variable "cluster_dns_service_ip" {
type = string
description = "Must be set to `cluster_dns_service_ip` output by cluster"
default = "10.3.0.10"
}
variable "cluster_domain_suffix" {
type = string
description = "Must be set to `cluster_domain_suffix` output by cluster"
default = "cluster.local"
}


@ -0,0 +1,9 @@
terraform {
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.8"
}
}
}
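
The CoreDNS files above form a Terraform module with three input variables (`replicas`, `cluster_dns_service_ip`, `cluster_domain_suffix`). A minimal sketch of how a root configuration might call it follows. The module source path is inferred from the `addons/coredns/` file paths shown in this diff, and the provider block with its kubeconfig location is an assumption, not part of the change.

```hcl
# Hypothetical root-module usage; not part of this diff.
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.8" # same constraint the module declares
    }
  }
}

# Assumption: credentials come from a local kubeconfig file.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

module "coredns" {
  # Assumed relative path to the directory added in this diff.
  source = "./addons/coredns"

  # Values below repeat the module defaults. Per the variable descriptions,
  # the last two must match what the cluster itself was built with.
  replicas               = 2
  cluster_dns_service_ip = "10.3.0.10"
  cluster_domain_suffix  = "cluster.local"
}
```

Because the Service pins `cluster_ip` to `var.cluster_dns_service_ip`, a value that differs from the cluster DNS address given to the kubelets would likely leave pods unable to resolve names, which is presumably why the descriptions say "Must be set".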


@ -0,0 +1,18 @@
resource "kubernetes_cluster_role_binding" "flannel" {
metadata {
name = "flannel"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "flannel"
}
subject {
kind = "ServiceAccount"
name = "flannel"
namespace = "kube-system"
}
}


@ -0,0 +1,24 @@
resource "kubernetes_cluster_role" "flannel" {
metadata {
name = "flannel"
}
rule {
api_groups = [""]
resources = ["pods"]
verbs = ["get"]
}
rule {
api_groups = [""]
resources = ["nodes"]
verbs = ["list", "watch"]
}
rule {
api_groups = [""]
resources = ["nodes/status"]
verbs = ["patch"]
}
}

44
addons/flannel/config.tf Normal file

@ -0,0 +1,44 @@
resource "kubernetes_config_map" "config" {
metadata {
name = "flannel-config"
namespace = "kube-system"
labels = {
k8s-app = "flannel"
tier = "node"
}
}
data = {
"cni-conf.json" = <<-EOF
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
EOF
"net-conf.json" = <<-EOF
{
"Network": "${var.pod_cidr}",
"Backend": {
"Type": "vxlan",
"Port": 4789
}
}
EOF
}
}

167
addons/flannel/daemonset.tf Normal file

@ -0,0 +1,167 @@
resource "kubernetes_daemonset" "flannel" {
metadata {
name = "flannel"
namespace = "kube-system"
labels = {
k8s-app = "flannel"
}
}
spec {
strategy {
type = "RollingUpdate"
rolling_update {
max_unavailable = "1"
}
}
selector {
match_labels = {
k8s-app = "flannel"
}
}
template {
metadata {
labels = {
k8s-app = "flannel"
}
}
spec {
host_network = true
priority_class_name = "system-node-critical"
service_account_name = "flannel"
security_context {
seccomp_profile {
type = "RuntimeDefault"
}
}
toleration {
key = "node-role.kubernetes.io/controller"
operator = "Exists"
}
toleration {
key = "node.kubernetes.io/not-ready"
operator = "Exists"
}
dynamic "toleration" {
for_each = var.daemonset_tolerations
content {
key = toleration.value
operator = "Exists"
}
}
init_container {
name = "install-cni"
image = "quay.io/poseidon/flannel-cni:v0.4.2"
command = ["/install-cni.sh"]
env {
name = "CNI_NETWORK_CONFIG"
value_from {
config_map_key_ref {
name = "flannel-config"
key = "cni-conf.json"
}
}
}
volume_mount {
name = "cni-bin-dir"
mount_path = "/host/opt/cni/bin/"
}
volume_mount {
name = "cni-conf-dir"
mount_path = "/host/etc/cni/net.d"
}
}
container {
name = "flannel"
image = "docker.io/flannel/flannel:v0.26.1"
command = [
"/opt/bin/flanneld",
"--ip-masq",
"--kube-subnet-mgr",
"--iface=$(POD_IP)"
]
env {
name = "POD_NAME"
value_from {
field_ref {
field_path = "metadata.name"
}
}
}
env {
name = "POD_NAMESPACE"
value_from {
field_ref {
field_path = "metadata.namespace"
}
}
}
env {
name = "POD_IP"
value_from {
field_ref {
field_path = "status.podIP"
}
}
}
security_context {
privileged = true
}
resources {
requests = {
cpu = "100m"
}
}
volume_mount {
name = "flannel-config"
mount_path = "/etc/kube-flannel/"
}
volume_mount {
name = "run-flannel"
mount_path = "/run/flannel"
}
volume_mount {
name = "xtables-lock"
mount_path = "/run/xtables.lock"
}
}
volume {
name = "flannel-config"
config_map {
name = "flannel-config"
}
}
volume {
name = "run-flannel"
host_path {
path = "/run/flannel"
}
}
# Used by install-cni
volume {
name = "cni-bin-dir"
host_path {
path = "/opt/cni/bin"
}
}
volume {
name = "cni-conf-dir"
host_path {
path = "/etc/cni/net.d"
type = "DirectoryOrCreate"
}
}
# Access iptables concurrently
volume {
name = "xtables-lock"
host_path {
path = "/run/xtables.lock"
type = "FileOrCreate"
}
}
}
}
}
}


@ -0,0 +1,7 @@
resource "kubernetes_service_account" "flannel" {
metadata {
name = "flannel"
namespace = "kube-system"
}
}


@ -0,0 +1,11 @@
variable "pod_cidr" {
type = string
description = "CIDR IP range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "daemonset_tolerations" {
type = list(string)
description = "List of additional taint keys kube-system DaemonSets should tolerate (e.g. ['custom-role', 'gpu-role'])"
default = []
}


@ -0,0 +1,8 @@
terraform {
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.8"
}
}
}
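
The flannel files above likewise form a module, with `pod_cidr` and `daemonset_tolerations` as inputs. The sketch below shows one way it might be invoked. The source path is inferred from the `addons/flannel/` paths in this diff, the provider block is an assumption, and `custom-role` is only the illustrative taint key taken from the variable description.

```hcl
# Hypothetical root-module usage; not part of this diff.
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.8" # same constraint the module declares
    }
  }
}

# Assumption: credentials come from a local kubeconfig file.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

module "flannel" {
  # Assumed relative path to the directory added in this diff.
  source = "./addons/flannel"

  # Rendered into net-conf.json as the flannel "Network" value.
  pod_cidr = "10.2.0.0/16"

  # Each key becomes an extra toleration with operator "Exists"
  # through the dynamic "toleration" block in daemonset.tf.
  daemonset_tolerations = ["custom-role"]
}
```

If this block and a CoreDNS module call live in the same root configuration, they would share a single `terraform` and `provider "kubernetes"` block rather than repeating them.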


@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-coredns
namespace: monitoring
data:
coredns.json: |-
{
@ -26,7 +22,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -41,6 +37,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -53,6 +50,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -76,7 +74,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(coredns_dns_request_count_total{instance=~\"$instance\"}[5m])) by (proto)",
"expr": "sum(rate(coredns_dns_requests_total{instance=~\"$instance\"}[5m])) by (proto)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{proto}}",
@ -132,6 +130,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -144,6 +143,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -167,7 +167,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(coredns_dns_request_type_count_total{instance=~\"$instance\"}[5m])) by (type)",
"expr": "sum(rate(coredns_dns_requests_total{instance=~\"$instance\"}[5m])) by (type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{type}}",
@ -223,6 +223,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -235,6 +236,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -258,7 +260,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(coredns_dns_request_count_total{instance=~\"$instance\"}[5m])) by (zone)",
"expr": "sum(rate(coredns_dns_requests_total{instance=~\"$instance\"}[5m])) by (zone)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{zone}}",
@ -327,6 +329,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -339,6 +342,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": false
},
@ -432,6 +436,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -444,6 +449,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": false
},
@ -467,7 +473,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(coredns_dns_response_rcode_count_total{instance=~\"$instance\"}[5m])) by (rcode)",
"expr": "sum(rate(coredns_dns_responses_total{instance=~\"$instance\"}[5m])) by (rcode)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{rcode}}",
@ -536,6 +542,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -548,6 +555,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": false
},
@ -641,6 +649,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -653,6 +662,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": false
},
@ -759,6 +769,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -771,6 +782,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": false
},
@ -794,7 +806,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(coredns_cache_size{instance=~\"$instance\"}) by (type)",
"expr": "sum(coredns_cache_entries{instance=~\"$instance\"}) by (type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{type}}",
@ -850,6 +862,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -862,6 +875,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": false
},
@ -976,17 +990,43 @@ data:
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"label": "pod",
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(coredns_build_info{job=\"coredns\"}, instance)",
"query": "label_values(coredns_build_info{cluster=\"$cluster\", job=\"coredns\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -1029,7 +1069,12 @@ data:
"30d"
]
},
"timezone": "browser",
"timezone": "",
"title": "CoreDNS",
"uid": "2f3f749259235f58698ea949170d3bd5",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-coredns
namespace: monitoring


@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-etcd
namespace: monitoring
data:
etcd.json: |-
{
@ -15,7 +11,6 @@ data:
"editable": true,
"gnetId": null,
"hideControls": false,
"id": 6,
"links": [
],
@ -145,7 +140,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(grpc_server_started_total{job=\"$cluster\",grpc_type=\"unary\"}[5m]))",
"expr": "sum(rate(grpc_server_started_total{job=\"$cluster\",grpc_type=\"unary\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RPC Rate",
@ -154,7 +149,7 @@ data:
"step": 2
},
{
"expr": "sum(rate(grpc_server_handled_total{job=\"$cluster\",grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
"expr": "sum(rate(grpc_server_handled_total{job=\"$cluster\",grpc_type=\"unary\",grpc_code=~\"Unknown|FailedPrecondition|ResourceExhausted|Internal|Unavailable|DataLoss|DeadlineExceeded\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RPC Failed Rate",
@ -347,7 +342,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes{job=\"$cluster\"}",
"expr": "etcd_mvcc_db_total_size_in_bytes{job=\"$cluster\"}",
"hide": false,
"interval": "",
"intervalFactor": 2,
@ -435,7 +430,7 @@ data:
"steppedLine": true,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=\"$cluster\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=\"$cluster\"}[$__rate_interval])) by (instance, le))",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{instance}} WAL fsync",
@ -444,7 +439,7 @@ data:
"step": 4
},
{
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket{job=\"$cluster\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket{job=\"$cluster\"}[$__rate_interval])) by (instance, le))",
"intervalFactor": 2,
"legendFormat": "{{instance}} DB fsync",
"metric": "etcd_disk_backend_commit_duration_seconds_bucket",
@ -622,7 +617,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(etcd_network_client_grpc_received_bytes_total{job=\"$cluster\"}[5m])",
"expr": "rate(etcd_network_client_grpc_received_bytes_total{job=\"$cluster\"}[$__rate_interval])",
"intervalFactor": 2,
"legendFormat": "{{instance}} Client Traffic In",
"metric": "etcd_network_client_grpc_received_bytes_total",
@ -708,7 +703,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(etcd_network_client_grpc_sent_bytes_total{job=\"$cluster\"}[5m])",
"expr": "rate(etcd_network_client_grpc_sent_bytes_total{job=\"$cluster\"}[$__rate_interval])",
"intervalFactor": 2,
"legendFormat": "{{instance}} Client Traffic Out",
"metric": "etcd_network_client_grpc_sent_bytes_total",
@ -794,7 +789,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_network_peer_received_bytes_total{job=\"$cluster\"}[5m])) by (instance)",
"expr": "sum(rate(etcd_network_peer_received_bytes_total{job=\"$cluster\"}[$__rate_interval])) by (instance)",
"intervalFactor": 2,
"legendFormat": "{{instance}} Peer Traffic In",
"metric": "etcd_network_peer_received_bytes_total",
@ -883,7 +878,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_network_peer_sent_bytes_total{job=\"$cluster\"}[5m])) by (instance)",
"expr": "sum(rate(etcd_network_peer_sent_bytes_total{job=\"$cluster\"}[$__rate_interval])) by (instance)",
"hide": false,
"interval": "",
"intervalFactor": 2,
@ -977,7 +972,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_server_proposals_failed_total{job=\"$cluster\"}[5m]))",
"expr": "sum(rate(etcd_server_proposals_failed_total{job=\"$cluster\"}[$__rate_interval]))",
"intervalFactor": 2,
"legendFormat": "Proposal Failure Rate",
"metric": "etcd_server_proposals_failed_total",
@ -993,7 +988,7 @@ data:
"step": 2
},
{
"expr": "sum(rate(etcd_server_proposals_committed_total{job=\"$cluster\"}[5m]))",
"expr": "sum(rate(etcd_server_proposals_committed_total{job=\"$cluster\"}[$__rate_interval]))",
"intervalFactor": 2,
"legendFormat": "Proposal Commit Rate",
"metric": "etcd_server_proposals_committed_total",
@ -1001,7 +996,7 @@ data:
"step": 2
},
{
"expr": "sum(rate(etcd_server_proposals_applied_total{job=\"$cluster\"}[5m]))",
"expr": "sum(rate(etcd_server_proposals_applied_total{job=\"$cluster\"}[$__rate_interval]))",
"intervalFactor": 2,
"legendFormat": "Proposal Apply Rate",
"refId": "D",
@ -1136,6 +1131,131 @@ data:
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"decimals": 0,
"editable": true,
"error": false,
"fieldConfig": {
"defaults": {
"custom": {
}
},
"overrides": [
]
},
"fill": 0,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 28
},
"hiddenSeries": false,
"id": 42,
"isNew": true,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"nullPointMode": "connected",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (instance, le) (rate(etcd_network_peer_round_trip_time_seconds_bucket{job=\"$cluster\"}[$__rate_interval])))",
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{instance}} Peer round trip time",
"metric": "etcd_network_peer_round_trip_time_seconds_bucket",
"refId": "A",
"step": 2
}
],
"thresholds": [
],
"timeFrom": null,
"timeRegions": [
],
"timeShift": null,
"title": "Peer round trip time",
"tooltip": {
"msResolution": false,
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"$$hashKey": "object:925",
"decimals": null,
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:926",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"title": "New row"
@ -1145,7 +1265,7 @@ data:
"sharedCrosshair": false,
"style": "dark",
"tags": [
"etcd-mixin"
],
"templating": {
"list": [
@ -1155,7 +1275,7 @@ data:
"value": "Prometheus"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@ -1181,7 +1301,7 @@ data:
],
"query": "label_values(etcd_server_has_leader, job)",
"refresh": 1,
"refresh": 2,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
@ -1224,7 +1344,12 @@ data:
"30d"
]
},
"timezone": "browser",
"timezone": "",
"title": "etcd",
"uid": "c2f4e12cdf69feb95caa41a5a1b423d9",
"version": 215
}
kind: ConfigMap
metadata:
name: grafana-dashboards-etcd
namespace: monitoring


@ -0,0 +1,7644 @@
apiVersion: v1
data:
cluster-total.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"panels": [
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 1
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Received",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 1
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Transmitted",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"columns": [
{
"text": "Time",
"value": "Time"
},
{
"text": "Value #A",
"value": "Value #A"
},
{
"text": "Value #B",
"value": "Value #B"
},
{
"text": "Value #C",
"value": "Value #C"
},
{
"text": "Value #D",
"value": "Value #D"
},
{
"text": "Value #E",
"value": "Value #E"
},
{
"text": "Value #F",
"value": "Value #F"
},
{
"text": "Value #G",
"value": "Value #G"
},
{
"text": "Value #H",
"value": "Value #H"
},
{
"text": "namespace",
"value": "namespace"
}
],
"datasource": "$datasource",
"fill": 1,
"fontSize": "90%",
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 10
},
"id": 5,
"lines": true,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null as zero",
"renderer": "flot",
"scroll": true,
"showHeader": true,
"sort": {
"col": 0,
"desc": false
},
"spaceLength": 10,
"span": 24,
"styles": [
{
"alias": "Time",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Time",
"thresholds": [
],
"type": "hidden",
"unit": "short"
},
{
"alias": "Current Bandwidth Received",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Bandwidth Transmitted",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Average Bandwidth Received",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Average Bandwidth Transmitted",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "d/8b7a8b326d7a6f1f04244066368c67af/kubernetes-networking-namespace-pods?orgId=1&refresh=30s&var-namespace=$__cell",
"pattern": "namespace",
"thresholds": [
],
"type": "number",
"unit": "short"
}
],
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sort_desc(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sort_desc(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"timeFrom": null,
"timeShift": null,
"title": "Current Status",
"type": "table"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 10
},
"id": 6,
"panels": [
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 11
},
"id": 7,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Rate of Bytes Received",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 11
},
"id": 8,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Rate of Bytes Transmitted",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Average Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 11
},
"id": 9,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth History",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 12
},
"id": 10,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 21
},
"id": 11,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 30
},
"id": 12,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 31
},
"id": 13,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 40
},
"id": 14,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Packets",
"titleSize": "h6",
"type": "row"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 31
},
"id": 15,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 50
},
"id": 16,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 59
},
"id": 17,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=~\".+\"}[$interval:$resolution])) by (namespace))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{namespace}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 59
},
"id": 18,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
{
"targetBlank": true,
"title": "What is TCP Retransmit?",
"url": "https://accedian.com/enterprises/blog/network-packet-loss-retransmissions-and-duplicate-acknowledgements/"
}
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(rate(node_netstat_Tcp_RetransSegs{cluster=\"$cluster\"}[$interval:$resolution]) / rate(node_netstat_Tcp_OutSegs{cluster=\"$cluster\"}[$interval:$resolution])) by (instance))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{instance}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of TCP Retransmits out of all sent segments",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 59
},
"id": 19,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 2,
"links": [
{
"targetBlank": true,
"title": "Why monitor SYN retransmits?",
"url": "https://github.com/prometheus/node_exporter/issues/1023#issuecomment-408128365"
}
],
"minSpan": 24,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(rate(node_netstat_TcpExt_TCPSynRetrans{cluster=\"$cluster\"}[$interval:$resolution]) / rate(node_netstat_Tcp_RetransSegs{cluster=\"$cluster\"}[$interval:$resolution])) by (instance))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{instance}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of TCP SYN Retransmits out of all retransmits",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Errors",
"titleSize": "h6",
"type": "row"
}
],
"refresh": "10s",
"rows": [
],
"schemaVersion": 18,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "resolution",
"options": [
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": true,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
}
],
"query": "30s,5m,1h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "4h",
"value": "4h"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kubernetes-cadvisor\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Cluster",
"uid": "ff635a025bcfea7bc3dd4f508990a3e9",
"version": 0
}
namespace-by-pod.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"panels": [
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 0,
"format": "time_series",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 1
},
"height": 9,
"id": 3,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"minSpan": 12,
"nullPointMode": "connected",
"nullText": null,
"options": {
"fieldOptions": {
"calcs": [
"last"
],
"defaults": {
"max": 10000000000,
"min": 0,
"title": "$namespace",
"unit": "Bps"
},
"mappings": [
],
"override": {
},
"thresholds": [
{
"color": "dark-green",
"index": 0,
"value": null
},
{
"color": "dark-yellow",
"index": 1,
"value": 5000000000
},
{
"color": "dark-red",
"index": 2,
"value": 7000000000
}
],
"values": false
}
},
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 12,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution]))",
"format": "time_series",
"instant": null,
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Received",
"type": "gauge",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 0,
"format": "time_series",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 1
},
"height": 9,
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"minSpan": 12,
"nullPointMode": "connected",
"nullText": null,
"options": {
"fieldOptions": {
"calcs": [
"last"
],
"defaults": {
"max": 10000000000,
"min": 0,
"title": "$namespace",
"unit": "Bps"
},
"mappings": [
],
"override": {
},
"thresholds": [
{
"color": "dark-green",
"index": 0,
"value": null
},
{
"color": "dark-yellow",
"index": 1,
"value": 5000000000
},
{
"color": "dark-red",
"index": 2,
"value": 7000000000
}
],
"values": false
}
},
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 12,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution]))",
"format": "time_series",
"instant": null,
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Transmitted",
"type": "gauge",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"columns": [
{
"text": "Time",
"value": "Time"
},
{
"text": "Value #A",
"value": "Value #A"
},
{
"text": "Value #B",
"value": "Value #B"
},
{
"text": "Value #C",
"value": "Value #C"
},
{
"text": "Value #D",
"value": "Value #D"
},
{
"text": "Value #E",
"value": "Value #E"
},
{
"text": "Value #F",
"value": "Value #F"
},
{
"text": "pod",
"value": "pod"
}
],
"datasource": "$datasource",
"fill": 1,
"fontSize": "100%",
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 10
},
"id": 5,
"lines": true,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null as zero",
"renderer": "flot",
"scroll": true,
"showHeader": true,
"sort": {
"col": 0,
"desc": false
},
"spaceLength": 10,
"span": 24,
"styles": [
{
"alias": "Time",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Time",
"thresholds": [
],
"type": "hidden",
"unit": "short"
},
{
"alias": "Bandwidth Received",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Bandwidth Transmitted",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "d/7a18067ce943a40ae25454675c19ff5c/kubernetes-networking-pod?orgId=1&refresh=30s&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"timeFrom": null,
"timeShift": null,
"title": "Current Status",
"type": "table"
},
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 19
},
"id": 6,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 20
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 20
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 29
},
"id": 9,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 30
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 30
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Packets",
"titleSize": "h6",
"type": "row"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 30
},
"id": 12,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 40
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 40
},
"id": 14,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Errors",
"titleSize": "h6",
"type": "row"
}
],
"refresh": "10s",
"rows": [
],
"schemaVersion": 18,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kubernetes-cadvisor\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "kube-system",
"value": "kube-system"
},
"datasource": "$datasource",
"definition": "label_values(container_network_receive_packets_total{cluster=\"$cluster\"}, namespace)",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(container_network_receive_packets_total{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "resolution",
"options": [
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": true,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
}
],
"query": "30s,5m,1h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Namespace (Pods)",
"uid": "8b7a8b326d7a6f1f04244066368c67af",
"version": 0
}
namespace-by-workload.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"panels": [
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 1
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ workload }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Received",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 1
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ workload }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Transmitted",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"columns": [
{
"text": "Time",
"value": "Time"
},
{
"text": "Value #A",
"value": "Value #A"
},
{
"text": "Value #B",
"value": "Value #B"
},
{
"text": "Value #C",
"value": "Value #C"
},
{
"text": "Value #D",
"value": "Value #D"
},
{
"text": "Value #E",
"value": "Value #E"
},
{
"text": "Value #F",
"value": "Value #F"
},
{
"text": "Value #G",
"value": "Value #G"
},
{
"text": "Value #H",
"value": "Value #H"
},
{
"text": "workload",
"value": "workload"
}
],
"datasource": "$datasource",
"fill": 1,
"fontSize": "90%",
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 10
},
"id": 5,
"lines": true,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null as zero",
"renderer": "flot",
"scroll": true,
"showHeader": true,
"sort": {
"col": 0,
"desc": false
},
"spaceLength": 10,
"span": 24,
"styles": [
{
"alias": "Time",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Time",
"thresholds": [
],
"type": "hidden",
"unit": "short"
},
{
"alias": "Current Bandwidth Received",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Bandwidth Transmitted",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Average Bandwidth Received",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Average Bandwidth Transmitted",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTooltip": "Drill down",
"linkUrl": "d/728bf77cc1166d2f3133bf25846876cc/kubernetes-networking-workload?orgId=1&refresh=30s&var-namespace=$namespace&var-type=$type&var-workload=$__cell",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
}
],
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sort_desc(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sort_desc(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"timeFrom": null,
"timeShift": null,
"title": "Current Status",
"type": "table"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 19
},
"id": 6,
"panels": [
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 20
},
"id": 7,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(avg(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ workload }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Rate of Bytes Received",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 20
},
"id": 8,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(avg(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ workload }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Rate of Bytes Transmitted",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Average Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 29
},
"id": 9,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth History",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 38
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{workload}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 38
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{workload}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 39
},
"id": 12,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 40
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{workload}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 40
},
"id": 14,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{workload}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Packets",
"titleSize": "h6",
"type": "row"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 40
},
"id": 15,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 41
},
"id": 16,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{workload}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 41
},
"id": 17,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{workload}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Errors",
"titleSize": "h6",
"type": "row"
}
],
"refresh": "10s",
"rows": [
],
"schemaVersion": 18,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kubernetes-cadvisor\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "kube-system",
"value": "kube-system"
},
"datasource": "$datasource",
"definition": "label_values(container_network_receive_packets_total{cluster=\"$cluster\"}, namespace)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(container_network_receive_packets_total{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "deployment",
"value": "deployment"
},
"datasource": "$datasource",
"definition": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\"}, workload_type)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=\"$namespace\", workload=~\".+\"}, workload_type)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "resolution",
"options": [
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": true,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
}
],
"query": "30s,5m,1h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Namespace (Workload)",
"uid": "bbb2a765a623ae38130206c7d94a160f",
"version": 0
}
pod-total.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"panels": [
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 0,
"format": "time_series",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 1
},
"height": 9,
"id": 3,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"minSpan": 12,
"nullPointMode": "connected",
"nullText": null,
"options": {
"fieldOptions": {
"calcs": [
"last"
],
"defaults": {
"max": 10000000000,
"min": 0,
"title": "$namespace: $pod",
"unit": "Bps"
},
"mappings": [
],
"override": {
},
"thresholds": [
{
"color": "dark-green",
"index": 0,
"value": null
},
{
"color": "dark-yellow",
"index": 1,
"value": 5000000000
},
{
"color": "dark-red",
"index": 2,
"value": 7000000000
}
],
"values": false
}
},
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 12,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution]))",
"format": "time_series",
"instant": null,
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Received",
"type": "gauge",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 0,
"format": "time_series",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 1
},
"height": 9,
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"minSpan": 12,
"nullPointMode": "connected",
"nullText": null,
"options": {
"fieldOptions": {
"calcs": [
"last"
],
"defaults": {
"max": 10000000000,
"min": 0,
"title": "$namespace: $pod",
"unit": "Bps"
},
"mappings": [
],
"override": {
},
"thresholds": [
{
"color": "dark-green",
"index": 0,
"value": null
},
{
"color": "dark-yellow",
"index": 1,
"value": 5000000000
},
{
"color": "dark-red",
"index": 2,
"value": 7000000000
}
],
"values": false
}
},
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 12,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution]))",
"format": "time_series",
"instant": null,
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Transmitted",
"type": "gauge",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 10
},
"id": 5,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 11
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 11
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 20
},
"id": 8,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 21
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 21
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Packets",
"titleSize": "h6",
"type": "row"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 21
},
"id": 11,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 32
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 32
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{cluster=\"$cluster\",namespace=~\"$namespace\", pod=~\"$pod\"}[$interval:$resolution])) by (pod)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Errors",
"titleSize": "h6",
"type": "row"
}
],
"refresh": "10s",
"rows": [
],
"schemaVersion": 18,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kubernetes-cadvisor\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "kube-system",
"value": "kube-system"
},
"datasource": "$datasource",
"definition": "label_values(container_network_receive_packets_total{cluster=\"$cluster\"}, namespace)",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(container_network_receive_packets_total{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"definition": "label_values(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\"}, pod)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(container_network_receive_packets_total{cluster=\"$cluster\",namespace=~\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "resolution",
"options": [
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": true,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
}
],
"query": "30s,5m,1h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Pod",
"uid": "7a18067ce943a40ae25454675c19ff5c",
"version": 0
}
workload-total.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"panels": [
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 1
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ pod }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Received",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 1
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ pod }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Rate of Bytes Transmitted",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 10
},
"id": 5,
"panels": [
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 11
},
"id": 6,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(avg(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ pod }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Rate of Bytes Received",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 11
},
"id": 7,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"sort": "current",
"sortDesc": true,
"total": false,
"values": true
},
"lines": false,
"linewidth": 1,
"links": [
],
"minSpan": 24,
"nullPointMode": "null",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 24,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(avg(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{ pod }}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Rate of Bytes Transmitted",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "series",
"name": null,
"show": false,
"values": [
"current"
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Average Bandwidth",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 11
},
"id": 8,
"panels": [
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth History",
"titleSize": "h6",
"type": "row"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 12
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 12
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 21
},
"id": 11,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 22
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 22
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Packets",
"titleSize": "h6",
"type": "row"
},
{
"collapse": true,
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 22
},
"id": 14,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 23
},
"id": 15,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_receive_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 2,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 23
},
"id": 16,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [
],
"minSpan": 12,
"nullPointMode": "connected",
"paceLength": 10,
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum(irate(container_network_transmit_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\",namespace=~\"$namespace\"}[$interval:$resolution])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{pod}}",
"refId": "A",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Errors",
"titleSize": "h6",
"type": "row"
}
],
"refresh": "10s",
"rows": [
],
"schemaVersion": 18,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info{job=\"kube-state-metrics\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "kube-system",
"value": "kube-system"
},
"datasource": "$datasource",
"definition": "label_values(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\"}, namespace)",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"definition": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\"}, workload)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "workload",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\"}, workload)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "deployment",
"value": "deployment"
},
"datasource": "$datasource",
"definition": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\"}, workload_type)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\",namespace=~\"$namespace\", workload=~\"$workload\"}, workload_type)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "resolution",
"options": [
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": true,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
}
],
"query": "30s,5m,1h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "5m",
"value": "5m"
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "interval",
"options": [
{
"selected": true,
"text": "4h",
"value": "4h"
}
],
"query": "4h",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "interval",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Workload",
"uid": "728bf77cc1166d2f3133bf25846876cc",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-network
namespace: monitoring

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-nodes
namespace: monitoring
data:
kubelet.json: |-
{
@ -25,3683 +21,2105 @@ data:
"links": [
],
"refresh": "",
"rows": [
"panels": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 2,
"interval": null,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 2,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"kubelet\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Up",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "min"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 3,
"interval": null,
"links": [
"mappings": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 2,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(kubelet_running_pod_count{job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": "",
"title": "Running Pods",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "min"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"thresholds": {
"mode": "absolute",
"steps": [
]
},
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 2,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(kubelet_running_container_count{job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": "",
"title": "Running Containers",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "min"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 2,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(volume_manager_total_volumes{job=\"kubelet\", instance=~\"$instance\", state=\"actual_state_of_world\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": "",
"title": "Actual Volume Count",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "min"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 6,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 2,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(volume_manager_total_volumes{job=\"kubelet\", instance=~\"$instance\",state=\"desired_state_of_world\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": "",
"title": "Desired Volume Count",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "min"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 2,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(kubelet_node_config_error{job=\"kubelet\", instance=~\"$instance\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": "",
"title": "Config Error Count",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "min"
"unit": "none"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_runtime_operations_total{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (operation_type, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Operation Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_runtime_operations_errors_total{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Operation Error Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_runtime_operations_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_type, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Operation duration 99th quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_pod_start_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} pod",
"refId": "A"
},
{
"expr": "sum(rate(kubelet_pod_worker_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} worker",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pod Start Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_start_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} pod",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} worker",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pod Start Duration",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 13,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"hideEmpty": "true",
"hideZero": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(storage_operation_duration_seconds_count{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Storage Operation Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 14,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"hideEmpty": "true",
"hideZero": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(storage_operation_errors_total{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Storage Operation Error Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 15,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"hideEmpty": "true",
"hideZero": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": true,
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(storage_operation_duration_seconds_bucket{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_name, volume_plugin, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Storage Operation Duration 99th quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 16,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_cgroup_manager_duration_seconds_count{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Cgroup manager operation rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 17,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_cgroup_manager_duration_seconds_bucket{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, operation_type, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Cgroup manager 99th quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "Pod lifecycle event generator",
"fill": 1,
"gridPos": {
},
"id": 18,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_pleg_relist_duration_seconds_count{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "PLEG relist rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 19,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_interval_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "PLEG relist interval",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 20,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{job=\"kubelet\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "PLEG relist duration",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 21,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kubelet\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "RPC Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 22,
"legend": {
"alignAsTable": "true",
"avg": false,
"current": "true",
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"total": false,
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kubelet\", instance=~\"$instance\"}[5m])) by (instance, verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{verb}} {{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Request duration 99th quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 23,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"kubelet\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 24,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"kubelet\",instance=~\"$instance\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 25,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"kubelet\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Goroutines",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"gridPos": {
"h": 7,
"w": 4,
"x": 0,
"y": 0
},
"id": 2,
"links": [
],
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "instance",
"options": [
"textMode": "auto"
},
"pluginVersion": "7",
"targets": [
{
"expr": "sum(kubelet_node_name{cluster=\"$cluster\", job=\"kubelet\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"title": "Running Kubelets",
"transparent": false,
"type": "stat"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"links": [
],
"query": "label_values(kubelet_runtime_operations{job=\"kubelet\"}, instance)",
"refresh": 2,
"regex": "",
],
"mappings": [
],
"thresholds": {
"mode": "absolute",
"steps": [
]
},
"unit": "none"
}
},
"gridPos": {
"h": 7,
"w": 4,
"x": 4,
"y": 0
},
"id": 3,
"links": [
],
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "7",
"targets": [
{
"expr": "sum(kubelet_running_pods{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}) OR sum(kubelet_running_pod_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "Running Pods",
"transparent": false,
"type": "stat"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"links": [
],
"mappings": [
],
"thresholds": {
"mode": "absolute",
"steps": [
]
},
"unit": "none"
}
},
"gridPos": {
"h": 7,
"w": 4,
"x": 8,
"y": 0
},
"id": 4,
"links": [
],
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "7",
"targets": [
{
"expr": "sum(kubelet_running_containers{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}) OR sum(kubelet_running_container_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "Running Containers",
"transparent": false,
"type": "stat"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"links": [
],
"mappings": [
],
"thresholds": {
"mode": "absolute",
"steps": [
]
},
"unit": "none"
}
},
"gridPos": {
"h": 7,
"w": 4,
"x": 12,
"y": 0
},
"id": 5,
"links": [
],
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "7",
"targets": [
{
"expr": "sum(volume_manager_total_volumes{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\", state=\"actual_state_of_world\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "Actual Volume Count",
"transparent": false,
"type": "stat"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"links": [
],
"mappings": [
],
"thresholds": {
"mode": "absolute",
"steps": [
]
},
"unit": "none"
}
},
"gridPos": {
"h": 7,
"w": 4,
"x": 16,
"y": 0
},
"id": 6,
"links": [
],
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "7",
"targets": [
{
"expr": "sum(volume_manager_total_volumes{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\",state=\"desired_state_of_world\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "Desired Volume Count",
"transparent": false,
"type": "stat"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"links": [
],
"mappings": [
],
"thresholds": {
"mode": "absolute",
"steps": [
]
},
"unit": "none"
}
},
"gridPos": {
"h": 7,
"w": 4,
"x": 20,
"y": 0
},
"id": 7,
"links": [
],
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "7",
"targets": [
{
"expr": "sum(rate(kubelet_node_config_error{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"title": "Config Error Count",
"transparent": false,
"type": "stat"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 7
},
"id": 8,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_runtime_operations_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (operation_type, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Operation Rate",
"tooltip": {
"shared": true,
"sort": 0,
"tagValuesQuery": "",
"tags": [
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Kubelet",
"uid": "3138fa155d5915769fbded898ac09fd9",
"version": 0
}
nodes.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
]
},
"yaxes": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(node_load1{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 1m",
"refId": "A"
},
{
"expr": "max(node_load5{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 5m",
"refId": "B"
},
{
"expr": "max(node_load15{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "load 15m",
"refId": "C"
},
{
"expr": "count(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", mode=\"user\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "logical cores",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "System load",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cpu}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Usage Per Core",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
]
},
{
"collapse": false,
"collapsed": false,
"panels": [
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 7
},
"id": 9,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": "true",
"avg": "true",
"current": "true",
"max": "false",
"min": "false",
"rightSide": "true",
"show": "true",
"total": "false",
"values": "true"
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max (sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[2m])) ) * 100\n",
"format": "time_series",
"intervalFactor": 10,
"legendFormat": "{{ cpu }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Utilization",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
},
{
"format": "percent",
"label": null,
"logBase": 1,
"max": 100,
"min": 0,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(sum by (cpu) (irate(node_cpu_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", mode!=\"idle\", instance=\"$instance\"}[2m]))) * 100\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "CPU Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
"expr": "sum(rate(kubelet_runtime_operations_errors_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Operation Error Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"collapse": false,
"collapsed": false,
"panels": [
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 24,
"x": 0,
"y": 14
},
"id": 10,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(\n node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory used",
"refId": "A"
},
{
"expr": "max(node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"refId": "B"
},
{
"expr": "max(node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory cached",
"refId": "C"
},
{
"expr": "max(node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory free",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(\n (\n (\n node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_MemFree_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Buffers_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_memory_Cached_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n )\n / node_memory_MemTotal_bytes{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n ) * 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
"expr": "histogram_quantile(0.99, sum(rate(kubelet_runtime_operations_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_type, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Operation duration 99th quantile",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"collapse": false,
"collapsed": false,
"panels": [
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 21
},
"id": 11,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "read",
"yaxis": 1
},
{
"alias": "io time",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(rate(node_disk_read_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "read",
"refId": "A"
},
{
"expr": "max(rate(node_disk_written_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "written",
"refId": "B"
},
{
"expr": "max(rate(node_disk_io_time_seconds_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "io time",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
"expr": "sum(rate(kubelet_pod_start_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} pod",
"refId": "A"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} disk used",
"refId": "A"
},
{
"expr": "node:node_filesystem_usage:{cluster=\"$cluster\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} disk free",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Space Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
"expr": "sum(rate(kubelet_pod_worker_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} worker",
"refId": "B"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pod Start Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"collapse": false,
"collapsed": false,
"panels": [
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 21
},
"id": 12,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", device!~\"lo\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Received",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_start_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} pod",
"refId": "A"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_transmit_bytes_total{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\", device!~\"lo\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Transmitted",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} worker",
"refId": "B"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pod Start Duration",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"collapse": false,
"collapsed": false,
"panels": [
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 28
},
"id": 13,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(\n node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "inodes used",
"refId": "A"
},
{
"expr": "max(node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "inodes free",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Inodes Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 13,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(\n (\n (\n node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n - node_filesystem_files_free{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n )\n / node_filesystem_files{cluster=\"$cluster\", job=\"node-exporter\", instance=\"$instance\"}\n ) * 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Inodes Usage",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
"expr": "sum(rate(storage_operation_duration_seconds_count{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_name, volume_plugin)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Storage Operation Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 28
},
"id": 14,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(storage_operation_errors_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_name, volume_plugin)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Storage Operation Error Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 24,
"x": 0,
"y": 35
},
"id": 15,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"hideEmpty": true,
"hideZero": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(storage_operation_duration_seconds_bucket{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_name, volume_plugin, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_name}} {{volume_plugin}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Storage Operation Duration 99th quantile",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 42
},
"id": 16,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_cgroup_manager_duration_seconds_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_type)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Cgroup manager operation rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 42
},
"id": 17,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_cgroup_manager_duration_seconds_bucket{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[$__rate_interval])) by (instance, operation_type, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{operation_type}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Cgroup manager 99th quantile",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "Pod lifecycle event generator",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 49
},
"id": 18,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubelet_pleg_relist_duration_seconds_count{cluster=\"$cluster\", job=\"kubelet\", instance=~\"$instance\"}[$__rate_interval])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "PLEG relist rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 49
},
"id": 19,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_interval_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "PLEG relist interval",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 24,
"x": 0,
"y": 56
},
"id": 20,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "PLEG relist duration",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 24,
"x": 0,
"y": 63
},
"id": 21,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"2..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"3..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"4..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\",code=~\"5..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "RPC Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 24,
"x": 0,
"y": 70
},
"id": 22,
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\",job=\"kubelet\", instance=~\"$instance\"}[$__rate_interval])) by (instance, verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{verb}} {{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Request duration 99th quantile",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 8,
"x": 0,
"y": 77
},
"id": 23,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 8,
"x": 8,
"y": 77
},
"id": 24,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 8,
"x": 16,
"y": 77
},
"id": 25,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{cluster=\"$cluster\",job=\"kubelet\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Goroutines",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"refresh": "10s",
"rows": [
],
"schemaVersion": 14,
"style": "dark",
@@ -3712,11 +2130,11 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -3740,10 +2158,10 @@ data:
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"query": "label_values(up{job=\"kubelet\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@@ -3759,17 +2177,17 @@ data:
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"includeAll": true,
"label": "instance",
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(node_boot_time_seconds{cluster=\"$cluster\", job=\"node-exporter\"}, instance)",
"query": "label_values(up{job=\"kubelet\",cluster=\"$cluster\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@@ -3809,9 +2227,9 @@ data:
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Nodes",
"uid": "fa49a4706d07a042595b664c87fb33ea",
"timezone": "UTC",
"title": "Kubernetes / Kubelet",
"uid": "3138fa155d5915769fbded898ac09fd9",
"version": 0
}
proxy.json: |-
@@ -3835,7 +2253,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@@ -3863,7 +2281,11 @@ data:
},
"id": 2,
"interval": null,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
@@ -3902,7 +2324,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"kube-proxy\"})",
"expr": "sum(up{cluster=\"$cluster\", job=\"kube-proxy\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@@ -3934,18 +2356,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -3969,7 +2394,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubeproxy_sync_proxy_rules_duration_seconds_count{job=\"kube-proxy\", instance=~\"$instance\"}[5m]))",
"expr": "sum(rate(kubeproxy_sync_proxy_rules_duration_seconds_count{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "rate",
@@ -4025,20 +2450,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@@ -4060,7 +2488,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99,rate(kubeproxy_sync_proxy_rules_duration_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\"}[5m]))",
"expr": "histogram_quantile(0.99,rate(kubeproxy_sync_proxy_rules_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -4129,18 +2557,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -4164,7 +2595,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(kubeproxy_network_programming_duration_seconds_count{job=\"kube-proxy\", instance=~\"$instance\"}[5m]))",
"expr": "sum(rate(kubeproxy_network_programming_duration_seconds_count{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "rate",
@@ -4220,20 +2651,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@@ -4255,7 +2689,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(kubeproxy_network_programming_duration_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(kubeproxy_network_programming_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\"}[$__rate_interval])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -4324,18 +2758,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -4359,28 +2796,28 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-proxy\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\",code=~\"2..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-proxy\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\",code=~\"3..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-proxy\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\",code=~\"4..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-proxy\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\",code=~\"5..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
@@ -4436,18 +2873,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -4471,7 +2911,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-proxy\",instance=~\"$instance\",verb=\"POST\"}[5m])) by (verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-proxy\",instance=~\"$instance\",verb=\"POST\"}[$__rate_interval])) by (verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}",
@@ -4540,20 +2980,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@@ -4575,7 +3018,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-proxy\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-proxy\", instance=~\"$instance\", verb=\"GET\"}[$__rate_interval])) by (verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}",
@@ -4644,18 +3087,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -4679,7 +3125,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"kube-proxy\",instance=~\"$instance\"}",
"expr": "process_resident_memory_bytes{cluster=\"$cluster\", job=\"kube-proxy\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -4735,18 +3181,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 11,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -4770,7 +3219,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"kube-proxy\",instance=~\"$instance\"}[5m])",
"expr": "rate(process_cpu_seconds_total{cluster=\"$cluster\", job=\"kube-proxy\",instance=~\"$instance\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -4826,18 +3275,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 12,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@@ -4861,7 +3313,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"kube-proxy\",instance=~\"$instance\"}",
"expr": "go_goroutines{cluster=\"$cluster\", job=\"kube-proxy\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@@ -4927,11 +3379,11 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@@ -4945,6 +3397,32 @@ data:
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kube-proxy\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
@@ -4955,10 +3433,10 @@ data:
"options": [
],
"query": "label_values(kubeproxy_network_programming_duration_seconds_bucket{job=\"kube-proxy\"}, instance)",
"query": "label_values(up{cluster=\"$cluster\", job=\"kube-proxy\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@@ -4998,8 +3476,12 @@ data:
"30d"
]
},
"timezone": "",
"timezone": "UTC",
"title": "Kubernetes / Proxy",
"uid": "632e265de029684c40b21cb76bca4f94",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-nodes
namespace: monitoring

@@ -0,0 +1,6622 @@
apiVersion: v1
data:
k8s-resources-pod.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "requests",
"color": "#F2495C",
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
},
{
"alias": "limits",
"color": "#FF9830",
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$namespace\", pod=\"$pod\", cluster=\"$cluster\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"cpu\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "requests",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"cpu\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 2,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": true,
"max": true,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{job=\"kubernetes-cadvisor\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", cluster=\"$cluster\"}[$__rate_interval])) by (container) /sum(increase(container_cpu_cfs_periods_total{job=\"kubernetes-cadvisor\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", cluster=\"$cluster\"}[$__rate_interval])) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
{
"colorMode": "critical",
"fill": true,
"line": true,
"op": "gt",
"value": 0.80000000000000004,
"yaxis": "left"
}
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Throttling",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Throttling",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container) / sum(cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
},
{
"alias": "limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "requests",
"legendLink": null,
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage (WSS)",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage (WSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Usage (RSS)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Cache)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #G",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Usage (Swap)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #H",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_requests{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", image!=\"\"}) by (container) / sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_requests{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_limits{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container) / sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_limits{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(container_memory_rss{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container != \"\", container != \"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
},
{
"expr": "sum(container_memory_cache{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container != \"\", container != \"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "G",
"step": 10
},
{
"expr": "sum(container_memory_swap{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container != \"\", container != \"POD\"}) by (container)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "H",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Rate of Packets",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_receive_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(container_network_transmit_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Rate of Packets Dropped",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"decimals": -1,
"fill": 10,
"id": 12,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "ceil(sum by(pod) (rate(container_fs_reads_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Reads",
"legendLink": null,
"step": 10
},
{
"expr": "ceil(sum by(pod) (rate(container_fs_writes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\",namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Writes",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "IOPS",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_fs_reads_bytes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Reads",
"legendLink": null,
"step": 10
},
{
"expr": "sum by(pod) (rate(container_fs_writes_bytes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Writes",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ThroughPut",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Storage IO - Distribution(Pod - Read & Writes)",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"decimals": -1,
"fill": 10,
"id": 14,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "ceil(sum by(container) (rate(container_fs_reads_total{job=\"kubernetes-cadvisor\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]) + rate(container_fs_writes_total{job=\"kubernetes-cadvisor\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "IOPS(Reads+Writes)",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 15,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(container) (rate(container_fs_reads_bytes_total{job=\"kubernetes-cadvisor\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]) + rate(container_fs_writes_bytes_total{job=\"kubernetes-cadvisor\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{container}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ThroughPut(Read+Write)",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Storage IO - Distribution(Containers)",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 16,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"sort": {
"col": 4,
"desc": true
},
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "IOPS(Reads)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": -1,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "IOPS(Writes)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": -1,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "IOPS(Reads + Writes)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": -1,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Throughput(Read)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Throughput(Write)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Throughput(Read + Write)",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Container",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum by(container) (rate(container_fs_reads_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum by(container) (rate(container_fs_writes_total{job=\"kubernetes-cadvisor\",device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum by(container) (rate(container_fs_reads_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]) + rate(container_fs_writes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum by(container) (rate(container_fs_reads_bytes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum by(container) (rate(container_fs_writes_bytes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum by(container) (rate(container_fs_reads_bytes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]) + rate(container_fs_writes_bytes_total{job=\"kubernetes-cadvisor\", device=~\"mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+\", container!=\"\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[$__rate_interval]))",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Storage IO",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Storage IO - Distribution",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kube-state-metrics\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_namespace_status_phase{job=\"kube-state-metrics\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Pod",
"uid": "6581e46e4e5c7ba40a07646395ef7b23",
"version": 0
}
k8s-resources-workload.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n/sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=\"$workload\", workload_type=\"$type\"}\n) by (pod)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Current Receive Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Transmit Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "/d/6581e46e4e5c7ba40a07646395ef7b23/k8s-resources-pod?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-pod=$__cell",
"pattern": "pod",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Network Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Pod: Received",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Pod: Transmitted",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Average Container Bandwidth by Pod",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Rate of Packets",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\"$workload\", workload_type=\"$type\"}) by (pod))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Rate of Packets Dropped",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kube-state-metrics\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_namespace_status_phase{job=\"kube-state-metrics\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\"}, workload_type)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "workload",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}, workload)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Workload",
"uid": "a164a7f0339f99e89cea5cb47e9be617",
"version": 0
}
k8s-resources-workloads-namespace.json: |-
{
"annotations": {
"list": [
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hiddenSeries": true,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hiddenSeries": true,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Running Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "CPU Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "CPU Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "/d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workload Type",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "workload_type",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(\n node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=\"$cluster\", namespace=\"$namespace\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Quota",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "quota - requests",
"color": "#F2495C",
"dashes": true,
"fill": 0,
"hiddenSeries": true,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
},
{
"alias": "quota - limits",
"color": "#FF9830",
"dashes": true,
"fill": 0,
"hiddenSeries": true,
"hideTooltip": true,
"legend": true,
"linewidth": 2,
"stack": false
}
],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}} - {{workload_type}}",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"requests.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - requests",
"legendLink": null,
"step": 10
},
{
"expr": "scalar(kube_resourcequota{cluster=\"$cluster\", namespace=\"$namespace\", type=\"hard\",resource=\"limits.memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "quota - limits",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Running Pods",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Memory Usage",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Requests %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Memory Limits",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "bytes"
},
{
"alias": "Memory Limits %",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "percentunit"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "/d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$__cell_2",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workload Type",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "workload_type",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload, workload_type)",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "sum(\n container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", container!=\"\", image!=\"\"}\n * on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n/sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\"}\n* on(namespace,pod)\n group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}\n) by (workload, workload_type)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Quota",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Memory Quota",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Current Receive Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Current Transmit Bandwidth",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [
],
"type": "number",
"unit": "Bps"
},
{
"alias": "Rate of Received Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #C",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #D",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Received Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #E",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Rate of Transmitted Packets Dropped",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #F",
"thresholds": [
],
"type": "number",
"unit": "pps"
},
{
"alias": "Workload",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": true,
"linkTargetBlank": false,
"linkTooltip": "Drill down to pods",
"linkUrl": "/d/a164a7f0339f99e89cea5cb47e9be617/k8s-resources-workload?var-datasource=$datasource&var-cluster=$cluster&var-namespace=$namespace&var-workload=$__cell&var-type=$type",
"pattern": "workload",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "Workload Type",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "workload_type",
"thresholds": [
],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [
],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "C",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "D",
"step": 10
},
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "E",
"step": 10
},
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload_type=\"$type\"}) by (workload))\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "F",
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Current Network Usage",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Current Network Usage",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Receive Bandwidth",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Transmit Bandwidth",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Bandwidth",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Workload: Received",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(avg(irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Average Container Bandwidth by Workload: Transmitted",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Average Container Bandwidth by Workload",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 11,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Rate of Packets",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 12,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_receive_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Received Packets Dropped",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 13,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [
],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(sum(irate(container_network_transmit_packets_dropped_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])\n* on (namespace,pod)\ngroup_left(workload,workload_type) namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\", workload_type=\"$type\"}) by (workload))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{workload}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Rate of Transmitted Packets Dropped",
"tooltip": {
"shared": false,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "pps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Rate of Packets Dropped",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kube-state-metrics\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "",
"value": ""
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{job=\"kube-state-metrics\", cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"auto": false,
"auto_count": 30,
"auto_min": "10s",
"current": {
"text": "deployment",
"value": "deployment"
},
"datasource": "$datasource",
"definition": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\"}, workload_type)",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "type",
"options": [
],
"query": "label_values(namespace_workload_pod:kube_pod_owner:relabel{cluster=\"$cluster\", namespace=\"$namespace\", workload=~\".+\"}, workload_type)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Workloads)",
"uid": "a87fb0d919ec0ea5f6543124e16c42a5",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s-resources-2
namespace: monitoring

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s
namespace: monitoring
data:
apiserver.json: |-
{
@ -25,7 +21,25 @@ data:
"links": [
],
"refresh": "",
"panels": [
{
"content": "The SLO (service level objective) and other metrics displayed on this dashboard are for informational purposes only.",
"datasource": null,
"description": "The SLO (service level objective) and other metrics displayed on this dashboard are for informational purposes only.",
"gridPos": {
"h": 2,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"mode": "markdown",
"span": 12,
"title": "Notice",
"type": "text"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -41,7 +55,9 @@ data:
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"decimals": 3,
"description": "How many percent of requests (both read and write) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
@ -52,8 +68,12 @@ data:
"gridPos": {
},
"id": 2,
"interval": null,
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
@ -82,7 +102,7 @@ data:
"to": "null"
}
],
"span": 2,
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
@ -92,7 +112,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"apiserver\"})",
"expr": "apiserver_request:availability30d{verb=\"all\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -100,7 +120,7 @@ data:
}
],
"thresholds": "",
"title": "Up",
"title": "Availability (30d) > 99.000%",
"tooltip": {
"shared": false
},
@ -113,7 +133,7 @@ data:
"value": "null"
}
],
"valueName": "min"
"valueName": "avg"
},
{
"aliasColors": {
@ -123,19 +143,24 @@ data:
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"decimals": 3,
"description": "How much error budget is left looking at our 99.000% availability guarantees?",
"fill": 10,
"fillGradient": 0,
"gridPos": {
},
"id": 3,
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -154,37 +179,16 @@ data:
],
"spaceLength": 10,
"span": 5,
"span": 8,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"expr": "100 * (apiserver_request:availability30d{verb=\"all\", cluster=\"$cluster\"} - 0.990000)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"legendFormat": "errorbudget",
"refId": "A"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
"refId": "D"
}
],
"thresholds": [
@ -192,7 +196,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "RPC Rate",
"title": "ErrorBudget (30d) > 99.000%",
"tooltip": {
"shared": false,
"sort": 0,
@ -210,7 +214,8 @@ data:
},
"yaxes": [
{
"format": "ops",
"decimals": 3,
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
@ -218,7 +223,221 @@ data:
"show": true
},
{
"format": "ops",
"decimals": 3,
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 3,
"description": "How many percent of read requests (LIST,GET) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "apiserver_request:availability30d{verb=\"read\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Read Availability (30d)",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many read requests (LIST,GET) per second do the apiservers get by code?",
"fill": 10,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/2../i",
"color": "#56A64B"
},
{
"alias": "/3../i",
"color": "#F2CC0C"
},
{
"alias": "/4../i",
"color": "#3274D9"
},
{
"alias": "/5../i",
"color": "#E02F44"
}
],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (code) (code_resource:apiserver_request_total:rate5m{verb=\"read\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ code }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Read SLI - Requests",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
@ -235,21 +454,25 @@ data:
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many percent of read requests (LIST,GET) per second are returned with errors (5xx)?",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": false,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": false
},
"lines": true,
"linewidth": 1,
@ -266,15 +489,15 @@ data:
],
"spaceLength": 10,
"span": 5,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (verb, le))",
"expr": "sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"read\",code=~\"5..\", cluster=\"$cluster\"}) / sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"read\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}}",
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
@ -283,7 +506,505 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Request duration 99th quantile",
"title": "Read SLI - Errors",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many seconds is the 99th percentile for reading (LIST|GET) a given resource?",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{verb=\"read\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Read SLI - Duration",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"decimals": 3,
"description": "How many percent of write requests (POST|PUT|PATCH|DELETE) in 30 days have been answered successfully and fast enough?",
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "apiserver_request:availability30d{verb=\"write\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Write Availability (30d)",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many write requests (POST|PUT|PATCH|DELETE) per second do the apiservers get by code?",
"fill": 10,
"fillGradient": 0,
"gridPos": {
},
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/2../i",
"color": "#56A64B"
},
{
"alias": "/3../i",
"color": "#F2CC0C"
},
{
"alias": "/4../i",
"color": "#3274D9"
},
{
"alias": "/5../i",
"color": "#E02F44"
}
],
"spaceLength": 10,
"span": 3,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (code) (code_resource:apiserver_request_total:rate5m{verb=\"write\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ code }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Requests",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many percent of write requests (POST|PUT|PATCH|DELETE) per second are returned with errors (5xx)?",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 11,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"write\",code=~\"5..\", cluster=\"$cluster\"}) / sum by (resource) (code_resource:apiserver_request_total:rate5m{verb=\"write\", cluster=\"$cluster\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Errors",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"description": "How many seconds is the 99th percentile for writing (POST|PUT|PATCH|DELETE) a given resource?",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 12,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 3,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{verb=\"write\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ resource }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Write SLI - Duration",
"tooltip": {
"shared": false,
"sort": 0,
@ -340,18 +1061,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 5,
"id": 13,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": false,
"sideWidth": null,
"total": false,
"values": false
},
@ -375,7 +1099,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(workqueue_adds_total{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name)",
"expr": "sum(rate(workqueue_adds_total{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[$__rate_interval])) by (instance, name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
@ -431,18 +1155,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"id": 14,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": false,
"sideWidth": null,
"total": false,
"values": false
},
@ -466,7 +1193,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(workqueue_depth{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name)",
"expr": "sum(rate(workqueue_depth{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[$__rate_interval])) by (instance, name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
@ -522,20 +1249,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 7,
"id": 15,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -557,7 +1287,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\"}[5m])) by (instance, name, le))",
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"apiserver\", instance=~\"$instance\", cluster=\"$cluster\"}[$__rate_interval])) by (instance, name, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
@ -626,18 +1356,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"id": 16,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -661,307 +1394,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "etcd_helper_cache_entry_total{job=\"apiserver\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Entry Total",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(etcd_helper_cache_hit_total{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (intance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} hit",
"refId": "A"
},
{
"expr": "sum(rate(etcd_helper_cache_miss_total{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Hit/Miss Rate",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "ops",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_get_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} get",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99,sum(rate(etcd_request_cache_add_duration_seconds_bucket{job=\"apiserver\",instance=~\"$instance\"}[5m])) by (instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} miss",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "ETCD Cache Duration 99th Quantile",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"apiserver\",instance=~\"$instance\"}",
"expr": "process_resident_memory_bytes{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1017,18 +1450,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 12,
"id": 17,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -1052,7 +1488,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"apiserver\",instance=~\"$instance\"}[5m])",
"expr": "rate(process_cpu_seconds_total{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1108,18 +1544,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 13,
"id": 18,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -1143,7 +1582,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"apiserver\",instance=~\"$instance\"}",
"expr": "go_goroutines{job=\"apiserver\",instance=~\"$instance\", cluster=\"$cluster\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -1209,11 +1648,11 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@ -1227,6 +1666,32 @@ data:
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"apiserver\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
@ -1237,10 +1702,10 @@ data:
"options": [
],
"query": "label_values(apiserver_request_total{job=\"apiserver\"}, instance)",
"query": "label_values(up{job=\"apiserver\", cluster=\"$cluster\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -1280,7 +1745,7 @@ data:
"30d"
]
},
"timezone": "",
"timezone": "UTC",
"title": "Kubernetes / API server",
"uid": "09ec8aa1e996d6ffcd6817bbaff4db1b",
"version": 0
@ -1306,7 +1771,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -1334,7 +1799,11 @@ data:
},
"id": 2,
"interval": null,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
@ -1373,7 +1842,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"kube-controller-manager\"})",
"expr": "sum(up{cluster=\"$cluster\", job=\"kube-controller-manager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -1405,20 +1874,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1440,10 +1912,10 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(workqueue_adds_total{job=\"kube-controller-manager\", instance=~\"$instance\"}[5m])) by (instance, name)",
"expr": "sum(rate(workqueue_adds_total{cluster=\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
"legendFormat": "{{cluster}} {{instance}} {{name}}",
"refId": "A"
}
],
@ -1509,20 +1981,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1544,10 +2019,10 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(workqueue_depth{job=\"kube-controller-manager\", instance=~\"$instance\"}[5m])) by (instance, name)",
"expr": "sum(rate(workqueue_depth{cluster=\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, name)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
"legendFormat": "{{cluster}} {{instance}} {{name}}",
"refId": "A"
}
],
@ -1613,20 +2088,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1648,10 +2126,10 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\"}[5m])) by (instance, name, le))",
"expr": "histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, name, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} {{name}}",
"legendFormat": "{{cluster}} {{instance}} {{name}}",
"refId": "A"
}
],
@ -1717,18 +2195,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -1752,28 +2233,28 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"2..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"3..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"4..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"5..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
@ -1829,18 +2310,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -1864,7 +2348,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[$__rate_interval])) by (verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}",
@ -1933,20 +2417,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -1968,7 +2455,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[$__rate_interval])) by (verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}",
@ -2037,18 +2524,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -2072,7 +2562,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"kube-controller-manager\",instance=~\"$instance\"}",
"expr": "process_resident_memory_bytes{cluster=\"$cluster\", job=\"kube-controller-manager\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -2128,18 +2618,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -2163,7 +2656,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"kube-controller-manager\",instance=~\"$instance\"}[5m])",
"expr": "rate(process_cpu_seconds_total{cluster=\"$cluster\", job=\"kube-controller-manager\",instance=~\"$instance\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -2219,18 +2712,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 11,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -2254,7 +2750,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"kube-controller-manager\",instance=~\"$instance\"}",
"expr": "go_goroutines{cluster=\"$cluster\", job=\"kube-controller-manager\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -2320,11 +2816,11 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@ -2338,6 +2834,32 @@ data:
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(up{job=\"kube-controller-manager\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
@ -2348,10 +2870,10 @@ data:
"options": [
],
"query": "label_values(process_cpu_seconds_total{job=\"kube-controller-manager\"}, instance)",
"query": "label_values(up{cluster=\"$cluster\", job=\"kube-controller-manager\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2391,7 +2913,7 @@ data:
"30d"
]
},
"timezone": "",
"timezone": "UTC",
"title": "Kubernetes / Controller Manager",
"uid": "72e0e05bef5099e5f049b05fdc429ed4",
"version": 0
@ -2417,7 +2939,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -2432,18 +2954,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 2,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
@ -2467,14 +2992,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "(\n sum without(instance, node) (kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n -\n sum without(instance, node) (kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n)\n",
"expr": "(\n sum without(instance, node) (topk(1, (kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})))\n -\n sum without(instance, node) (topk(1, (kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})))\n)\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Used Space",
"refId": "A"
},
{
"expr": "sum without(instance, node) (kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n",
"expr": "sum without(instance, node) (topk(1, (kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Free Space",
@ -2543,7 +3068,11 @@ data:
},
"id": 3,
"interval": null,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
@ -2582,7 +3111,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "(\n kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"}\n -\n kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"}\n)\n/\nkubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"}\n* 100\n",
"expr": "max without(instance,node) (\n(\n topk(1, kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n -\n topk(1, kubelet_volume_stats_available_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n)\n/\ntopk(1, kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n* 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -2627,18 +3156,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": true,
"min": true,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": true
},
@ -2662,14 +3194,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum without(instance, node) (kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n",
"expr": "sum without(instance, node) (topk(1, (kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})))\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Used inodes",
"refId": "A"
},
{
"expr": "(\n sum without(instance, node) (kubelet_volume_stats_inodes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n -\n sum without(instance, node) (kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n)\n",
"expr": "(\n sum without(instance, node) (topk(1, (kubelet_volume_stats_inodes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})))\n -\n sum without(instance, node) (topk(1, (kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})))\n)\n",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": " Free inodes",
@ -2738,7 +3270,11 @@ data:
},
"id": 5,
"interval": null,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
@ -2777,7 +3313,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"}\n/\nkubelet_volume_stats_inodes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"}\n* 100\n",
"expr": "max without(instance,node) (\ntopk(1, kubelet_volume_stats_inodes_used{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n/\ntopk(1, kubelet_volume_stats_inodes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\", persistentvolumeclaim=\"$volume\"})\n* 100)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -2819,11 +3355,11 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@ -2847,10 +3383,10 @@ data:
"options": [
],
"query": "label_values(kubelet_volume_stats_capacity_bytes, cluster)",
"query": "label_values(kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2876,7 +3412,7 @@ data:
"query": "label_values(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2902,7 +3438,7 @@ data:
"query": "label_values(kubelet_volume_stats_capacity_bytes{cluster=\"$cluster\", job=\"kubelet\", namespace=\"$namespace\"}, persistentvolumeclaim)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -2942,669 +3478,11 @@ data:
"30d"
]
},
"timezone": "",
"timezone": "UTC",
"title": "Kubernetes / Persistent Volumes",
"uid": "919b92a8e8041bd567af9edab12c840c",
"version": 0
}
pods.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "$datasource",
"enable": true,
"expr": "time() == BOOL timestamp(rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[2m]) > 0)",
"hide": false,
"iconColor": "rgba(215, 44, 44, 1)",
"name": "Restarts",
"showIn": 0,
"tags": [
"restart"
],
"type": "rows"
}
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(container) (container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"memory\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
},
{
"expr": "sum by(container) (container_memory_cache{job=\"kubernetes-cadvisor\", namespace=\"$namespace\", pod=~\"$pod\", container=~\"$container\", container!=\"POD\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Cache: {{ container }}",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by (container) (irate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", image!=\"\", pod=\"$pod\", container=~\"$container\", container!=\"POD\"}[4m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Current: {{ container }}",
"refId": "A"
},
{
"expr": "sum by(container) (kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Requested: {{ container }}",
"refId": "B"
},
{
"expr": "sum by(container) (kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", resource=\"cpu\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Limit: {{ container }}",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sort_desc(sum by (pod) (irate(container_network_receive_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[4m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "RX: {{ pod }}",
"refId": "A"
},
{
"expr": "sort_desc(sum by (pod) (irate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}[4m])))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "TX: {{ pod }}",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network I/O",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 5,
"legend": {
"alignAsTable": true,
"avg": true,
"current": true,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by (container) (kube_pod_container_status_restarts_total{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Restarts: {{ container }}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Total Restarts Per Container",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"multi": false,
"name": "namespace",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\"}, namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Pod",
"multi": false,
"name": "pod",
"options": [
],
"query": "label_values(kube_pod_info{cluster=\"$cluster\", namespace=~\"$namespace\"}, pod)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "Container",
"multi": false,
"name": "container",
"options": [
],
"query": "label_values(kube_pod_container_info{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"}, container)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Pods",
"uid": "ab4f13a9892a76a4d21ce8c2445bf4ea",
"version": 0
}
scheduler.json: |-
{
"__inputs": [
@ -3626,7 +3504,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -3654,7 +3532,11 @@ data:
},
"id": 2,
"interval": null,
"interval": "1m",
"legend": {
"alignAsTable": true,
"rightSide": true
},
"links": [
],
@ -3693,7 +3575,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(up{job=\"kube-scheduler\"})",
"expr": "sum(up{cluster=\"$cluster\", job=\"kube-scheduler\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -3725,20 +3607,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 3,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -3760,31 +3645,31 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(scheduler_e2e_scheduling_duration_seconds_count{job=\"kube-scheduler\", instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(scheduler_e2e_scheduling_duration_seconds_count{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} e2e",
"legendFormat": "{{cluster}} {{instance}} e2e",
"refId": "A"
},
{
"expr": "sum(rate(scheduler_binding_duration_seconds_count{job=\"kube-scheduler\", instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(scheduler_binding_duration_seconds_count{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} binding",
"legendFormat": "{{cluster}} {{instance}} binding",
"refId": "B"
},
{
"expr": "sum(rate(scheduler_scheduling_algorithm_duration_seconds_count{job=\"kube-scheduler\", instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(scheduler_scheduling_algorithm_duration_seconds_count{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} scheduling algorithm",
"legendFormat": "{{cluster}} {{instance}} scheduling algorithm",
"refId": "C"
},
{
"expr": "sum(rate(scheduler_volume_scheduling_duration_seconds_count{job=\"kube-scheduler\", instance=~\"$instance\"}[5m])) by (instance)",
"expr": "sum(rate(scheduler_volume_scheduling_duration_seconds_count{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} volume",
"legendFormat": "{{cluster}} {{instance}} volume",
"refId": "D"
}
],
@ -3837,20 +3722,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -3872,31 +3760,31 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(scheduler_e2e_scheduling_duration_seconds_bucket{job=\"kube-scheduler\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(scheduler_e2e_scheduling_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-scheduler\",instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} e2e",
"legendFormat": "{{cluster}} {{instance}} e2e",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99, sum(rate(scheduler_binding_duration_seconds_bucket{job=\"kube-scheduler\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(scheduler_binding_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-scheduler\",instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} binding",
"legendFormat": "{{cluster}} {{instance}} binding",
"refId": "B"
},
{
"expr": "histogram_quantile(0.99, sum(rate(scheduler_scheduling_algorithm_duration_seconds_bucket{job=\"kube-scheduler\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(scheduler_scheduling_algorithm_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-scheduler\",instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} scheduling algorithm",
"legendFormat": "{{cluster}} {{instance}} scheduling algorithm",
"refId": "C"
},
{
"expr": "histogram_quantile(0.99, sum(rate(scheduler_volume_scheduling_duration_seconds_bucket{job=\"kube-scheduler\",instance=~\"$instance\"}[5m])) by (instance, le))",
"expr": "histogram_quantile(0.99, sum(rate(scheduler_volume_scheduling_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-scheduler\",instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}} volume",
"legendFormat": "{{cluster}} {{instance}} volume",
"refId": "D"
}
],
@ -3962,18 +3850,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 5,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -3997,28 +3888,28 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-scheduler\", instance=~\"$instance\",code=~\"2..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\",code=~\"2..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-scheduler\", instance=~\"$instance\",code=~\"3..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\",code=~\"3..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "3xx",
"refId": "B"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-scheduler\", instance=~\"$instance\",code=~\"4..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\",code=~\"4..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "4xx",
"refId": "C"
},
{
"expr": "sum(rate(rest_client_requests_total{job=\"kube-scheduler\", instance=~\"$instance\",code=~\"5..\"}[5m]))",
"expr": "sum(rate(rest_client_requests_total{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\",code=~\"5..\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5xx",
@ -4074,18 +3965,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -4109,7 +4003,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"POST\"}[5m])) by (verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\", verb=\"POST\"}[$__rate_interval])) by (verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}",
@ -4178,20 +4072,23 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 7,
"interval": "1m",
"legend": {
"alignAsTable": "true",
"alignAsTable": true,
"avg": false,
"current": "true",
"current": true,
"max": false,
"min": false,
"rightSide": "true",
"show": "true",
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": "true"
"values": true
},
"lines": true,
"linewidth": 1,
@ -4213,7 +4110,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_latency_seconds_bucket{job=\"kube-scheduler\", instance=~\"$instance\", verb=\"GET\"}[5m])) by (verb, url, le))",
"expr": "histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\", verb=\"GET\"}[$__rate_interval])) by (verb, url, le))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{verb}} {{url}}",
@ -4282,18 +4179,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -4317,7 +4217,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "process_resident_memory_bytes{job=\"kube-scheduler\", instance=~\"$instance\"}",
"expr": "process_resident_memory_bytes{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -4373,18 +4273,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 9,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -4408,7 +4311,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=\"kube-scheduler\", instance=~\"$instance\"}[5m])",
"expr": "rate(process_cpu_seconds_total{cluster=\"$cluster\", job=\"kube-scheduler\", instance=~\"$instance\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -4464,18 +4367,21 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 10,
"interval": "1m",
"legend": {
"alignAsTable": false,
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"rightSide": true,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -4499,7 +4405,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "go_goroutines{job=\"kube-scheduler\",instance=~\"$instance\"}",
"expr": "go_goroutines{cluster=\"$cluster\", job=\"kube-scheduler\",instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{instance}}",
@ -4565,863 +4471,11 @@ data:
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(process_cpu_seconds_total{job=\"kube-scheduler\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes / Scheduler",
"uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
"version": 0
}
statefulset.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 2,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "cores",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "CPU",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 3,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "GB",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}) / 1024^3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Memory",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 4,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "Bps",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 4,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{job=\"kubernetes-cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$statefulset.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\",pod=~\"$statefulset.*\"}[3m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Network",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"height": "100px",
"panels": [
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Desired Replicas",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 6,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Replicas of current version",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 7,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_status_observed_generation{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", statefulset=\"$statefulset\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Observed Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "$datasource",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 8,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "max(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "Metadata Generation",
"tooltip": {
"shared": false
},
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "0",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(kube_statefulset_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas specified",
"refId": "A"
},
{
"expr": "max(kube_statefulset_status_replicas{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas created",
"refId": "B"
},
{
"expr": "min(kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "ready",
"refId": "C"
},
{
"expr": "min(kube_statefulset_status_replicas_current{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "replicas of current version",
"refId": "D"
},
{
"expr": "min(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\", statefulset=\"$statefulset\", cluster=\"$cluster\", namespace=\"$namespace\"}) without (instance, pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "updated",
"refId": "E"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Replicas",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"kubernetes-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@ -5445,10 +4499,10 @@ data:
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation, cluster)",
"query": "label_values(up{job=\"kube-scheduler\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5464,43 +4518,17 @@ data:
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Namespace",
"includeAll": true,
"label": null,
"multi": false,
"name": "namespace",
"name": "instance",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
"query": "label_values(up{job=\"kube-scheduler\", cluster=\"$cluster\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "Name",
"multi": false,
"name": "statefulset",
"options": [
],
"query": "label_values(kube_statefulset_metadata_generation{job=\"kube-state-metrics\", namespace=\"$namespace\"}, statefulset)",
"refresh": 2,
"regex": "",
"sort": 0,
"sort": 1,
"tagValuesQuery": "",
"tags": [
@ -5540,8 +4568,12 @@ data:
"30d"
]
},
"timezone": "",
"title": "Kubernetes / StatefulSets",
"uid": "a31c1f46e6f727cb37c0d731a7245005",
"timezone": "UTC",
"title": "Kubernetes / Scheduler",
"uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-k8s
namespace: monitoring

@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-nginx-ingress
namespace: monitoring
data:
nginx.json: |-
{
@ -26,7 +22,7 @@ data:
"links": [
],
"refresh": "",
"refresh": "10s",
"rows": [
{
"collapse": false,
@ -94,7 +90,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "round(sum(irate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\"}[2m])), 0.01)",
"expr": "round(sum(irate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\", controller_namespace=~\"$namespace\"}[2m])), 0.01)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -176,7 +172,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(avg_over_time(nginx_ingress_controller_nginx_process_connections{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"expr": "sum(avg_over_time(nginx_ingress_controller_nginx_process_connections{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",state=\"active\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -258,7 +254,7 @@ data:
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\",status!~\"[4-5].*\"}[2m])) / sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\"}[2m]))",
"expr": "sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",status!~\"[4-5].*\"}[2m])) / sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
@ -300,6 +296,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -312,6 +309,7 @@ data:
"min": false,
"rightSide": "true",
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -335,7 +333,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "round(sum(irate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress), 0.01)",
"expr": "round(sum(irate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress), 0.01)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}}",
@ -391,6 +389,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -403,6 +402,7 @@ data:
"min": false,
"rightSide": "true",
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -426,7 +426,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\",ingress=~\"$ingress\",status!~\"[4-5].*\"}[2m])) by (ingress) / sum(rate(nginx_ingress_controller_requests{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress)",
"expr": "sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\",status!~\"[4-5].*\"}[2m])) by (ingress) / sum(rate(nginx_ingress_controller_requests{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (ingress)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}}",
@ -495,6 +495,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -507,6 +508,7 @@ data:
"min": false,
"rightSide": "true",
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -530,21 +532,21 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"expr": "histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{cluster=~\"$cluster\", ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}} 99%",
"refId": "A"
},
{
"expr": "histogram_quantile(0.90, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"expr": "histogram_quantile(0.90, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{cluster=~\"$cluster\", ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}} 90%",
"refId": "B"
},
{
"expr": "histogram_quantile(0.50, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"expr": "histogram_quantile(0.50, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{cluster=~\"$cluster\", ingress!=\"\",controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\",ingress=~\"$ingress\"}[2m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ingress}} 50%",
@ -613,6 +615,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -625,6 +628,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -648,14 +652,14 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum (irate (nginx_ingress_controller_request_size_sum{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"expr": "sum (irate (nginx_ingress_controller_request_size_sum{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "received",
"refId": "A"
},
{
"expr": "sum (irate (nginx_ingress_controller_response_size_sum{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"expr": "sum (irate (nginx_ingress_controller_response_size_sum{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "sent",
@ -711,6 +715,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -723,6 +728,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -746,7 +752,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "avg(nginx_ingress_controller_nginx_process_resident_memory_bytes{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}) by (controller_pod)",
"expr": "avg(nginx_ingress_controller_nginx_process_resident_memory_bytes{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}) by (controller_pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{controller_pod}}",
@ -802,6 +808,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
@ -814,6 +821,7 @@ data:
"min": false,
"rightSide": false,
"show": "true",
"sideWidth": null,
"total": false,
"values": "true"
},
@ -837,7 +845,7 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_nginx_process_cpu_seconds_total{controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m])) by (controller_pod)",
"expr": "sum(rate(nginx_ingress_controller_nginx_process_cpu_seconds_total{cluster=~\"$cluster\", controller_pod=~\"$controller\",controller_class=~\"$controller_class\",controller_namespace=~\"$namespace\"}[2m])) by (controller_pod)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{controller_pod}}",
@ -921,6 +929,32 @@ data:
"allValue": ".*",
"current": {
},
"datasource": "$datasource",
"hide": 2,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(kube_pod_info, cluster)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".*",
"current": {
},
"datasource": "$datasource",
"hide": 0,
@ -931,7 +965,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_config_hash, controller_namespace)",
"query": "label_values(nginx_ingress_controller_config_hash{cluster=~\"$cluster\"}, controller_namespace)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -957,7 +991,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_config_hash{namespace=~\"$namespace\"}, controller_class)",
"query": "label_values(nginx_ingress_controller_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\"}, controller_class)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -983,7 +1017,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_config_hash{namespace=~\"$namespace\",controller_class=~\"$controller_class\"}, controller_pod)",
"query": "label_values(nginx_ingress_controller_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\", controller_class=~\"$controller_class\"}, controller_pod)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -1009,7 +1043,7 @@ data:
"options": [
],
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\",controller=~\"$controller\"}, ingress)",
"query": "label_values(nginx_ingress_controller_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", controller_class=~\"$controller_class\", controller=~\"$controller\"}, ingress)",
"refresh": 2,
"regex": "",
"sort": 0,
@ -1052,7 +1086,12 @@ data:
"30d"
]
},
"timezone": "browser",
"timezone": "",
"title": "Nginx Ingress Controller",
"uid": "f4af03eca476c08ecf2b5cf15fd60168",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-nginx-ingress
namespace: monitoring

@ -0,0 +1,976 @@
apiVersion: v1
data:
nodes.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "30s",
"rows": [
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(\n (1 - sum without (mode) (rate(node_cpu_seconds_total{job=\"node-exporter\", mode=~\"idle|iowait|steal\", instance=\"$instance\"}[$__rate_interval])))\n/ ignoring(cpu) group_left\n count without (cpu, mode) (node_cpu_seconds_total{job=\"node-exporter\", mode=\"idle\", instance=\"$instance\"})\n)\n",
"format": "time_series",
"intervalFactor": 5,
"legendFormat": "{{cpu}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "CPU Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
},
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": 1,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"fillGradient": 0,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "node_load1{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "1m load average",
"refId": "A"
},
{
"expr": "node_load5{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "5m load average",
"refId": "B"
},
{
"expr": "node_load15{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "15m load average",
"refId": "C"
},
{
"expr": "count(node_cpu_seconds_total{job=\"node-exporter\", instance=\"$instance\", mode=\"idle\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "logical cores",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Load Average",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 9,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "(\n node_memory_MemTotal_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_Buffers_bytes{job=\"node-exporter\", instance=\"$instance\"}\n-\n node_memory_Cached_bytes{job=\"node-exporter\", instance=\"$instance\"}\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory used",
"refId": "A"
},
{
"expr": "node_memory_Buffers_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory buffers",
"refId": "B"
},
{
"expr": "node_memory_Cached_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory cached",
"refId": "C"
},
{
"expr": "node_memory_MemFree_bytes{job=\"node-exporter\", instance=\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "memory free",
"refId": "D"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Memory Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(50, 172, 45, 0.97)",
"rgba(237, 129, 40, 0.89)",
"rgba(245, 54, 54, 0.9)"
],
"datasource": "$datasource",
"format": "percent",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": true,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
},
"id": 5,
"interval": null,
"links": [
],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "100 -\n(\n avg(node_memory_MemAvailable_bytes{job=\"node-exporter\", instance=\"$instance\"})\n/\n avg(node_memory_MemTotal_bytes{job=\"node-exporter\", instance=\"$instance\"})\n* 100\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "80, 90",
"title": "Memory Usage",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "/ read| written/",
"yaxis": 1
},
{
"alias": "/ io time/",
"yaxis": 2
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_disk_read_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} read",
"refId": "A"
},
{
"expr": "rate(node_disk_written_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} written",
"refId": "B"
},
{
"expr": "rate(node_disk_io_time_seconds_total{job=\"node-exporter\", instance=\"$instance\", device!~\"dm.*\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}} io time",
"refId": "C"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk I/O",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
{
"alias": "used",
"color": "#E0B400"
},
{
"alias": "available",
"color": "#73BF69"
}
],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n max by (device) (\n node_filesystem_size_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n -\n node_filesystem_avail_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n )\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "used",
"refId": "A"
},
{
"expr": "sum(\n max by (device) (\n node_filesystem_avail_bytes{job=\"node-exporter\", instance=\"$instance\", fstype!~\"tmpfs|nsfs|vfat\"}\n )\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "available",
"refId": "B"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Disk Space Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_receive_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!=\"lo\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Received",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"fillGradient": 0,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(node_network_transmit_bytes_total{job=\"node-exporter\", instance=\"$instance\", device!=\"lo\"}[$__rate_interval])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{device}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Network Transmitted",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": false,
"title": "Dashboard Row",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"node-exporter-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [
],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(node_exporter_build_info{job=\"node-exporter\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Node Exporter / Nodes",
"uid": "fa49a4706d07a042595b664c87fb33ea",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-node-exporter
namespace: monitoring

@ -1,11 +1,13 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-prom
namespace: monitoring
data:
prometheus-remote-write.json: |-
{
"__inputs": [
],
"__requires": [
],
"annotations": {
"list": [
@ -15,14 +17,15 @@ data:
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [
],
"refresh": "10s",
"refresh": "60s",
"rows": [
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -33,13 +36,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 2,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -48,11 +58,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -62,12 +73,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} - ignoring(queue) group_right(instance) prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}",
"expr": "(\n prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} \n- \n ignoring(remote_name, url) group_right(instance) (prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"} != 0)\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -93,11 +103,11 @@ data:
},
"yaxes": [
{
"format": "s",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -106,7 +116,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -119,13 +129,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"fillGradient": 0,
"gridPos": {
},
"id": 3,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -134,11 +151,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -148,12 +166,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) - ignoring (queue) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "clamp_min(\n rate(prometheus_remote_storage_highest_timestamp_in_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) \n- \n ignoring (remote_name, url) group_right(instance) rate(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n, 0)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -183,7 +200,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -192,7 +209,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
}
@ -202,11 +219,12 @@ data:
"repeatRowId": null,
"showTitle": true,
"title": "Timestamps",
"titleSize": "h6"
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -217,13 +235,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 3,
"fillGradient": 0,
"gridPos": {
},
"id": 4,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -232,11 +257,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -246,12 +272,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])- ignoring(queue) group_right(instance) rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) - rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "rate(\n prometheus_remote_storage_samples_in_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])\n- \n ignoring(remote_name, url) group_right(instance) (rate(prometheus_remote_storage_succeeded_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) or rate(prometheus_remote_storage_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]))\n- \n (rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) or rate(prometheus_remote_storage_samples_dropped_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]))\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -281,7 +306,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -290,7 +315,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
}
@ -300,11 +325,12 @@ data:
"repeatRowId": null,
"showTitle": true,
"title": "Samples",
"titleSize": "h6"
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -315,13 +341,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"fillGradient": 0,
"gridPos": {
},
"id": 5,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -330,16 +363,18 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"minSpan": 6,
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
@ -347,9 +382,8 @@ data:
"expr": "prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -357,7 +391,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Num. Shards",
"title": "Current Shards",
"tooltip": {
"shared": true,
"sort": 0,
@ -379,7 +413,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -388,7 +422,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -401,13 +435,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"fillGradient": 0,
"gridPos": {
},
"id": 6,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -416,26 +457,26 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shard_capacity{cluster=~\"$cluster\", instance=~\"$instance\"}",
"expr": "prometheus_remote_storage_shards_max{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -443,7 +484,7 @@ data:
],
"timeFrom": null,
"timeShift": null,
"title": "Capacity",
"title": "Max Shards",
"tooltip": {
"shared": true,
"sort": 0,
@ -465,7 +506,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -474,7 +515,193 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 7,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_min{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Min Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shards_desired{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Desired Shards",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
@ -484,11 +711,12 @@ data:
"repeatRowId": null,
"showTitle": true,
"title": "Shards",
"titleSize": "h6"
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"height": "250px",
"collapsed": false,
"panels": [
{
"aliasColors": {
@ -499,13 +727,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 6,
"fillGradient": 0,
"gridPos": {
},
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -514,11 +749,410 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_shard_capacity{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Shard Capacity",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 10,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_remote_storage_pending_samples{cluster=~\"$cluster\", instance=~\"$instance\"} or prometheus_remote_storage_samples_pending{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Pending Samples",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Shard Details",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_tsdb_wal_segment_current{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "TSDB Current Segment",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "prometheus_wal_watcher_current_segment{cluster=~\"$cluster\", instance=~\"$instance\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}} {{consumer}}",
"refId": "A"
}
],
"thresholds": [
],
"timeFrom": null,
"timeShift": null,
"title": "Remote Write Current Segment",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [
]
},
"yaxes": [
{
"format": "none",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Segments",
"titleSize": "h6",
"type": "row"
},
{
"collapse": false,
"collapsed": false,
"panels": [
{
"aliasColors": {
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"fillGradient": 0,
"gridPos": {
},
"id": 13,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [
],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -528,12 +1162,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "rate(prometheus_remote_storage_dropped_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) or rate(prometheus_remote_storage_samples_dropped_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -563,7 +1196,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -572,7 +1205,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -585,13 +1218,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 7,
"fillGradient": 0,
"gridPos": {
},
"id": 14,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -600,11 +1240,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -614,12 +1255,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_failed_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "rate(prometheus_remote_storage_failed_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) or rate(prometheus_remote_storage_samples_failed_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -649,7 +1289,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -658,7 +1298,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -671,13 +1311,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"fillGradient": 0,
"gridPos": {
},
"id": 15,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -686,11 +1333,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -700,12 +1348,11 @@ data:
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_remote_storage_retried_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"expr": "rate(prometheus_remote_storage_retried_samples_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]) or rate(prometheus_remote_storage_samples_retried_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -735,7 +1382,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -744,7 +1391,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
},
@ -757,13 +1404,20 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 9,
"fillGradient": 0,
"gridPos": {
},
"id": 16,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"sideWidth": null,
"total": false,
"values": false
},
@ -772,11 +1426,12 @@ data:
"links": [
],
"nullPointMode": "null as zero",
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"repeat": null,
"seriesOverrides": [
],
@ -789,9 +1444,8 @@ data:
"expr": "rate(prometheus_remote_storage_enqueue_retries_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{cluster}}:{{instance}}-{{queue}}",
"legendLink": null,
"step": 10
"legendFormat": "{{cluster}}:{{instance}} {{remote_name}}:{{url}}",
"refId": "A"
}
],
"thresholds": [
@ -821,7 +1475,7 @@ data:
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"min": null,
"show": true
},
{
@ -830,7 +1484,7 @@ data:
"logBase": 1,
"max": null,
"min": null,
"show": false
"show": true
}
]
}
@ -839,22 +1493,19 @@ data:
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Misc Rates.",
"titleSize": "h6"
"title": "Misc. Rates",
"titleSize": "h6",
"type": "row"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"prometheus-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"label": null,
"name": "datasource",
@ -869,23 +1520,30 @@ data:
{
"allValue": null,
"current": {
"selected": true,
"text": "All",
"value": "$__all"
"text": {
"selected": true,
"text": "All",
"value": "$__all"
},
"value": {
"selected": true,
"text": "All",
"value": "$__all"
}
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "instance",
"multi": true,
"name": "instance",
"label": null,
"multi": false,
"name": "cluster",
"options": [
],
"query": "label_values(prometheus_build_info, instance)",
"refresh": 1,
"query": "label_values(kube_pod_container_info{image=~\".*prometheus.*\"}, cluster)",
"refresh": 2,
"regex": "",
"sort": 2,
"sort": 0,
"tagValuesQuery": "",
"tags": [
@ -897,23 +1555,56 @@ data:
{
"allValue": null,
"current": {
"selected": true,
"text": "All",
"value": "$__all"
"text": {
"selected": true,
"text": "All",
"value": "$__all"
},
"value": {
"selected": true,
"text": "All",
"value": "$__all"
}
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"label": null,
"multi": false,
"name": "instance",
"options": [
],
"query": "label_values(kube_pod_container_info{image=~\".*prometheus.*\"}, cluster)",
"refresh": 1,
"query": "label_values(prometheus_build_info{cluster=~\"$cluster\"}, instance)",
"refresh": 2,
"regex": "",
"sort": 2,
"sort": 0,
"tagValuesQuery": "",
"tags": [
],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "url",
"options": [
],
"query": "label_values(prometheus_remote_storage_shards{cluster=~\"$cluster\", instance=~\"$instance\"}, url)",
"refresh": 2,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [
@ -925,7 +1616,7 @@ data:
]
},
"time": {
"from": "now-1h",
"from": "now-6h",
"to": "now"
},
"timepicker": {
@ -953,9 +1644,8 @@ data:
"30d"
]
},
"timezone": "utc",
"title": "Prometheus Remote Write",
"uid": "",
"timezone": "browser",
"title": "Prometheus / Remote Write",
"version": 0
}
prometheus.json: |-
@ -972,7 +1662,7 @@ data:
"links": [
],
"refresh": "10s",
"refresh": "60s",
"rows": [
{
"collapse": false,
@ -1030,6 +1720,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
@ -1048,6 +1739,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
@ -1066,6 +1758,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "instance",
@ -1084,6 +1777,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "job",
@ -1102,6 +1796,7 @@ data:
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "version",
@ -1155,7 +1850,7 @@ data:
"title": "Prometheus Stats",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
@ -1254,7 +1949,7 @@ data:
"title": "Target Sync",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1340,7 +2035,7 @@ data:
"title": "Targets",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1438,7 +2133,7 @@ data:
"title": "Average Scrape Interval Duration",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1507,6 +2202,14 @@ data:
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_exceeded_body_size_limit_total[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "exceeded body size limit: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_exceeded_sample_limit_total[1m]))",
"format": "time_series",
@ -1548,7 +2251,7 @@ data:
"title": "Scrape failures",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1634,7 +2337,7 @@ data:
"title": "Appended Samples",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1732,7 +2435,7 @@ data:
"title": "Head Series",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1818,7 +2521,7 @@ data:
"title": "Head Chunks",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -1916,7 +2619,7 @@ data:
"title": "Query Rate",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -2002,7 +2705,7 @@ data:
"title": "Stage Duration",
"tooltip": {
"shared": true,
"sort": 0,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
@ -2046,17 +2749,17 @@ data:
"schemaVersion": 14,
"style": "dark",
"tags": [
"prometheus-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "Prometheus",
"value": "Prometheus"
"text": "default",
"value": "default"
},
"hide": 0,
"label": null,
"label": "Data Source",
"name": "datasource",
"options": [
@ -2067,7 +2770,7 @@ data:
"type": "datasource"
},
{
"allValue": null,
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
@ -2082,7 +2785,7 @@ data:
"options": [
],
"query": "label_values(prometheus_build_info, job)",
"query": "label_values(prometheus_build_info{job=\"prometheus\"}, job)",
"refresh": 1,
"regex": "",
"sort": 2,
@ -2095,7 +2798,7 @@ data:
"useTags": false
},
{
"allValue": null,
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
@ -2110,7 +2813,7 @@ data:
"options": [
],
"query": "label_values(prometheus_build_info, instance)",
"query": "label_values(prometheus_build_info{job=~\"$job\"}, instance)",
"refresh": 1,
"regex": "",
"sort": 2,
@ -2154,7 +2857,11 @@ data:
]
},
"timezone": "utc",
"title": "Prometheus",
"title": "Prometheus / Overview",
"uid": "",
"version": 0
}
kind: ConfigMap
metadata:
name: grafana-dashboards-prom
namespace: monitoring

@ -18,12 +18,13 @@ spec:
labels:
name: grafana
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: grafana
image: docker.io/grafana/grafana:6.2.5
image: docker.io/grafana/grafana:9.3.1
env:
- name: GF_PATHS_CONFIG
value: "/etc/grafana/custom.ini"
@ -31,15 +32,22 @@ spec:
- name: http
containerPort: 8080
livenessProbe:
httpGet:
path: /metrics
tcpSocket:
port: 8080
initialDelaySeconds: 10
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 1
failureThreshold: 5
successThreshold: 1
readinessProbe:
httpGet:
path: /api/health
scheme: HTTP
path: /robots.txt
port: 8080
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
resources:
requests:
cpu: 100m
@ -56,14 +64,20 @@ spec:
mountPath: /etc/grafana/provisioning/dashboards
- name: dashboards-etcd
mountPath: /etc/grafana/dashboards/etcd
- name: dashboards-node-exporter
mountPath: /etc/grafana/dashboards/node-exporter
- name: dashboards-prom
mountPath: /etc/grafana/dashboards/prom
- name: dashboards-k8s
mountPath: /etc/grafana/dashboards/k8s
- name: dashboards-k8s-nodes
mountPath: /etc/grafana/dashboards/k8s-nodes
- name: dashboards-k8s-resources
mountPath: /etc/grafana/dashboards/k8s-resources
- name: dashboards-k8s-resources-1
mountPath: /etc/grafana/dashboards/k8s-resources-1
- name: dashboards-k8s-resources-2
mountPath: /etc/grafana/dashboards/k8s-resources-2
- name: dashboards-k8s-network
mountPath: /etc/grafana/dashboards/k8s-network
- name: dashboards-coredns
mountPath: /etc/grafana/dashboards/coredns
- name: dashboards-nginx-ingress
@ -81,6 +95,9 @@ spec:
- name: dashboards-etcd
configMap:
name: grafana-dashboards-etcd
- name: dashboards-node-exporter
configMap:
name: grafana-dashboards-node-exporter
- name: dashboards-prom
configMap:
name: grafana-dashboards-prom
@ -90,9 +107,15 @@ spec:
- name: dashboards-k8s-nodes
configMap:
name: grafana-dashboards-k8s-nodes
- name: dashboards-k8s-resources
- name: dashboards-k8s-resources-1
configMap:
name: grafana-dashboards-k8s-resources
name: grafana-dashboards-k8s-resources-1
- name: dashboards-k8s-network
configMap:
name: grafana-dashboards-k8s-network
- name: dashboards-k8s-resources-2
configMap:
name: grafana-dashboards-k8s-resources-2
- name: dashboards-coredns
configMap:
name: grafana-dashboards-coredns

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/public

@ -17,16 +17,16 @@ spec:
labels:
name: nginx-ingress-controller
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
nodeSelector:
node-role.kubernetes.io/node: ""
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0
image: registry.k8s.io/ingress-nginx/controller:v1.5.1
args:
- /nginx-ingress-controller
- --controller-class=k8s.io/public
- --ingress-class=public
# use downward API
env:
@ -49,7 +49,6 @@ spec:
containerPort: 10254
hostPort: 10254
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
@ -57,22 +56,28 @@ spec:
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

@ -29,7 +29,7 @@ rules:
- list
- watch
- apiGroups:
- ""
- ""
resources:
- events
verbs:
@ -51,3 +51,19 @@ rules:
- ingresses/status
verbs:
- update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- "endpointslices"
verbs:
- get
- list
- watch

@ -10,6 +10,7 @@ rules:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- apiGroups:
@ -37,3 +38,11 @@ rules:
- endpoints
verbs:
- get
- apiGroups:
- "coordination.k8s.io"
resources:
- leases
verbs:
- create
- get
- update

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/public

@ -17,16 +17,16 @@ spec:
labels:
name: nginx-ingress-controller
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
nodeSelector:
node-role.kubernetes.io/node: ""
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0
image: registry.k8s.io/ingress-nginx/controller:v1.5.1
args:
- /nginx-ingress-controller
- --controller-class=k8s.io/public
- --ingress-class=public
# use downward API
env:
@ -49,7 +49,6 @@ spec:
containerPort: 10254
hostPort: 10254
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
@ -57,22 +56,28 @@ spec:
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

@ -29,7 +29,7 @@ rules:
- list
- watch
- apiGroups:
- ""
- ""
resources:
- events
verbs:
@ -51,3 +51,19 @@ rules:
- ingresses/status
verbs:
- update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- "endpointslices"
verbs:
- get
- list
- watch

@ -10,6 +10,7 @@ rules:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- apiGroups:
@ -32,8 +33,11 @@ rules:
verbs:
- create
- apiGroups:
- ""
- "coordination.k8s.io"
resources:
- endpoints
- leases
verbs:
- create
- get
- update

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/public

@ -1,7 +1,7 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: ingress-controller-public
name: nginx-ingress-controller
namespace: ingress
spec:
replicas: 2
@ -10,21 +10,23 @@ spec:
maxUnavailable: 1
selector:
matchLabels:
name: ingress-controller-public
name: nginx-ingress-controller
phase: prod
template:
metadata:
labels:
name: ingress-controller-public
name: nginx-ingress-controller
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0
image: registry.k8s.io/ingress-nginx/controller:v1.5.1
args:
- /nginx-ingress-controller
- --controller-class=k8s.io/public
- --ingress-class=public
# use downward API
env:
@ -62,13 +64,17 @@ spec:
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

@ -29,7 +29,7 @@ rules:
- list
- watch
- apiGroups:
- ""
- ""
resources:
- events
verbs:
@ -51,3 +51,19 @@ rules:
- ingresses/status
verbs:
- update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- "endpointslices"
verbs:
- get
- list
- watch

@ -10,6 +10,7 @@ rules:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- apiGroups:
@ -32,8 +33,10 @@ rules:
verbs:
- create
- apiGroups:
- ""
- "coordination.k8s.io"
resources:
- endpoints
- leases
verbs:
- create
- get
- update

@ -1,7 +1,7 @@
apiVersion: v1
kind: Service
metadata:
name: ingress-controller-public
name: nginx-ingress-controller
namespace: ingress
annotations:
prometheus.io/scrape: 'true'
@ -10,7 +10,7 @@ spec:
type: ClusterIP
clusterIP: 10.3.0.12
selector:
name: ingress-controller-public
name: nginx-ingress-controller
phase: prod
ports:
- name: http

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/public

@ -17,16 +17,16 @@ spec:
labels:
name: nginx-ingress-controller
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
nodeSelector:
node-role.kubernetes.io/node: ""
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0
image: registry.k8s.io/ingress-nginx/controller:v1.5.1
args:
- /nginx-ingress-controller
- --controller-class=k8s.io/public
- --ingress-class=public
# use downward API
env:
@ -49,7 +49,6 @@ spec:
containerPort: 10254
hostPort: 10254
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
@ -57,22 +56,28 @@ spec:
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

@ -29,7 +29,7 @@ rules:
- list
- watch
- apiGroups:
- ""
- ""
resources:
- events
verbs:
@ -51,3 +51,19 @@ rules:
- ingresses/status
verbs:
- update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- "endpointslices"
verbs:
- get
- list
- watch

@ -10,6 +10,7 @@ rules:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- apiGroups:
@ -32,8 +33,10 @@ rules:
verbs:
- create
- apiGroups:
- ""
- "coordination.k8s.io"
resources:
- endpoints
- leases
verbs:
- create
- get
- update

@ -0,0 +1,6 @@
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
name: public
spec:
controller: k8s.io/public

@ -17,16 +17,16 @@ spec:
labels:
name: nginx-ingress-controller
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
nodeSelector:
node-role.kubernetes.io/node: ""
securityContext:
seccompProfile:
type: RuntimeDefault
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0
image: registry.k8s.io/ingress-nginx/controller:v1.5.1
args:
- /nginx-ingress-controller
- --controller-class=k8s.io/public
- --ingress-class=public
# use downward API
env:
@ -49,7 +49,6 @@ spec:
containerPort: 10254
hostPort: 10254
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
@ -57,22 +56,28 @@ spec:
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 5
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33 # www-data
runAsUser: 101 # www-data
restartPolicy: Always
terminationGracePeriodSeconds: 60
terminationGracePeriodSeconds: 300

@ -29,7 +29,7 @@ rules:
- list
- watch
- apiGroups:
- ""
- ""
resources:
- events
verbs:
@ -51,3 +51,19 @@ rules:
- ingresses/status
verbs:
- update
- apiGroups:
- "networking.k8s.io"
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- "endpointslices"
verbs:
- get
- list
- watch

@ -10,6 +10,7 @@ rules:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- apiGroups:
@ -32,8 +33,10 @@ rules:
verbs:
- create
- apiGroups:
- ""
- "coordination.k8s.io"
resources:
- endpoints
- leases
verbs:
- create
- get
- update

@ -34,7 +34,7 @@ data:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@ -65,13 +65,61 @@ data:
- source_labels: [__name__]
action: drop
regex: apiserver_admission_step_admission_latencies_seconds_.*
- source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_bucket;.+
action: drop
- source_labels: [__name__, group]
regex: apiserver_request_duration_seconds_count;.+
action: drop
# Scrape config for kube-controller-manager endpoints.
#
# kube-controller-manager service endpoints can be discovered by using the
# `endpoints` role and relabelling to keep only endpoints associated with
# kube-system/kube-controller-manager and the `metrics` port.
- job_name: 'kube-controller-manager'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kube-system;kube-controller-manager;metrics
- replacement: kube-controller-manager
action: replace
target_label: job
# Scrape config for kube-scheduler endpoints.
#
# kube-scheduler service endpoints can be discovered by using the `endpoints`
# role and relabelling to keep only endpoints associated with
# kube-system/kube-scheduler and the `metrics` port.
- job_name: 'kube-scheduler'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kube-system;kube-scheduler;metrics
- replacement: kube-scheduler
action: replace
target_label: job
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
- job_name: 'kubelet'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@ -79,10 +127,6 @@ data:
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
# Scrape config for Kubelet cAdvisor. Explore metrics from a node by
# scraping kubelet (127.0.0.1:10250/metrics/cadvisor).
- job_name: 'kubernetes-cadvisor'
@ -97,9 +141,6 @@ data:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
metric_relabel_configs:
- source_labels: [__name__, image]
action: drop
@ -115,16 +156,14 @@ data:
- role: node
scheme: http
relabel_configs:
- source_labels: [__meta_kubernetes_node_label_node_role_kubernetes_io_controller]
action: keep
regex: 'true'
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_node_address_InternalIP]
action: replace
target_label: __address__
replacement: '${1}:2381'
- source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_controller]
action: keep
regex: 'true'
- source_labels: [__meta_kubernetes_node_address_InternalIP]
action: replace
target_label: __address__
replacement: '${1}:2381'
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
@ -136,6 +175,7 @@ data:
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
# * `prometheus.io/param`: Custom metrics query parameter, like "format=prometheus".
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
@ -158,6 +198,11 @@ data:
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_param]
action: replace
target_label: __param_$1
regex: ([^=]+)=(.*)
replacement: $2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
@ -169,44 +214,12 @@ data:
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: job
metric_relabel_configs:
- source_labels: [__name__]
action: drop
regex: etcd_(debugging|disk|request|server).*
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: job
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
@ -243,6 +256,67 @@ data:
action: replace
target_label: kubernetes_pod_name
# Example scrape config for probing Services via the Blackbox Exporter.
#
# Relabeling allows service scraping to be configured via annotations:
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter:8080
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: job
# Example scrape config for probing Ingresses via a Blackbox Exporter.
#
# Relabeling allows ingress probing to be configured via annotations:
# * `prometheus.io/probe`: Only probe ingresses that have a value of `true`
- job_name: 'kubernetes-ingresses'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: ingress
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme, __address__, __meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter:8080
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: job
# Rule files
rule_files:
- "/etc/prometheus/rules/*.rules"

@ -14,13 +14,14 @@ spec:
labels:
name: prometheus
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccountName: prometheus
containers:
- name: prometheus
image: quay.io/prometheus/prometheus:v2.11.0
image: quay.io/prometheus/prometheus:v2.40.5
args:
- --web.listen-address=0.0.0.0:9090
- --config.file=/etc/prometheus/prometheus.yaml

@ -1,18 +1,16 @@
# Allow Prometheus to discover service endpoints
apiVersion: v1
kind: Service
metadata:
name: kube-controller-manager
namespace: kube-system
annotations:
prometheus.io/scrape: 'true'
spec:
type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None
selector:
k8s-app: kube-controller-manager
ports:
- name: metrics
protocol: TCP
port: 10252
targetPort: 10252
port: 10257
targetPort: 10257

@@ -0,0 +1,19 @@
# Allow Prometheus to scrape service endpoints
apiVersion: v1
kind: Service
metadata:
name: kube-proxy
namespace: kube-system
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '10249'
spec:
type: ClusterIP
clusterIP: None
selector:
k8s-app: kube-proxy
ports:
- name: metrics
protocol: TCP
port: 10249
targetPort: 10249

@@ -1,18 +1,16 @@
# Allow Prometheus to discover service endpoints
apiVersion: v1
kind: Service
metadata:
name: kube-scheduler
namespace: kube-system
annotations:
prometheus.io/scrape: 'true'
spec:
type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None
selector:
k8s-app: kube-scheduler
ports:
- name: metrics
protocol: TCP
port: 10251
targetPort: 10251
port: 10259
targetPort: 10259

@@ -74,13 +74,30 @@ rules:
- storage.k8s.io
resources:
- storageclasses
- volumeattachments
verbs:
- list
- watch
- apiGroups:
- autoscaling.k8s.io
- admissionregistration.k8s.io
resources:
- verticalpodautoscalers
- mutatingwebhookconfigurations
- validatingwebhookconfigurations
verbs:
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- networkpolicies
- ingresses
verbs:
- list
- watch
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs:
- list
- watch

@@ -18,46 +18,30 @@ spec:
labels:
name: kube-state-metrics
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
securityContext:
seccompProfile:
type: RuntimeDefault
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.7.1
image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0
ports:
- name: metrics
containerPort: 8080
readinessProbe:
- name: telemetry
containerPort: 8081
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
timeoutSeconds: 5
- name: addon-resizer
image: k8s.gcr.io/addon-resizer:1.8.5
resources:
limits:
cpu: 100m
memory: 30Mi
requests:
cpu: 100m
memory: 30Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
command:
- /pod_nanny
- --container=kube-state-metrics
- --cpu=100m
- --extra-cpu=1m
- --memory=100Mi
- --extra-memory=2Mi
- --threshold=5
- --deployment=kube-state-metrics
readinessProbe:
httpGet:
path: /
port: 8081
initialDelaySeconds: 5
timeoutSeconds: 5
securityContext:
runAsUser: 65534

@@ -1,13 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kube-state-metrics
namespace: monitoring
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: monitoring

@@ -1,31 +0,0 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: kube-state-metrics
namespace: monitoring
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- extensions
resources:
- deployments
resourceNames:
- kube-state-metrics
verbs:
- get
- update
- apiGroups:
- apps
resources:
- deployments
resourceNames:
- kube-state-metrics
verbs:
- get
- update

@@ -17,24 +17,24 @@ spec:
labels:
name: node-exporter
phase: prod
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
serviceAccountName: node-exporter
securityContext:
runAsNonRoot: true
runAsUser: 65534
runAsGroup: 65534
fsGroup: 65534
seccompProfile:
type: RuntimeDefault
hostNetwork: true
hostPID: true
containers:
- name: node-exporter
image: quay.io/prometheus/node-exporter:v0.18.1
image: quay.io/prometheus/node-exporter:v1.5.0
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host/root
- --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
- --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
ports:
- name: metrics
containerPort: 9100
@@ -46,6 +46,9 @@ spec:
limits:
cpu: 200m
memory: 100Mi
securityContext:
seLinuxOptions:
type: spc_t
volumeMounts:
- name: proc
mountPath: /host/proc
@@ -55,9 +58,14 @@ spec:
readOnly: true
- name: root
mountPath: /host/root
mountPropagation: HostToContainer
readOnly: true
tolerations:
- effect: NoSchedule
- key: node-role.kubernetes.io/controller
operator: Exists
- key: node-role.kubernetes.io/control-plane
operator: Exists
- key: node.kubernetes.io/not-ready
operator: Exists
volumes:
- name: proc

@@ -10,6 +10,17 @@ rules:
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
verbs:
- get
- list
- watch
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
- apiGroups:
- networking.k8s.io
resources:
- ingresses
verbs:
- get
- list
- watch

@@ -1,8 +1,4 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitoring
data:
etcd.yaml: |-
{
@@ -10,12 +6,25 @@ data:
{
"name": "etcd",
"rules": [
{
"alert": "etcdMembersDown",
"annotations": {
"description": "etcd cluster \"{{ $labels.job }}\": members are down ({{ $value }}).",
"summary": "etcd cluster members are down."
},
"expr": "max without (endpoint) (\n sum without (instance) (up{job=~\".*etcd.*\"} == bool 0)\nor\n count without (To) (\n sum without (instance) (rate(etcd_network_peer_sent_failures_total{job=~\".*etcd.*\"}[120s])) > 0.01\n )\n)\n> 0\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "etcdInsufficientMembers",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }})."
"description": "etcd cluster \"{{ $labels.job }}\": insufficient members ({{ $value }}).",
"summary": "etcd cluster has insufficient number of members."
},
"expr": "sum(up{job=~\".*etcd.*\"} == bool 1) by (job) < ((count(up{job=~\".*etcd.*\"}) by (job) + 1) / 2)\n",
"expr": "sum(up{job=~\".*etcd.*\"} == bool 1) without (instance) < ((count(up{job=~\".*etcd.*\"}) without (instance) + 1) / 2)\n",
"for": "3m",
"labels": {
"severity": "critical"
@@ -24,7 +33,8 @@ data:
{
"alert": "etcdNoLeader",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": member {{ $labels.instance }} has no leader."
"description": "etcd cluster \"{{ $labels.job }}\": member {{ $labels.instance }} has no leader.",
"summary": "etcd cluster has no leader."
},
"expr": "etcd_server_has_leader{job=~\".*etcd.*\"} == 0\n",
"for": "1m",
@@ -35,10 +45,11 @@ data:
{
"alert": "etcdHighNumberOfLeaderChanges",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": instance {{ $labels.instance }} has seen {{ $value }} leader changes within the last 30 minutes."
"description": "etcd cluster \"{{ $labels.job }}\": {{ $value }} leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated.",
"summary": "etcd cluster has high number of leader changes."
},
"expr": "rate(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}[15m]) > 3\n",
"for": "15m",
"expr": "increase((max without (instance) (etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}) or 0*absent(etcd_server_leader_changes_seen_total{job=~\".*etcd.*\"}))[15m:1m]) >= 4\n",
"for": "5m",
"labels": {
"severity": "warning"
}
@@ -46,9 +57,10 @@ data:
{
"alert": "etcdGRPCRequestsSlow",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": gRPC requests to {{ $labels.grpc_method }} are taking {{ $value }}s on etcd instance {{ $labels.instance }}."
"description": "etcd cluster \"{{ $labels.job }}\": 99th percentile of gRPC requests is {{ $value }}s on etcd instance {{ $labels.instance }} for {{ $labels.grpc_method }} method.",
"summary": "etcd grpc requests are slow"
},
"expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_type=\"unary\"}[5m])) by (job, instance, grpc_service, grpc_method, le))\n> 0.15\n",
"expr": "histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job=~\".*etcd.*\", grpc_method!=\"Defragment\", grpc_type=\"unary\"}[5m])) without(grpc_type))\n> 0.15\n",
"for": "10m",
"labels": {
"severity": "critical"
@@ -57,7 +69,8 @@ data:
{
"alert": "etcdMemberCommunicationSlow",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}."
"description": "etcd cluster \"{{ $labels.job }}\": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}.",
"summary": "etcd cluster member communication is slow."
},
"expr": "histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 0.15\n",
"for": "10m",
@@ -68,7 +81,8 @@ data:
{
"alert": "etcdHighNumberOfFailedProposals",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": {{ $value }} proposal failures within the last 30 minutes on etcd instance {{ $labels.instance }}."
"description": "etcd cluster \"{{ $labels.job }}\": {{ $value }} proposal failures within the last 30 minutes on etcd instance {{ $labels.instance }}.",
"summary": "etcd cluster has high number of proposal failures."
},
"expr": "rate(etcd_server_proposals_failed_total{job=~\".*etcd.*\"}[15m]) > 5\n",
"for": "15m",
@@ -79,7 +93,8 @@ data:
{
"alert": "etcdHighFsyncDurations",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": 99th percentile fync durations are {{ $value }}s on etcd instance {{ $labels.instance }}."
"description": "etcd cluster \"{{ $labels.job }}\": 99th percentile fsync durations are {{ $value }}s on etcd instance {{ $labels.instance }}.",
"summary": "etcd cluster 99th percentile fsync durations are too high."
},
"expr": "histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 0.5\n",
"for": "10m",
@@ -87,10 +102,23 @@ data:
"severity": "warning"
}
},
{
"alert": "etcdHighFsyncDurations",
"annotations": {
"description": "etcd cluster \"{{ $labels.job }}\": 99th percentile fsync durations are {{ $value }}s on etcd instance {{ $labels.instance }}.",
"summary": "etcd cluster 99th percentile fsync durations are too high."
},
"expr": "histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 1\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "etcdHighCommitDurations",
"annotations": {
"message": "etcd cluster \"{{ $labels.job }}\": 99th percentile commit durations {{ $value }}s on etcd instance {{ $labels.instance }}."
"description": "etcd cluster \"{{ $labels.job }}\": 99th percentile commit durations {{ $value }}s on etcd instance {{ $labels.instance }}.",
"summary": "etcd cluster 99th percentile commit durations are too high."
},
"expr": "histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{job=~\".*etcd.*\"}[5m]))\n> 0.25\n",
"for": "10m",
@@ -99,54 +127,24 @@ data:
}
},
{
"alert": "etcdHighNumberOfFailedHTTPRequests",
"alert": "etcdBackendQuotaLowSpace",
"annotations": {
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}"
"description": "etcd cluster \"{{ $labels.job }}\": database size exceeds the defined quota on etcd instance {{ $labels.instance }}, please defrag or increase the quota as the writes to etcd will be disabled when it is full.",
"summary": "etcd cluster database is running full."
},
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.01\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "etcdHighNumberOfFailedHTTPRequests",
"annotations": {
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}."
},
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.05\n",
"expr": "(etcd_mvcc_db_total_size_in_bytes/etcd_server_quota_backend_bytes)*100 > 95\n",
"for": "10m",
"labels": {
"severity": "critical"
}
},
{
"alert": "etcdHTTPRequestsSlow",
"alert": "etcdExcessiveDatabaseGrowth",
"annotations": {
"message": "etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method }} are slow."
"description": "etcd cluster \"{{ $labels.job }}\": Observed surge in etcd writes leading to 50% increase in database size over the past four hours on etcd instance {{ $labels.instance }}, please check as it might be disruptive.",
"summary": "etcd cluster database growing very fast."
},
"expr": "histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m]))\n> 0.15\n",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
}
]
}
extra.yaml: |-
{
"groups": [
{
"name": "extra.rules",
"rules": [
{
"alert": "InactiveRAIDDisk",
"annotations": {
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
},
"expr": "node_md_disks - node_md_disks_active > 0",
"expr": "increase(((etcd_mvcc_db_total_size_in_bytes/etcd_server_quota_backend_bytes)*100)[240m:1m]) > 50\n",
"for": "10m",
"labels": {
"severity": "warning"
@@ -159,57 +157,303 @@ data:
kube.yaml: |-
{
"groups": [
{
"name": "kube-apiserver-burnrate.rules",
"rules": [
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[1d]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[1d]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[1d]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[1d]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[1d]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[1d]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate1d"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[1h]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[1h]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[1h]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[1h]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[1h]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[1h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate1h"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[2h]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[2h]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[2h]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[2h]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[2h]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[2h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate2h"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[30m]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[30m]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[30m]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[30m]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[30m]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[30m]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate30m"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[3d]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[3d]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[3d]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[3d]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[3d]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[3d]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate3d"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[5m]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[5m]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[5m]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[5m]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[5m]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate5m"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[6h]))\n -\n (\n (\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=~\"resource|\",le=\"1\"}[6h]))\n or\n vector(0)\n )\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"namespace\",le=\"5\"}[6h]))\n +\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\",scope=\"cluster\",le=\"30\"}[6h]))\n )\n )\n +\n # errors\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\",code=~\"5..\"}[6h]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[6h]))\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:burnrate6h"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[1d]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[1d]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1d]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1d]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate1d"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[1h]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[1h]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1h]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[1h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate1h"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[2h]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[2h]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[2h]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[2h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate2h"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[30m]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[30m]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[30m]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[30m]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate30m"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[3d]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[3d]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[3d]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[3d]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate3d"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[5m]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[5m]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[5m]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate5m"
},
{
"expr": "(\n (\n # too slow\n sum by (cluster) (rate(apiserver_request_duration_seconds_count{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[6h]))\n -\n sum by (cluster) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\",le=\"1\"}[6h]))\n )\n +\n sum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",code=~\"5..\"}[6h]))\n)\n/\nsum by (cluster) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[6h]))\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:burnrate6h"
}
]
},
{
"name": "kube-apiserver-histogram.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum by (cluster, le, resource) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"LIST|GET\",subresource!~\"proxy|attach|log|exec|portforward\"}[5m]))) > 0\n",
"labels": {
"quantile": "0.99",
"verb": "read"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.99, sum by (cluster, le, resource) (rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\",subresource!~\"proxy|attach|log|exec|portforward\"}[5m]))) > 0\n",
"labels": {
"quantile": "0.99",
"verb": "write"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
}
]
},
{
"interval": "3m",
"name": "kube-apiserver-availability.rules",
"rules": [
{
"expr": "avg_over_time(code_verb:apiserver_request_total:increase1h[30d]) * 24 * 30\n",
"record": "code_verb:apiserver_request_total:increase30d"
},
{
"expr": "sum by (cluster, code) (code_verb:apiserver_request_total:increase30d{verb=~\"LIST|GET\"})\n",
"labels": {
"verb": "read"
},
"record": "code:apiserver_request_total:increase30d"
},
{
"expr": "sum by (cluster, code) (code_verb:apiserver_request_total:increase30d{verb=~\"POST|PUT|PATCH|DELETE\"})\n",
"labels": {
"verb": "write"
},
"record": "code:apiserver_request_total:increase30d"
},
{
"expr": "sum by (cluster, verb, scope) (increase(apiserver_request_duration_seconds_count[1h]))\n",
"record": "cluster_verb_scope:apiserver_request_duration_seconds_count:increase1h"
},
{
"expr": "sum by (cluster, verb, scope) (avg_over_time(cluster_verb_scope:apiserver_request_duration_seconds_count:increase1h[30d]) * 24 * 30)\n",
"record": "cluster_verb_scope:apiserver_request_duration_seconds_count:increase30d"
},
{
"expr": "sum by (cluster, verb, scope, le) (increase(apiserver_request_duration_seconds_bucket[1h]))\n",
"record": "cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase1h"
},
{
"expr": "sum by (cluster, verb, scope, le) (avg_over_time(cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase1h[30d]) * 24 * 30)\n",
"record": "cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d"
},
{
"expr": "1 - (\n (\n # write too slow\n sum by (cluster) (cluster_verb_scope:apiserver_request_duration_seconds_count:increase30d{verb=~\"POST|PUT|PATCH|DELETE\"})\n -\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"})\n ) +\n (\n # read too slow\n sum by (cluster) (cluster_verb_scope:apiserver_request_duration_seconds_count:increase30d{verb=~\"LIST|GET\"})\n -\n (\n (\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"LIST|GET\",scope=~\"resource|\",le=\"1\"})\n or\n vector(0)\n )\n +\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"LIST|GET\",scope=\"namespace\",le=\"5\"})\n +\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"LIST|GET\",scope=\"cluster\",le=\"30\"})\n )\n ) +\n # errors\n sum by (cluster) (code:apiserver_request_total:increase30d{code=~\"5..\"} or vector(0))\n)\n/\nsum by (cluster) (code:apiserver_request_total:increase30d)\n",
"labels": {
"verb": "all"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "1 - (\n sum by (cluster) (cluster_verb_scope:apiserver_request_duration_seconds_count:increase30d{verb=~\"LIST|GET\"})\n -\n (\n # too slow\n (\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"LIST|GET\",scope=~\"resource|\",le=\"1\"})\n or\n vector(0)\n )\n +\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"LIST|GET\",scope=\"namespace\",le=\"5\"})\n +\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"LIST|GET\",scope=\"cluster\",le=\"30\"})\n )\n +\n # errors\n sum by (cluster) (code:apiserver_request_total:increase30d{verb=\"read\",code=~\"5..\"} or vector(0))\n)\n/\nsum by (cluster) (code:apiserver_request_total:increase30d{verb=\"read\"})\n",
"labels": {
"verb": "read"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "1 - (\n (\n # too slow\n sum by (cluster) (cluster_verb_scope:apiserver_request_duration_seconds_count:increase30d{verb=~\"POST|PUT|PATCH|DELETE\"})\n -\n sum by (cluster) (cluster_verb_scope_le:apiserver_request_duration_seconds_bucket:increase30d{verb=~\"POST|PUT|PATCH|DELETE\",le=\"1\"})\n )\n +\n # errors\n sum by (cluster) (code:apiserver_request_total:increase30d{verb=\"write\",code=~\"5..\"} or vector(0))\n)\n/\nsum by (cluster) (code:apiserver_request_total:increase30d{verb=\"write\"})\n",
"labels": {
"verb": "write"
},
"record": "apiserver_request:availability30d"
},
{
"expr": "sum by (cluster,code,resource) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET\"}[5m]))\n",
"labels": {
"verb": "read"
},
"record": "code_resource:apiserver_request_total:rate5m"
},
{
"expr": "sum by (cluster,code,resource) (rate(apiserver_request_total{job=\"apiserver\",verb=~\"POST|PUT|PATCH|DELETE\"}[5m]))\n",
"labels": {
"verb": "write"
},
"record": "code_resource:apiserver_request_total:rate5m"
},
{
"expr": "sum by (cluster, code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET|POST|PUT|PATCH|DELETE\",code=~\"2..\"}[1h]))\n",
"record": "code_verb:apiserver_request_total:increase1h"
},
{
"expr": "sum by (cluster, code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET|POST|PUT|PATCH|DELETE\",code=~\"3..\"}[1h]))\n",
"record": "code_verb:apiserver_request_total:increase1h"
},
{
"expr": "sum by (cluster, code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET|POST|PUT|PATCH|DELETE\",code=~\"4..\"}[1h]))\n",
"record": "code_verb:apiserver_request_total:increase1h"
},
{
"expr": "sum by (cluster, code, verb) (increase(apiserver_request_total{job=\"apiserver\",verb=~\"LIST|GET|POST|PUT|PATCH|DELETE\",code=~\"5..\"}[1h]))\n",
"record": "code_verb:apiserver_request_total:increase1h"
}
]
},
{
"name": "k8s.rules",
"rules": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])) by (namespace)\n",
"record": "namespace:container_cpu_usage_seconds_total:sum_rate"
"expr": "sum by (cluster, namespace, pod, container) (\n irate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (\n 1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate"
},
{
"expr": "sum by (namespace, pod, container) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])\n)\n",
"record": "namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
"expr": "container_memory_working_set_bytes{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_working_set_bytes"
},
{
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}) by (namespace)\n",
"record": "namespace:container_memory_usage_bytes:sum"
"expr": "container_memory_rss{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_rss"
},
{
"expr": "sum by (namespace, label_name) (\n sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container!=\"POD\"}[5m])) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:container_cpu_usage_seconds_total:sum_rate"
"expr": "container_memory_cache{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_cache"
},
{
"expr": "sum by (namespace, label_name) (\n sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\",image!=\"\", container!=\"POD\"}) by (pod, namespace)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:container_memory_usage_bytes:sum"
"expr": "container_memory_swap{job=\"kubernetes-cadvisor\", image!=\"\"}\n* on (namespace, pod) group_left(node) topk by(namespace, pod) (1,\n max by(namespace, pod, node) (kube_pod_info{node!=\"\"})\n)\n",
"record": "node_namespace_pod_container:container_memory_swap"
},
{
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"^(Pending|Running)$\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:kube_pod_container_resource_requests_memory_bytes:sum"
"expr": "kube_pod_container_resource_requests{resource=\"memory\",job=\"kube-state-metrics\"} * on (namespace, pod, cluster)\ngroup_left() max by (namespace, pod, cluster) (\n (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)\n)\n",
"record": "cluster:namespace:pod_memory:active:kube_pod_container_resource_requests"
},
{
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} * on (endpoint, instance, job, namespace, pod, service) group_left(phase) (kube_pod_status_phase{phase=~\"^(Pending|Running)$\"} == 1)) by (namespace, pod)\n * on (namespace, pod)\n group_left(label_name) kube_pod_labels{job=\"kube-state-metrics\"}\n)\n",
"record": "namespace:kube_pod_container_resource_requests_cpu_cores:sum"
"expr": "sum by (namespace, cluster) (\n sum by (namespace, pod, cluster) (\n max by (namespace, pod, container, cluster) (\n kube_pod_container_resource_requests{resource=\"memory\",job=\"kube-state-metrics\"}\n ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace_memory:kube_pod_container_resource_requests:sum"
},
{
"expr": "sum(\n label_replace(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"ReplicaSet\"},\n \"replicaset\", \"$1\", \"owner_name\", \"(.*)\"\n ) * on(replicaset, namespace) group_left(owner_name) kube_replicaset_owner{job=\"kube-state-metrics\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n",
"expr": "kube_pod_container_resource_requests{resource=\"cpu\",job=\"kube-state-metrics\"} * on (namespace, pod, cluster)\ngroup_left() max by (namespace, pod, cluster) (\n (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)\n)\n",
"record": "cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests"
},
{
"expr": "sum by (namespace, cluster) (\n sum by (namespace, pod, cluster) (\n max by (namespace, pod, container, cluster) (\n kube_pod_container_resource_requests{resource=\"cpu\",job=\"kube-state-metrics\"}\n ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace_cpu:kube_pod_container_resource_requests:sum"
},
{
"expr": "kube_pod_container_resource_limits{resource=\"memory\",job=\"kube-state-metrics\"} * on (namespace, pod, cluster)\ngroup_left() max by (namespace, pod, cluster) (\n (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)\n)\n",
"record": "cluster:namespace:pod_memory:active:kube_pod_container_resource_limits"
},
{
"expr": "sum by (namespace, cluster) (\n sum by (namespace, pod, cluster) (\n max by (namespace, pod, container, cluster) (\n kube_pod_container_resource_limits{resource=\"memory\",job=\"kube-state-metrics\"}\n ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace_memory:kube_pod_container_resource_limits:sum"
},
{
"expr": "kube_pod_container_resource_limits{resource=\"cpu\",job=\"kube-state-metrics\"} * on (namespace, pod, cluster)\ngroup_left() max by (namespace, pod, cluster) (\n (kube_pod_status_phase{phase=~\"Pending|Running\"} == 1)\n )\n",
"record": "cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits"
},
{
"expr": "sum by (namespace, cluster) (\n sum by (namespace, pod, cluster) (\n max by (namespace, pod, container, cluster) (\n kube_pod_container_resource_limits{resource=\"cpu\",job=\"kube-state-metrics\"}\n ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (\n kube_pod_status_phase{phase=~\"Pending|Running\"} == 1\n )\n )\n)\n",
"record": "namespace_cpu:kube_pod_container_resource_limits:sum"
},
{
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"ReplicaSet\"},\n \"replicaset\", \"$1\", \"owner_name\", \"(.*)\"\n ) * on(replicaset, namespace) group_left(owner_name) topk by(replicaset, namespace) (\n 1, max by (replicaset, namespace, owner_name) (\n kube_replicaset_owner{job=\"kube-state-metrics\"}\n )\n ),\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "deployment"
},
"record": "mixin_pod_workload"
"record": "namespace_workload_pod:kube_pod_owner:relabel"
},
{
"expr": "sum(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n",
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"DaemonSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "daemonset"
},
"record": "mixin_pod_workload"
"record": "namespace_workload_pod:kube_pod_owner:relabel"
},
{
"expr": "sum(\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n) by (namespace, workload, pod)\n",
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"StatefulSet\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "statefulset"
},
"record": "mixin_pod_workload"
"record": "namespace_workload_pod:kube_pod_owner:relabel"
},
{
"expr": "max by (cluster, namespace, workload, pod) (\n label_replace(\n kube_pod_owner{job=\"kube-state-metrics\", owner_kind=\"Job\"},\n \"workload\", \"$1\", \"owner_name\", \"(.*)\"\n )\n)\n",
"labels": {
"workload_type": "job"
},
"record": "namespace_workload_pod:kube_pod_owner:relabel"
}
]
},
@ -281,211 +525,50 @@ data:
}
]
},
{
"name": "kube-apiserver.rules",
"rules": [
{
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.99"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.9"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
},
{
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) without(instance, pod))\n",
"labels": {
"quantile": "0.5"
},
"record": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
}
]
},
{
"name": "node.rules",
"rules": [
{
"expr": "sum(min(kube_pod_info) by (node))",
"record": ":kube_pod_info_node_count:"
},
{
"expr": "max(label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")) by (node, namespace, pod)\n",
"expr": "topk by(namespace, pod) (1,\n max by (node, namespace, pod) (\n label_replace(kube_pod_info{job=\"kube-state-metrics\",node!=\"\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")\n))\n",
"record": "node_namespace_pod:kube_pod_info:"
},
{
"expr": "count by (node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n",
"expr": "count by (cluster, node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n topk by(namespace, pod) (1, node_namespace_pod:kube_pod_info:)\n))\n",
"record": "node:node_num_cpu:sum"
},
{
"expr": "1 - avg(rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m]))\n",
"record": ":node_cpu_utilisation:avg1m"
"expr": "sum(\n node_memory_MemAvailable_bytes{job=\"node-exporter\"} or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"} +\n node_memory_Cached_bytes{job=\"node-exporter\"} +\n node_memory_MemFree_bytes{job=\"node-exporter\"} +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n) by (cluster)\n",
"record": ":node_memory_MemAvailable_bytes:sum"
},
{
"expr": "1 - avg by (node) (\n rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:)\n",
"record": "node:node_cpu_utilisation:avg1m"
},
{
"expr": "node:node_cpu_utilisation:avg1m\n *\nnode:node_num_cpu:sum\n /\nscalar(sum(node:node_num_cpu:sum))\n",
"record": "node:cluster_cpu_utilisation:ratio"
},
{
"expr": "sum(node_load1{job=\"node-exporter\"})\n/\nsum(node:node_num_cpu:sum)\n",
"record": ":node_cpu_saturation_load1:"
},
{
"expr": "sum by (node) (\n node_load1{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nnode:node_num_cpu:sum\n",
"record": "node:node_cpu_saturation_load1:"
},
{
"expr": "1 -\nsum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n/\nsum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
"record": ":node_memory_utilisation:"
},
{
"expr": "sum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n",
"record": ":node_memory_MemFreeCachedBuffers_bytes:sum"
},
{
"expr": "sum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
"record": ":node_memory_MemTotal_bytes:sum"
},
{
"expr": "sum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_bytes_available:sum"
},
{
"expr": "sum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_bytes_total:sum"
},
{
"expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nnode:node_memory_bytes_total:sum\n",
"record": "node:node_memory_utilisation:ratio"
},
{
"expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nscalar(sum(node:node_memory_bytes_total:sum))\n",
"record": "node:cluster_memory_utilisation:ratio"
},
{
"expr": "1e3 * sum(\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n)\n",
"record": ":node_memory_swap_io_bytes:sum_rate"
},
{
"expr": "1 -\nsum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nsum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_utilisation:"
},
{
"expr": "1 - (node:node_memory_bytes_available:sum / node:node_memory_bytes_total:sum)\n",
"record": "node:node_memory_utilisation_2:"
},
{
"expr": "1e3 * sum by (node) (\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_memory_swap_io_bytes:sum_rate"
},
{
"expr": "avg(irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]))\n",
"record": ":node_disk_utilisation:avg_irate"
},
{
"expr": "avg by (node) (\n irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_disk_utilisation:avg_irate"
},
{
"expr": "avg(irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m]))\n",
"record": ":node_disk_saturation:avg_irate"
},
{
"expr": "avg by (node) (\n irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_disk_saturation:avg_irate"
},
{
"expr": "max by (instance, namespace, pod, device) ((node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"}\n- node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n/ node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
"record": "node:node_filesystem_usage:"
},
{
"expr": "max by (instance, namespace, pod, device) (node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"} / node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
"record": "node:node_filesystem_avail:"
},
{
"expr": "sum(irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
"record": ":node_net_utilisation:sum_irate"
},
{
"expr": "sum by (node) (\n (irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_net_utilisation:sum_irate"
},
{
"expr": "sum(irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
"record": ":node_net_saturation:sum_irate"
},
{
"expr": "sum by (node) (\n (irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
"record": "node:node_net_saturation:sum_irate"
},
{
"expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
"record": "node:node_inodes_total:"
},
{
"expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files_free{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
"record": "node:node_inodes_free:"
"expr": "sum(rate(node_cpu_seconds_total{job=\"node-exporter\",mode!=\"idle\",mode!=\"iowait\",mode!=\"steal\"}[5m])) /\ncount(sum(node_cpu_seconds_total{job=\"node-exporter\"}) by (cluster, instance, cpu))\n",
"record": "cluster:node_cpu:ratio_rate5m"
}
]
},
{
"name": "kubernetes-absent",
"name": "kubelet.rules",
"rules": [
{
"alert": "KubeAPIDown",
"annotations": {
"message": "KubeAPI has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown"
},
"expr": "absent(up{job=\"apiserver\"} == 1)\n",
"for": "15m",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (cluster, instance, le) * on(cluster, instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"severity": "critical"
}
"quantile": "0.99"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
},
{
"alert": "KubeControllerManagerDown",
"annotations": {
"message": "KubeControllerManager has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown"
},
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m",
"expr": "histogram_quantile(0.9, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (cluster, instance, le) * on(cluster, instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"severity": "critical"
}
"quantile": "0.9"
},
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
},
{
"alert": "KubeSchedulerDown",
"annotations": {
"message": "KubeScheduler has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown"
},
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
"for": "15m",
"expr": "histogram_quantile(0.5, sum(rate(kubelet_pleg_relist_duration_seconds_bucket[5m])) by (cluster, instance, le) * on(cluster, instance) group_left(node) kubelet_node_name{job=\"kubelet\"})\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeletDown",
"annotations": {
"message": "Kubelet has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown"
"quantile": "0.5"
},
"expr": "absent(up{job=\"kubelet\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
"record": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile"
}
]
},
@ -495,104 +578,126 @@ data:
{
"alert": "KubePodCrashLooping",
"annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping"
"description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is in waiting state (reason: \"CrashLoopBackOff\").",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping",
"summary": "Pod is crash looping."
},
"expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[15m]) * 60 * 5 > 0\n",
"for": "1h",
"expr": "max_over_time(kube_pod_container_status_waiting_reason{reason=\"CrashLoopBackOff\", job=\"kube-state-metrics\"}[5m]) >= 1\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubePodNotReady",
"annotations": {
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"
"description": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready",
"summary": "Pod has been in a non-ready state for more than 15 minutes."
},
"expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Failed|Pending|Unknown\"}) > 0\n",
"for": "1h",
"expr": "sum by (namespace, pod) (\n max by(namespace, pod) (\n kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}\n ) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (\n 1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!=\"Job\"})\n )\n) > 0\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeDeploymentGenerationMismatch",
"annotations": {
"message": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch"
"description": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch",
"summary": "Deployment generation mismatch due to possible roll-back"
},
"expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeDeploymentReplicasMismatch",
"annotations": {
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than an hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"
"description": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch",
"summary": "Deployment has not matched the expected number of replicas."
},
"expr": "kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\nkube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n",
"for": "1h",
"expr": "(\n kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n >\n kube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n) and (\n changes(kube_deployment_status_replicas_updated{job=\"kube-state-metrics\"}[10m])\n ==\n 0\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeStatefulSetReplicasMismatch",
"annotations": {
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch"
"description": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch",
"summary": "StatefulSet has not matched the expected number of replicas."
},
"expr": "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n",
"expr": "(\n kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[10m])\n ==\n 0\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeStatefulSetGenerationMismatch",
"annotations": {
"message": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch"
"description": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch",
"summary": "StatefulSet generation mismatch due to possible roll-back"
},
"expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeStatefulSetUpdateNotRolledOut",
"annotations": {
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout"
"description": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout",
"summary": "StatefulSet update has not been rolled out."
},
"expr": "max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n)\n *\n(\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n)\n",
"expr": "(\n max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n )\n *\n (\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n )\n) and (\n changes(kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeDaemonSetRolloutStuck",
"annotations": {
"message": "Only {{ $value }}% of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck"
"description": "DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck",
"summary": "DaemonSet rollout is stuck."
},
"expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} * 100 < 100\n",
"expr": "(\n (\n kube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n ) or (\n kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"}\n !=\n 0\n ) or (\n kube_daemonset_status_updated_number_scheduled{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n ) or (\n kube_daemonset_status_number_available{job=\"kube-state-metrics\"}\n !=\n kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n )\n) and (\n changes(kube_daemonset_status_updated_number_scheduled{job=\"kube-state-metrics\"}[5m])\n ==\n 0\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubeContainerWaiting",
"annotations": {
"description": "pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on container {{ $labels.container }} has been in waiting state for longer than 1 hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontainerwaiting",
"summary": "Pod container waiting longer than 1 hour"
},
"expr": "sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job=\"kube-state-metrics\"}) > 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeDaemonSetNotScheduled",
"annotations": {
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled"
"description": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled",
"summary": "DaemonSet pods are not scheduled."
},
"expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m",
@@ -603,23 +708,12 @@ data:
{
"alert": "KubeDaemonSetMisScheduled",
"annotations": {
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled"
"description": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetmisscheduled",
"summary": "DaemonSet pods are misscheduled."
},
"expr": "kube_daemonset_status_number_misscheduled{job=\"kube-state-metrics\"} > 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCronJobRunning",
"annotations": {
"message": "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecronjobrunning"
},
"expr": "time() - kube_cronjob_next_schedule_time{job=\"kube-state-metrics\"} > 3600\n",
"for": "1h",
"for": "15m",
"labels": {
"severity": "warning"
}
@@ -627,11 +721,12 @@ data:
{
"alert": "KubeJobCompletion",
"annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than one hour to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion"
"description": "Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking more than 12 hours to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobcompletion",
"summary": "Job did not complete in time"
},
"expr": "kube_job_spec_completions{job=\"kube-state-metrics\"} - kube_job_status_succeeded{job=\"kube-state-metrics\"} > 0\n",
"for": "1h",
"for": "12h",
"labels": {
"severity": "warning"
}
@@ -639,11 +734,38 @@ data:
{
"alert": "KubeJobFailed",
"annotations": {
"message": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed"
"description": "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete. Removing failed job after investigation should clear this alert.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubejobfailed",
"summary": "Job failed to complete."
},
"expr": "kube_job_status_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "1h",
"expr": "kube_job_failed{job=\"kube-state-metrics\"} > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeHpaReplicasMismatch",
"annotations": {
"description": "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has not matched the desired number of replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpareplicasmismatch",
"summary": "HPA has not matched desired number of replicas."
},
"expr": "(kube_horizontalpodautoscaler_status_desired_replicas{job=\"kube-state-metrics\"}\n !=\nkube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\"})\n and\n(kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\"}\n >\nkube_horizontalpodautoscaler_spec_min_replicas{job=\"kube-state-metrics\"})\n and\n(kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\"}\n <\nkube_horizontalpodautoscaler_spec_max_replicas{job=\"kube-state-metrics\"})\n and\nchanges(kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\"}[15m]) == 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeHpaMaxedOut",
"annotations": {
"description": "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has been running at max replicas for longer than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubehpamaxedout",
"summary": "HPA is running at max replicas"
},
"expr": "kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\"}\n ==\nkube_horizontalpodautoscaler_spec_max_replicas{job=\"kube-state-metrics\"}\n",
"for": "15m",
"labels": {
"severity": "warning"
}
@@ -656,58 +778,89 @@ data:
{
"alert": "KubeCPUOvercommit",
"annotations": {
"message": "Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit"
"description": "Cluster has overcommitted CPU resource requests for Pods by {{ $value }} CPU shares and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit",
"summary": "Cluster has overcommitted CPU resource requests."
},
"expr": "sum(namespace:kube_pod_container_resource_requests_cpu_cores:sum)\n /\nsum(kube_node_status_allocatable_cpu_cores)\n >\n(count(kube_node_status_allocatable_cpu_cores)-1) / count(kube_node_status_allocatable_cpu_cores)\n",
"expr": "sum(namespace_cpu:kube_pod_container_resource_requests:sum{}) - (sum(kube_node_status_allocatable{resource=\"cpu\"}) - max(kube_node_status_allocatable{resource=\"cpu\"})) > 0\nand\n(sum(kube_node_status_allocatable{resource=\"cpu\"}) - max(kube_node_status_allocatable{resource=\"cpu\"})) > 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeMemoryOvercommit",
"annotations": {
"description": "Cluster has overcommitted memory resource requests for Pods by {{ $value | humanize }} bytes and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryovercommit",
"summary": "Cluster has overcommitted memory resource requests."
},
"expr": "sum(namespace_memory:kube_pod_container_resource_requests:sum{}) - (sum(kube_node_status_allocatable{resource=\"memory\"}) - max(kube_node_status_allocatable{resource=\"memory\"})) > 0\nand\n(sum(kube_node_status_allocatable{resource=\"memory\"}) - max(kube_node_status_allocatable{resource=\"memory\"})) > 0\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCPUQuotaOvercommit",
"annotations": {
"description": "Cluster has overcommitted CPU resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuquotaovercommit",
"summary": "Cluster has overcommitted CPU resource requests."
},
"expr": "sum(min without(resource) (kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=~\"(cpu|requests.cpu)\"}))\n /\nsum(kube_node_status_allocatable{resource=\"cpu\", job=\"kube-state-metrics\"})\n > 1.5\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeMemOvercommit",
"alert": "KubeMemoryQuotaOvercommit",
"annotations": {
"message": "Cluster has overcommitted memory resource requests for Pods and cannot tolerate node failure.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit"
"description": "Cluster has overcommitted memory resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryquotaovercommit",
"summary": "Cluster has overcommitted memory resource requests."
},
"expr": "sum(namespace:kube_pod_container_resource_requests_memory_bytes:sum)\n /\nsum(kube_node_status_allocatable_memory_bytes)\n >\n(count(kube_node_status_allocatable_memory_bytes)-1)\n /\ncount(kube_node_status_allocatable_memory_bytes)\n",
"expr": "sum(min without(resource) (kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=~\"(memory|requests.memory)\"}))\n /\nsum(kube_node_status_allocatable{resource=\"memory\", job=\"kube-state-metrics\"})\n > 1.5\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeCPUOvercommit",
"alert": "KubeQuotaAlmostFull",
"annotations": {
"message": "Cluster has overcommitted CPU resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit"
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaalmostfull",
"summary": "Namespace quota is going to be full."
},
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"cpu\"})\n /\nsum(kube_node_status_allocatable_cpu_cores)\n > 1.5\n",
"for": "5m",
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 0.9 < 1\n",
"for": "15m",
"labels": {
"severity": "warning"
"severity": "info"
}
},
{
"alert": "KubeMemOvercommit",
"alert": "KubeQuotaFullyUsed",
"annotations": {
"message": "Cluster has overcommitted memory resource requests for Namespaces.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememovercommit"
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotafullyused",
"summary": "Namespace quota is fully used."
},
"expr": "sum(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\", resource=\"memory\"})\n /\nsum(kube_node_status_allocatable_memory_bytes{job=\"node-exporter\"})\n > 1.5\n",
"for": "5m",
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n == 1\n",
"for": "15m",
"labels": {
"severity": "warning"
"severity": "info"
}
},
{
"alert": "KubeQuotaExceeded",
"annotations": {
"message": "Namespace {{ $labels.namespace }} is using {{ printf \"%0.0f\" $value }}% of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded"
"description": "Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubequotaexceeded",
"summary": "Namespace quota has exceeded the limits."
},
"expr": "100 * kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 90\n",
"expr": "kube_resourcequota{job=\"kube-state-metrics\", type=\"used\"}\n / ignoring(instance, job, type)\n(kube_resourcequota{job=\"kube-state-metrics\", type=\"hard\"} > 0)\n > 1\n",
"for": "15m",
"labels": {
"severity": "warning"
@@ -716,13 +869,14 @@ data:
{
"alert": "CPUThrottlingHigh",
"annotations": {
"message": "{{ printf \"%0.0f\" $value }}% throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh"
"description": "{{ $value | humanizePercentage }} throttling of CPU in namespace {{ $labels.namespace }} for container {{ $labels.container }} in pod {{ $labels.pod }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-cputhrottlinghigh",
"summary": "Processes experience elevated CPU throttling."
},
"expr": "100 * sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > 100 \n",
"expr": "sum(increase(container_cpu_cfs_throttled_periods_total{container!=\"\", }[5m])) by (container, pod, namespace)\n /\nsum(increase(container_cpu_cfs_periods_total{}[5m])) by (container, pod, namespace)\n > ( 80 / 100 )\n",
"for": "15m",
"labels": {
"severity": "warning"
"severity": "info"
}
}
]
@@ -731,34 +885,37 @@ data:
"name": "kubernetes-storage",
"rules": [
{
"alert": "KubePersistentVolumeUsageCritical",
"alert": "KubePersistentVolumeFillingUp",
"annotations": {
"message": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ printf \"%0.2f\" $value }}% free.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical"
"description": "The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is only {{ $value | humanizePercentage }} free.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefillingup",
"summary": "PersistentVolume is filling up."
},
"expr": "100 * kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\nkubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n < 3\n",
"expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.03\nand\nkubelet_volume_stats_used_bytes{job=\"kubelet\"} > 0\nunless on(namespace, persistentvolumeclaim)\nkube_persistentvolumeclaim_access_mode{ access_mode=\"ReadOnlyMany\"} == 1\nunless on(namespace, persistentvolumeclaim)\nkube_persistentvolumeclaim_labels{label_excluded_from_alerts=\"true\"} == 1\n",
"for": "1m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubePersistentVolumeFullInFourDays",
"alert": "KubePersistentVolumeFillingUp",
"annotations": {
"message": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ printf \"%0.2f\" $value }}% is available.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefullinfourdays"
"description": "Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} in Namespace {{ $labels.namespace }} is expected to fill up within four days. Currently {{ $value | humanizePercentage }} is available.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumefillingup",
"summary": "PersistentVolume is filling up."
},
"expr": "100 * (\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 15\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\n",
"for": "5m",
"expr": "(\n kubelet_volume_stats_available_bytes{job=\"kubelet\"}\n /\n kubelet_volume_stats_capacity_bytes{job=\"kubelet\"}\n) < 0.15\nand\nkubelet_volume_stats_used_bytes{job=\"kubelet\"} > 0\nand\npredict_linear(kubelet_volume_stats_available_bytes{job=\"kubelet\"}[6h], 4 * 24 * 3600) < 0\nunless on(namespace, persistentvolumeclaim)\nkube_persistentvolumeclaim_access_mode{ access_mode=\"ReadOnlyMany\"} == 1\nunless on(namespace, persistentvolumeclaim)\nkube_persistentvolumeclaim_labels{label_excluded_from_alerts=\"true\"} == 1\n",
"for": "1h",
"labels": {
"severity": "critical"
"severity": "warning"
}
},
{
"alert": "KubePersistentVolumeErrors",
"annotations": {
"message": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors"
"description": "The persistent volume {{ $labels.persistentvolume }} has status {{ $labels.phase }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeerrors",
"summary": "PersistentVolume is having issues with provisioning."
},
"expr": "kube_persistentvolume_status_phase{phase=~\"Failed|Pending\",job=\"kube-state-metrics\"} > 0\n",
"for": "5m",
@@ -771,37 +928,14 @@ data:
{
"name": "kubernetes-system",
"rules": [
{
"alert": "KubeNodeNotReady",
"annotations": {
"message": "{{ $labels.node }} has been unready for more than an hour.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready"
},
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeVersionMismatch",
"annotations": {
"message": "There are {{ $value }} different semantic versions of Kubernetes components running.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch"
"description": "There are {{ $value }} different semantic versions of Kubernetes components running.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch",
"summary": "Different semantic versions of Kubernetes components running."
},
"expr": "count(count by (gitVersion) (label_replace(kubernetes_build_info{job!=\"coredns\"},\"gitVersion\",\"$1\",\"gitVersion\",\"(v[0-9]*.[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientErrors",
"annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ printf \"%0.0f\" $value }}% errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors"
},
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job))\n* 100 > 1\n",
"expr": "count(count by (git_version) (label_replace(kubernetes_build_info{job!~\"kube-dns|coredns\"},\"git_version\",\"$1\",\"git_version\",\"(v[0-9]*.[0-9]*).*\"))) > 1\n",
"for": "15m",
"labels": {
"severity": "warning"
@@ -810,10 +944,187 @@ data:
{
"alert": "KubeClientErrors",
"annotations": {
"message": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ printf \"%0.0f\" $value }} errors / second.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors"
"description": "Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance }}' is experiencing {{ $value | humanizePercentage }} errors.'",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors",
"summary": "Kubernetes API server client is experiencing errors."
},
"expr": "sum(rate(ksm_scrape_error_total{job=\"kube-state-metrics\"}[5m])) by (instance, job) > 0.1\n",
"expr": "(sum(rate(rest_client_requests_total{code=~\"5..\"}[5m])) by (instance, job, namespace)\n /\nsum(rate(rest_client_requests_total[5m])) by (instance, job, namespace))\n> 0.01\n",
"for": "15m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "kube-apiserver-slos",
"rules": [
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)\nand\nsum(apiserver_request:burnrate5m) > (14.40 * 0.01000)\n",
"for": "2m",
"labels": {
"long": "1h",
"severity": "critical",
"short": "5m"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)\nand\nsum(apiserver_request:burnrate30m) > (6.00 * 0.01000)\n",
"for": "15m",
"labels": {
"long": "6h",
"severity": "critical",
"short": "30m"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate1d) > (3.00 * 0.01000)\nand\nsum(apiserver_request:burnrate2h) > (3.00 * 0.01000)\n",
"for": "1h",
"labels": {
"long": "1d",
"severity": "warning",
"short": "2h"
}
},
{
"alert": "KubeAPIErrorBudgetBurn",
"annotations": {
"description": "The API server is burning too much error budget.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorbudgetburn",
"summary": "The API server is burning too much error budget."
},
"expr": "sum(apiserver_request:burnrate3d) > (1.00 * 0.01000)\nand\nsum(apiserver_request:burnrate6h) > (1.00 * 0.01000)\n",
"for": "3h",
"labels": {
"long": "3d",
"severity": "warning",
"short": "6h"
}
}
]
},
{
"name": "kubernetes-system-apiserver",
"rules": [
{
"alert": "KubeClientCertificateExpiration",
"annotations": {
"description": "A client certificate used to authenticate to kubernetes apiserver is expiring in less than 1.0 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration",
"summary": "Client certificate is about to expire."
},
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeClientCertificateExpiration",
"annotations": {
"description": "A client certificate used to authenticate to kubernetes apiserver is expiring in less than 0.1 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration",
"summary": "Client certificate is about to expire."
},
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAggregatedAPIErrors",
"annotations": {
"description": "Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace }} has reported errors. It has appeared unavailable {{ $value | humanize }} times averaged over the past 10m.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeaggregatedapierrors",
"summary": "Kubernetes aggregated API has reported errors."
},
"expr": "sum by(name, namespace)(increase(aggregator_unavailable_apiservice_total[10m])) > 4\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAggregatedAPIDown",
"annotations": {
"description": "Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace }} has been only {{ $value | humanize }}% available over the last 10m.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeaggregatedapidown",
"summary": "Kubernetes aggregated API is down."
},
"expr": "(1 - max by(name, namespace)(avg_over_time(aggregator_unavailable_apiservice[10m]))) * 100 < 85\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPIDown",
"annotations": {
"description": "KubeAPI has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown",
"summary": "Target disappeared from Prometheus target discovery."
},
"expr": "absent(up{job=\"apiserver\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPITerminatedRequests",
"annotations": {
"description": "The kubernetes apiserver has terminated {{ $value | humanizePercentage }} of its incoming requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapiterminatedrequests",
"summary": "The kubernetes apiserver has terminated {{ $value | humanizePercentage }} of its incoming requests."
},
"expr": "sum(rate(apiserver_request_terminations_total{job=\"apiserver\"}[10m])) / ( sum(rate(apiserver_request_total{job=\"apiserver\"}[10m])) + sum(rate(apiserver_request_terminations_total{job=\"apiserver\"}[10m])) ) > 0.20\n",
"for": "5m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "kubernetes-system-kubelet",
"rules": [
{
"alert": "KubeNodeNotReady",
"annotations": {
"description": "{{ $labels.node }} has been unready for more than 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodenotready",
"summary": "Node is not ready."
},
"expr": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeNodeUnreachable",
"annotations": {
"description": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodeunreachable",
"summary": "Node is unreachable."
},
"expr": "(kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} unless ignoring(key,value) kube_node_spec_taint{job=\"kube-state-metrics\",key=~\"ToBeDeletedByClusterAutoscaler|cloud.google.com/impending-node-termination|aws-node-termination-handler/spot-itn\"}) == 1\n",
"for": "15m",
"labels": {
"severity": "warning"
@@ -822,105 +1133,192 @@ data:
{
"alert": "KubeletTooManyPods",
"annotations": {
"message": "Kubelet {{ $labels.instance }} is running {{ $value }} Pods, close to the limit of 110.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods"
"description": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods",
"summary": "Kubelet is running at capacity."
},
"expr": "kubelet_running_pod_count{job=\"kubelet\"} > 110 * 0.9\n",
"expr": "count by(node) (\n (kube_pod_status_phase{job=\"kube-state-metrics\",phase=\"Running\"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job=\"kube-state-metrics\"})\n)\n/\nmax by(node) (\n kube_node_status_capacity{job=\"kube-state-metrics\",resource=\"pods\"} != 1\n) > 0.95\n",
"for": "15m",
"labels": {
"severity": "info"
}
},
{
"alert": "KubeNodeReadinessFlapping",
"annotations": {
"description": "The readiness status of node {{ $labels.node }} has changed {{ $value }} times in the last 15 minutes.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping",
"summary": "Node readiness status is flapping."
},
"expr": "sum(changes(kube_node_status_condition{status=\"true\",condition=\"Ready\"}[15m])) by (node) > 2\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPILatencyHigh",
"alert": "KubeletPlegDurationHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
"description": "The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletplegdurationhigh",
"summary": "Kubelet Pod Lifecycle Event Generator is taking too long to relist."
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$\"} > 1\n",
"for": "10m",
"expr": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile=\"0.99\"} >= 10\n",
"for": "5m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPILatencyHigh",
"alert": "KubeletPodStartUpLatencyHigh",
"annotations": {
"message": "The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh"
"description": "Kubelet Pod startup 99th percentile latency is {{ $value }} seconds on node {{ $labels.node }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh",
"summary": "Kubelet Pod startup latency is too high."
},
"expr": "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job=\"apiserver\",quantile=\"0.99\",subresource!=\"log\",verb!~\"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$\"} > 4\n",
"for": "10m",
"expr": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\"}[5m])) by (instance, le)) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\"} > 60\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletClientCertificateExpiration",
"annotations": {
"description": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificateexpiration",
"summary": "Kubelet client certificate is about to expire."
},
"expr": "kubelet_certificate_manager_client_ttl_seconds < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletClientCertificateExpiration",
"annotations": {
"description": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificateexpiration",
"summary": "Kubelet client certificate is about to expire."
},
"expr": "kubelet_certificate_manager_client_ttl_seconds < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"alert": "KubeletServerCertificateExpiration",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
"description": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificateexpiration",
"summary": "Kubelet server certificate is about to expire."
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) * 100 > 3\n",
"for": "10m",
"expr": "kubelet_certificate_manager_server_ttl_seconds < 3600\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletServerCertificateExpiration",
"annotations": {
"description": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificateexpiration",
"summary": "Kubelet server certificate is about to expire."
},
"expr": "kubelet_certificate_manager_server_ttl_seconds < 300\n",
"labels": {
"severity": "critical"
}
},
{
"alert": "KubeAPIErrorsHigh",
"alert": "KubeletClientCertificateRenewalErrors",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
"description": "Kubelet on node {{ $labels.node }} has failed to renew its client certificate ({{ $value | humanize }} errors in the last 5 minutes).",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletclientcertificaterenewalerrors",
"summary": "Kubelet has failed to renew its client certificate."
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m]))\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) * 100 > 1\n",
"for": "10m",
"expr": "increase(kubelet_certificate_manager_client_expiration_renew_errors[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeAPIErrorsHigh",
"alert": "KubeletServerCertificateRenewalErrors",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
"description": "Kubelet on node {{ $labels.node }} has failed to renew its server certificate ({{ $value | humanize }} errors in the last 5 minutes).",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletservercertificaterenewalerrors",
"summary": "Kubelet has failed to renew its server certificate."
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) * 100 > 10\n",
"for": "10m",
"expr": "increase(kubelet_server_expiration_renew_errors[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "KubeletDown",
"annotations": {
"description": "Kubelet has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown",
"summary": "Target disappeared from Prometheus target discovery."
},
"expr": "absent(up{job=\"kubelet\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
}
]
},
{
"name": "kubernetes-system-scheduler",
"rules": [
{
"alert": "KubeAPIErrorsHigh",
"alert": "KubeSchedulerDown",
"annotations": {
"message": "API server is returning errors for {{ $value }}% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
"description": "KubeScheduler has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown",
"summary": "Target disappeared from Prometheus target discovery."
},
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) by (resource,subresource,verb)\n /\nsum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (resource,subresource,verb) * 100 > 5\n",
"for": "10m",
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "warning"
"severity": "critical"
}
},
}
]
},
{
"name": "kubernetes-system-controller-manager",
"rules": [
{
"alert": "KubeClientCertificateExpiration",
"alert": "KubeControllerManagerDown",
"annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 7.0 days.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
"description": "KubeControllerManager has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown",
"summary": "Target disappeared from Prometheus target discovery."
},
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n",
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "warning"
"severity": "critical"
}
},
}
]
},
{
"name": "kubernetes-system-kube-proxy",
"rules": [
{
"alert": "KubeClientCertificateExpiration",
"alert": "KubeProxyDown",
"annotations": {
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
"description": "KubeProxy has disappeared from Prometheus target discovery.",
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeproxydown",
"summary": "Target disappeared from Prometheus target discovery."
},
"expr": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n",
"expr": "absent(up{job=\"kube-proxy\"} == 1)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
@@ -929,136 +1327,273 @@ data:
}
]
}
kubeprom.yaml: |-
node-exporter.yaml: |-
{
"groups": [
{
"name": "kube-prometheus-node-recording.rules",
"name": "node-exporter.rules",
"rules": [
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[3m])) BY (instance)",
"record": "instance:node_cpu:rate:sum"
"expr": "count without (cpu, mode) (\n node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}\n)\n",
"record": "instance:node_num_cpu:sum"
},
{
"expr": "sum((node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"})) BY (instance)",
"record": "instance:node_filesystem_usage:sum"
"expr": "1 - avg without (cpu) (\n sum without (mode) (rate(node_cpu_seconds_total{job=\"node-exporter\", mode=~\"idle|iowait|steal\"}[5m]))\n)\n",
"record": "instance:node_cpu_utilisation:rate5m"
},
{
"expr": "sum(rate(node_network_receive_bytes_total[3m])) BY (instance)",
"record": "instance:node_network_receive_bytes:rate:sum"
"expr": "(\n node_load1{job=\"node-exporter\"}\n/\n instance:node_num_cpu:sum{job=\"node-exporter\"}\n)\n",
"record": "instance:node_load1_per_cpu:ratio"
},
{
"expr": "sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)",
"record": "instance:node_network_transmit_bytes:rate:sum"
"expr": "1 - (\n (\n node_memory_MemAvailable_bytes{job=\"node-exporter\"}\n or\n (\n node_memory_Buffers_bytes{job=\"node-exporter\"}\n +\n node_memory_Cached_bytes{job=\"node-exporter\"}\n +\n node_memory_MemFree_bytes{job=\"node-exporter\"}\n +\n node_memory_Slab_bytes{job=\"node-exporter\"}\n )\n )\n/\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n)\n",
"record": "instance:node_memory_utilisation:ratio"
},
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)",
"record": "instance:node_cpu:ratio"
"expr": "rate(node_vmstat_pgmajfault{job=\"node-exporter\"}[5m])\n",
"record": "instance:node_vmstat_pgmajfault:rate5m"
},
{
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m]))",
"record": "cluster:node_cpu:sum_rate5m"
"expr": "rate(node_disk_io_time_seconds_total{job=\"node-exporter\", device!~\"dm.*\"}[5m])\n",
"record": "instance_device:node_disk_io_time_seconds:rate5m"
},
{
"expr": "cluster:node_cpu_seconds_total:rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu))",
"record": "cluster:node_cpu:ratio"
"expr": "rate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\", device!~\"dm.*\"}[5m])\n",
"record": "instance_device:node_disk_io_time_weighted_seconds:rate5m"
},
{
"expr": "sum without (device) (\n rate(node_network_receive_bytes_total{job=\"node-exporter\", device!=\"lo\"}[5m])\n)\n",
"record": "instance:node_network_receive_bytes_excluding_lo:rate5m"
},
{
"expr": "sum without (device) (\n rate(node_network_transmit_bytes_total{job=\"node-exporter\", device!=\"lo\"}[5m])\n)\n",
"record": "instance:node_network_transmit_bytes_excluding_lo:rate5m"
},
{
"expr": "sum without (device) (\n rate(node_network_receive_drop_total{job=\"node-exporter\", device!=\"lo\"}[5m])\n)\n",
"record": "instance:node_network_receive_drop_excluding_lo:rate5m"
},
{
"expr": "sum without (device) (\n rate(node_network_transmit_drop_total{job=\"node-exporter\", device!=\"lo\"}[5m])\n)\n",
"record": "instance:node_network_transmit_drop_excluding_lo:rate5m"
}
]
},
{
"name": "kube-prometheus-node-alerting.rules",
"name": "node-exporter",
"rules": [
{
"alert": "NodeDiskRunningFull",
"alert": "NodeFilesystemSpaceFillingUp",
"annotations": {
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 24 hours."
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up.",
"summary": "Filesystem is predicted to run out of space within the next 24 hours."
},
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[6h], 3600 * 24) < 0)\n",
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 40\nand\n predict_linear(node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 24*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFilesystemSpaceFillingUp",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left and is filling up fast.",
"summary": "Filesystem is predicted to run out of space within the next 4 hours."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 20\nand\n predict_linear(node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 4*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemAlmostOutOfSpace",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left.",
"summary": "Filesystem has less than 5% space left."
},
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 5\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "30m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeDiskRunningFull",
"alert": "NodeFilesystemAlmostOutOfSpace",
"annotations": {
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 2 hours."
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available space left.",
"summary": "Filesystem has less than 3% space left."
},
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[30m], 3600 * 2) < 0)\n",
"for": "10m",
"expr": "(\n node_filesystem_avail_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_size_bytes{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 3\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "30m",
"labels": {
"severity": "critical"
}
}
]
},
{
"name": "node-time",
"rules": [
},
{
"alert": "ClockSkewDetected",
"alert": "NodeFilesystemFilesFillingUp",
"annotations": {
"message": "Clock skew detected on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}. Ensure NTP is configured correctly on this host."
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left and is filling up.",
"summary": "Filesystem is predicted to run out of inodes within the next 24 hours."
},
"expr": "abs(node_timex_offset_seconds{job=\"node-exporter\"}) > 0.03\n",
"for": "2m",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "node-network",
"rules": [
{
"alert": "NetworkReceiveErrors",
"annotations": {
"message": "Network interface \"{{ $labels.device }}\" showing receive errors on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}\""
},
"expr": "rate(node_network_receive_errs_total{job=\"node-exporter\",device!~\"veth.+|tunl.+\"}[2m]) > 0\n",
"for": "2m",
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 40\nand\n predict_linear(node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 24*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NetworkTransmitErrors",
"alert": "NodeFilesystemFilesFillingUp",
"annotations": {
"message": "Network interface \"{{ $labels.device }}\" showing transmit errors on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}\""
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left and is filling up fast.",
"summary": "Filesystem is predicted to run out of inodes within the next 4 hours."
},
"expr": "rate(node_network_transmit_errs_total{job=\"node-exporter\",device!~\"veth.+|tunl.+\"}[2m]) > 0\n",
"for": "2m",
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 20\nand\n predict_linear(node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"}[6h], 4*60*60) < 0\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeFilesystemAlmostOutOfFiles",
"annotations": {
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left.",
"summary": "Filesystem has less than 5% inodes left."
},
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 5\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeNetworkInterfaceFlapping",
"alert": "NodeFilesystemAlmostOutOfFiles",
"annotations": {
"message": "Network interface \"{{ $labels.device }}\" changing it's up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}\""
"description": "Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf \"%.2f\" $value }}% available inodes left.",
"summary": "Filesystem has less than 3% inodes left."
},
"expr": "changes(node_network_up{job=\"node-exporter\",device!~\"veth.+|tunl.+\"}[2m]) > 2\n",
"for": "2m",
"expr": "(\n node_filesystem_files_free{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} / node_filesystem_files{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} * 100 < 3\nand\n node_filesystem_readonly{job=\"node-exporter\",fstype!~\"tmpfs|nsfs|vfat\"} == 0\n)\n",
"for": "1h",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeNetworkReceiveErrs",
"annotations": {
"description": "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} receive errors in the last two minutes.",
"summary": "Network interface is reporting many receive errors."
},
"expr": "rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01\n",
"for": "1h",
"labels": {
"severity": "warning"
}
}
]
},
{
"name": "general.rules",
"rules": [
},
{
"alert": "TargetDown",
"alert": "NodeNetworkTransmitErrs",
"annotations": {
"message": "{{ $value }}% of the {{ $labels.job }} targets are down."
"description": "{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} transmit errors in the last two minutes.",
"summary": "Network interface is reporting many transmit errors."
},
"expr": "100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10",
"expr": "rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01\n",
"for": "1h",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeHighNumberConntrackEntriesUsed",
"annotations": {
"description": "{{ $value | humanizePercentage }} of conntrack entries are used.",
"summary": "Number of conntrack are getting close to the limit."
},
"expr": "(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeTextFileCollectorScrapeError",
"annotations": {
"description": "Node Exporter text file collector failed to scrape.",
"summary": "Node Exporter text file collector failed to scrape."
},
"expr": "node_textfile_scrape_error{job=\"node-exporter\"} == 1\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeClockSkewDetected",
"annotations": {
"description": "Clock on {{ $labels.instance }} is out of sync by more than 300s. Ensure NTP is configured correctly on this host.",
"summary": "Clock skew detected."
},
"expr": "(\n node_timex_offset_seconds > 0.05\nand\n deriv(node_timex_offset_seconds[5m]) >= 0\n)\nor\n(\n node_timex_offset_seconds < -0.05\nand\n deriv(node_timex_offset_seconds[5m]) <= 0\n)\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeClockNotSynchronising",
"annotations": {
"description": "Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.",
"summary": "Clock not synchronising."
},
"expr": "min_over_time(node_timex_sync_status[5m]) == 0\nand\nnode_timex_maxerror_seconds >= 16\n",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeRAIDDegraded",
"annotations": {
"description": "RAID array '{{ $labels.device }}' on {{ $labels.instance }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically.",
"summary": "RAID Array is degraded"
},
"expr": "node_md_disks_required - ignoring (state) (node_md_disks{state=\"active\"}) > 0\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "NodeRAIDDiskFailure",
"annotations": {
"description": "At least one device in RAID array on {{ $labels.instance }} failed. Array '{{ $labels.device }}' needs attention and possibly a disk swap.",
"summary": "Failed device in RAID array"
},
"expr": "node_md_disks{state=\"failed\"} > 0\n",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFileDescriptorLimit",
"annotations": {
"description": "File descriptors limit at {{ $labels.instance }} is currently at {{ printf \"%.2f\" $value }}%.",
"summary": "Kernel is predicted to exhaust file descriptors limit soon."
},
"expr": "(\n node_filefd_allocated{job=\"node-exporter\"} * 100 / node_filefd_maximum{job=\"node-exporter\"} > 70\n)\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "NodeFileDescriptorLimit",
"annotations": {
"description": "File descriptors limit at {{ $labels.instance }} is currently at {{ printf \"%.2f\" $value }}%.",
"summary": "Kernel is predicted to exhaust file descriptors limit soon."
},
"expr": "(\n node_filefd_allocated{job=\"node-exporter\"} * 100 / node_filefd_maximum{job=\"node-exporter\"} > 90\n)\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
}
@@ -1106,18 +1641,6 @@ data:
"severity": "warning"
}
},
{
"alert": "PrometheusErrorSendingAlertsToAnyAlertmanager",
"annotations": {
"description": "{{ printf \"%.1f\" $value }}% minimum errors while sending alerts from Prometheus {{$labels.instance}} to any Alertmanager.",
"summary": "Prometheus encounters more than 3% errors sending alerts to any Alertmanager."
},
"expr": "min without(alertmanager) (\n rate(prometheus_notifications_errors_total{job=\"prometheus\"}[5m])\n/\n rate(prometheus_notifications_sent_total{job=\"prometheus\"}[5m])\n)\n* 100\n> 3\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "PrometheusNotConnectedToAlertmanagers",
"annotations": {
@@ -1154,25 +1677,13 @@ data:
"severity": "warning"
}
},
{
"alert": "PrometheusTSDBWALCorruptions",
"annotations": {
"description": "Prometheus {{$labels.instance}} has detected {{$value | humanize}} corruptions of the write-ahead log (WAL) over the last 3h.",
"summary": "Prometheus is detecting WAL corruptions."
},
"expr": "increase(tsdb_wal_corruptions_total{job=\"prometheus\"}[3h]) > 0\n",
"for": "4h",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusNotIngestingSamples",
"annotations": {
"description": "Prometheus {{$labels.instance}} is not ingesting samples.",
"summary": "Prometheus is not ingesting samples."
},
"expr": "rate(prometheus_tsdb_head_samples_appended_total{job=\"prometheus\"}[5m]) <= 0\n",
"expr": "(\n rate(prometheus_tsdb_head_samples_appended_total{job=\"prometheus\"}[5m]) <= 0\nand\n (\n sum without(scrape_job) (prometheus_target_metadata_cache_entries{job=\"prometheus\"}) > 0\n or\n sum without(rule_group) (prometheus_rule_group_rules{job=\"prometheus\"}) > 0\n )\n)\n",
"for": "10m",
"labels": {
"severity": "warning"
@@ -1181,7 +1692,7 @@ data:
{
"alert": "PrometheusDuplicateTimestamps",
"annotations": {
"description": "Prometheus {{$labels.instance}} is dropping {{$value | humanize}} samples/s with different values but duplicated timestamp.",
"description": "Prometheus {{$labels.instance}} is dropping {{ printf \"%.4g\" $value }} samples/s with different values but duplicated timestamp.",
"summary": "Prometheus is dropping samples with duplicate timestamps."
},
"expr": "rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{job=\"prometheus\"}[5m]) > 0\n",
@@ -1193,7 +1704,7 @@ data:
{
"alert": "PrometheusOutOfOrderTimestamps",
"annotations": {
"description": "Prometheus {{$labels.instance}} is dropping {{$value | humanize}} samples/s with timestamps arriving out of order.",
"description": "Prometheus {{$labels.instance}} is dropping {{ printf \"%.4g\" $value }} samples/s with timestamps arriving out of order.",
"summary": "Prometheus drops samples with out-of-order timestamps."
},
"expr": "rate(prometheus_target_scrapes_sample_out_of_order_total{job=\"prometheus\"}[5m]) > 0\n",
@@ -1205,10 +1716,10 @@ data:
{
"alert": "PrometheusRemoteStorageFailures",
"annotations": {
"description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to queue {{$labels.queue}}.",
"description": "Prometheus {{$labels.instance}} failed to send {{ printf \"%.1f\" $value }}% of the samples to {{ $labels.remote_name}}:{{ $labels.url }}",
"summary": "Prometheus fails to send samples to remote storage."
},
"expr": "(\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n/\n (\n rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m])\n +\n rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m])\n )\n)\n* 100\n> 1\n",
"expr": "(\n (rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m]) or rate(prometheus_remote_storage_samples_failed_total{job=\"prometheus\"}[5m]))\n/\n (\n (rate(prometheus_remote_storage_failed_samples_total{job=\"prometheus\"}[5m]) or rate(prometheus_remote_storage_samples_failed_total{job=\"prometheus\"}[5m]))\n +\n (rate(prometheus_remote_storage_succeeded_samples_total{job=\"prometheus\"}[5m]) or rate(prometheus_remote_storage_samples_total{job=\"prometheus\"}[5m]))\n )\n)\n* 100\n> 1\n",
"for": "15m",
"labels": {
"severity": "critical"
@@ -1217,15 +1728,27 @@ data:
{
"alert": "PrometheusRemoteWriteBehind",
"annotations": {
"description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for queue {{$labels.queue}}.",
"description": "Prometheus {{$labels.instance}} remote write is {{ printf \"%.1f\" $value }}s behind for {{ $labels.remote_name}}:{{ $labels.url }}.",
"summary": "Prometheus remote write is behind."
},
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- on(job, instance) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n",
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds{job=\"prometheus\"}[5m])\n- ignoring(remote_name, url) group_right\n max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds{job=\"prometheus\"}[5m])\n)\n> 120\n",
"for": "15m",
"labels": {
"severity": "critical"
}
},
{
"alert": "PrometheusRemoteWriteDesiredShards",
"annotations": {
"description": "Prometheus {{$labels.instance}} remote write desired shards calculation wants to run {{ $value }} shards for queue {{ $labels.remote_name}}:{{ $labels.url }}, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance=\"%s\",job=\"prometheus\"}` $labels.instance | query | first | value }}.",
"summary": "Prometheus remote write desired shards calculation wants to run more than configured max shards."
},
"expr": "# Without max_over_time, failed scrapes could create false negatives, see\n# https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.\n(\n max_over_time(prometheus_remote_storage_shards_desired{job=\"prometheus\"}[5m])\n>\n max_over_time(prometheus_remote_storage_shards_max{job=\"prometheus\"}[5m])\n)\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusRuleFailures",
"annotations": {
@@ -1249,8 +1772,108 @@ data:
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusTargetLimitHit",
"annotations": {
"description": "Prometheus {{$labels.instance}} has dropped {{ printf \"%.0f\" $value }} targets because the number of targets exceeded the configured target_limit.",
"summary": "Prometheus has dropped targets because some scrape configs have exceeded the targets limit."
},
"expr": "increase(prometheus_target_scrape_pool_exceeded_target_limit_total{job=\"prometheus\"}[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusLabelLimitHit",
"annotations": {
"description": "Prometheus {{$labels.instance}} has dropped {{ printf \"%.0f\" $value }} targets because some samples exceeded the configured label_limit, label_name_length_limit or label_value_length_limit.",
"summary": "Prometheus has dropped targets because some scrape configs have exceeded the labels limit."
},
"expr": "increase(prometheus_target_scrape_pool_exceeded_label_limits_total{job=\"prometheus\"}[5m]) > 0\n",
"for": "15m",
"labels": {
"severity": "warning"
}
},
{
"alert": "PrometheusTargetSyncFailure",
"annotations": {
"description": "{{ printf \"%.0f\" $value }} targets in Prometheus {{$labels.instance}} have failed to sync because invalid configuration was supplied.",
"summary": "Prometheus has failed to sync targets."
},
"expr": "increase(prometheus_target_sync_failed_total{job=\"prometheus\"}[30m]) > 0\n",
"for": "5m",
"labels": {
"severity": "critical"
}
},
{
"alert": "PrometheusErrorSendingAlertsToAnyAlertmanager",
"annotations": {
"description": "{{ printf \"%.1f\" $value }}% minimum errors while sending alerts from Prometheus {{$labels.instance}} to any Alertmanager.",
"summary": "Prometheus encounters more than 3% errors sending alerts to any Alertmanager."
},
"expr": "min without (alertmanager) (\n rate(prometheus_notifications_errors_total{job=\"prometheus\",alertmanager!~``}[5m])\n/\n rate(prometheus_notifications_sent_total{job=\"prometheus\",alertmanager!~``}[5m])\n)\n* 100\n> 3\n",
"for": "15m",
"labels": {
"severity": "critical"
}
}
]
}
]
}
typhoon.yaml: |-
{
"groups": [
{
"name": "general.rules",
"rules": [
{
"alert": "TargetDown",
"annotations": {
"message": "{{ printf \"%.4g\" $value }}% of the {{ $labels.job }} targets are down."
},
"expr": "100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10",
"for": "10m",
"labels": {
"severity": "warning"
}
},
{
"alert": "BlackboxProbeFailure",
"annotations": {
"message": "Blackbox probe {{$labels.instance}} failed"
},
"expr": "probe_success == 0",
"for": "2m",
"labels": {
"severity": "critical"
}
}
]
},
{
"name": "extra.rules",
"rules": [
{
"alert": "InactiveRAIDDisk",
"annotations": {
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
},
"expr": "node_md_disks{state=\"failed\"} > 0",
"for": "10m",
"labels": {
"severity": "warning"
}
}
]
}
]
}
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitoring

@@ -5,6 +5,7 @@ metadata:
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
spec:
type: ClusterIP
selector:

@@ -1,50 +0,0 @@
locals {
# Pick a CoreOS Container Linux derivative
# coreos-stable -> Container Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = local.flavor == "flatcar" ? data.aws_ami.flatcar.image_id : data.aws_ami.coreos.image_id
flavor = element(split("-", var.os_image), 0)
channel = element(split("-", var.os_image), 1)
}
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${local.flavor == "coreos" ? local.channel : "stable"}-*"]
}
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.flavor == "flatcar" ? local.channel : "stable"}-*"]
}
}

@@ -1,163 +0,0 @@
---
systemd:
units:
- name: etcd-member.service
enable: true
dropins:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.3.13"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381"
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt"
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt"
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key"
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enable: true
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/master \
--node-labels=node-role.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootkube.service
contents: |
[Unit]
Description=Bootstrap a Kubernetes cluster
ConditionPathExists=!/opt/bootkube/init_bootkube.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootkube
ExecStart=/opt/bootkube/bootkube-start
ExecStartPost=/bin/touch /opt/bootkube/init_bootkube.done
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.15.2
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /opt/bootkube/bootkube-start
filesystem: root
mode: 0544
user:
id: 500
group:
id: 500
contents:
inline: |
#!/bin/bash
# Wrapper for bootkube start
set -e
# Move experimental manifests
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
exec /usr/bin/rkt run \
--trust-keys-from-https \
--volume assets,kind=host,source=/opt/bootkube/assets \
--mount volume=assets,target=/assets \
--volume bootstrap,kind=host,source=/etc/kubernetes \
--mount volume=bootstrap,target=/etc/kubernetes \
$${RKT_OPTS} \
quay.io/coreos/bootkube:v0.14.0 \
--net=host \
--dns=host \
--exec=/bootkube -- start --asset-dir=/assets "$@"
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

@@ -1,92 +0,0 @@
# Secure copy etcd TLS assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
connection {
type = "ssh"
host = element(aws_instance.controllers.*.public_ip, count.index)
user = "core"
timeout = "15m"
}
provisioner "file" {
content = module.bootkube.etcd_ca_cert
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = module.bootkube.etcd_client_cert
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = module.bootkube.etcd_client_key
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = module.bootkube.etcd_server_cert
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = module.bootkube.etcd_server_key
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = module.bootkube.etcd_peer_cert
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = module.bootkube.etcd_peer_key
destination = "$HOME/etcd-peer.key"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
]
}
}
# Secure copy bootkube assets to ONE controller and start bootkube to perform
# one-time self-hosted cluster bootstrapping.
resource "null_resource" "bootkube-start" {
depends_on = [
module.bootkube,
module.workers,
aws_route53_record.apiserver,
null_resource.copy-controller-secrets,
]
connection {
type = "ssh"
host = aws_instance.controllers[0].public_ip
user = "core"
timeout = "15m"
}
provisioner "file" {
source = var.asset_dir
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mv $HOME/assets /opt/bootkube",
"sudo systemctl start bootkube",
]
}
}

@@ -1,156 +0,0 @@
variable "cluster_name" {
type = string
description = "Unique cluster name (prepended to dns_zone)"
}
# AWS
variable "dns_zone" {
type = string
description = "AWS Route53 DNS Zone (e.g. aws.example.com)"
}
variable "dns_zone_id" {
type = string
description = "AWS Route53 DNS Zone ID (e.g. Z3PAABBCFAKEC0)"
}
# instances
variable "controller_count" {
type = string
default = "1"
description = "Number of controllers (i.e. masters)"
}
variable "worker_count" {
type = string
default = "1"
description = "Number of workers"
}
variable "controller_type" {
type = string
default = "t3.small"
description = "EC2 instance type for controllers"
}
variable "worker_type" {
type = string
default = "t3.small"
description = "EC2 instance type for workers"
}
variable "os_image" {
type = string
default = "coreos-stable"
description = "AMI channel for a Container Linux derivative (coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha, flatcar-edge)"
}
variable "disk_size" {
type = string
default = "40"
description = "Size of the EBS volume in GB"
}
variable "disk_type" {
type = string
default = "gp2"
description = "Type of the EBS volume (e.g. standard, gp2, io1)"
}
variable "disk_iops" {
type = string
default = "0"
description = "IOPS of the EBS volume (e.g. 100)"
}
variable "worker_price" {
type = string
default = ""
description = "Spot price in USD for autoscaling group spot instances. Leave as default empty string for autoscaling group to use on-demand instances. Note, switching in-place from spot to on-demand is not possible: https://github.com/terraform-providers/terraform-provider-aws/issues/4320"
}
variable "worker_target_groups" {
type = list(string)
description = "Additional target group ARNs to which worker instances should be added"
default = []
}
variable "controller_clc_snippets" {
type = list(string)
description = "Controller Container Linux Config snippets"
default = []
}
variable "worker_clc_snippets" {
type = list(string)
description = "Worker Container Linux Config snippets"
default = []
}
# configuration
variable "ssh_authorized_key" {
type = string
description = "SSH public key for user 'core'"
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = string
}
variable "networking" {
description = "Choice of networking provider (calico or flannel)"
type = string
default = "calico"
}
variable "network_mtu" {
description = "CNI interface MTU (applies to calico only). Use 8981 if using instances types with Jumbo frames."
type = string
default = "1480"
}
variable "host_cidr" {
description = "CIDR IPv4 range to assign to EC2 nodes"
type = string
default = "10.0.0.0/16"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = string
default = "10.2.0.0/16"
}
variable "service_cidr" {
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
type = string
default = "10.3.0.0/16"
}
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = string
default = "cluster.local"
}
variable "enable_reporting" {
type = string
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = "false"
}
variable "enable_aggregation" {
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
type = string
default = "false"
}

@@ -1,11 +0,0 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.0"
required_providers {
aws = "~> 2.7"
ct = "~> 0.3"
template = "~> 2.1"
null = "~> 2.1"
}
}

@@ -1,23 +0,0 @@
module "workers" {
source = "./workers"
name = var.cluster_name
# AWS
vpc_id = aws_vpc.network.id
subnet_ids = aws_subnet.public.*.id
security_groups = [aws_security_group.worker.id]
worker_count = var.worker_count
instance_type = var.worker_type
os_image = var.os_image
disk_size = var.disk_size
spot_price = var.worker_price
target_groups = var.worker_target_groups
# configuration
kubeconfig = module.bootkube.kubeconfig-kubelet
ssh_authorized_key = var.ssh_authorized_key
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
clc_snippets = var.worker_clc_snippets
}

@@ -1,50 +0,0 @@
locals {
# Pick a CoreOS Container Linux derivative
# coreos-stable -> Container Linux AMI
# flatcar-stable -> Flatcar Linux AMI
ami_id = local.flavor == "flatcar" ? data.aws_ami.flatcar.image_id : data.aws_ami.coreos.image_id
flavor = element(split("-", var.os_image), 0)
channel = element(split("-", var.os_image), 1)
}
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${local.flavor == "coreos" ? local.channel : "stable"}-*"]
}
}
data "aws_ami" "flatcar" {
most_recent = true
owners = ["075585003325"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["Flatcar-${local.flavor == "flatcar" ? local.channel : "stable"}-*"]
}
}

@@ -1,124 +0,0 @@
---
systemd:
units:
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enable: true
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume var-lib-calico,kind=host,source=/var/lib/calico \
--mount volume=var-lib-calico,target=/var/lib/calico \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
Environment=KUBELET_CGROUP_DRIVER=${cgroup_driver}
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=$${KUBELET_CGROUP_DRIVER} \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/node \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
- name: delete-node.service
enable: true
contents: |
[Unit]
Description=Waiting to delete Kubernetes node on shutdown
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/etc/kubernetes/delete-node
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubeconfig
filesystem: root
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
KUBELET_IMAGE_TAG=v1.15.2
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/kubernetes/delete-node
filesystem: root
mode: 0744
contents:
inline: |
#!/bin/bash
set -e
exec /usr/bin/rkt run \
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
--insecure-options=image \
docker://k8s.gcr.io/hyperkube:v1.15.2 \
--net=host \
--dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_key}"

@@ -1,4 +0,0 @@
terraform {
required_version = ">= 0.12"
}

@@ -1,90 +0,0 @@
# Workers AutoScaling Group
resource "aws_autoscaling_group" "workers" {
name = "${var.name}-worker ${aws_launch_configuration.worker.name}"
# count
desired_capacity = var.worker_count
min_size = var.worker_count
max_size = var.worker_count + 2
default_cooldown = 30
health_check_grace_period = 30
# network
vpc_zone_identifier = var.subnet_ids
# template
launch_configuration = aws_launch_configuration.worker.name
# target groups to which instances should be added
target_group_arns = flatten([
aws_lb_target_group.workers-http.id,
aws_lb_target_group.workers-https.id,
var.target_groups,
])
lifecycle {
# override the default destroy and replace update behavior
create_before_destroy = true
}
# Waiting for instance creation delays adding the ASG to state. If instances
# can't be created (e.g. spot price too low), the ASG will be orphaned.
# Orphaned ASGs escape cleanup, can't be updated, and keep bidding if spot is
# used. Disable wait to avoid issues and align with other clouds.
wait_for_capacity_timeout = "0"
tags = [
{
key = "Name"
value = "${var.name}-worker"
propagate_at_launch = true
},
]
}
# Worker template
resource "aws_launch_configuration" "worker" {
image_id = local.ami_id
instance_type = var.instance_type
spot_price = var.spot_price
enable_monitoring = false
user_data = data.ct_config.worker-ignition.rendered
# storage
root_block_device {
volume_type = var.disk_type
volume_size = var.disk_size
iops = var.disk_iops
}
# network
security_groups = var.security_groups
lifecycle {
// Override the default destroy and replace update behavior
create_before_destroy = true
ignore_changes = [image_id]
}
}
# Worker Ignition config
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
pretty_print = false
snippets = var.clc_snippets
}
# Worker Container Linux config
data "template_file" "worker-config" {
template = file("${path.module}/cl/worker.yaml.tmpl")
vars = {
kubeconfig = indent(10, var.kubeconfig)
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
cgroup_driver = local.flavor == "flatcar" && local.channel == "edge" ? "systemd" : "cgroupfs"
}
}

@@ -11,11 +11,11 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.15.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
* Kubernetes v1.31.3 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [Cilium](https://github.com/cilium/cilium) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/), SELinux enforcing
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/fedora-coreos/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customization
* Ready for Ingress, Prometheus, Grafana, CSI, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs

@@ -1,4 +1,3 @@
data "aws_ami" "fedora-coreos" {
most_recent = true
owners = ["125523088429"]
@@ -13,9 +12,30 @@ data "aws_ami" "fedora-coreos" {
values = ["hvm"]
}
// pin on known ok versions as preview matures
filter {
name = "name"
values = ["fedora-coreos-30.20190725.0-hvm"]
name = "description"
values = ["Fedora CoreOS ${var.os_stream} *"]
}
}
data "aws_ami" "fedora-coreos-arm" {
count = var.controller_arch == "arm64" ? 1 : 0
most_recent = true
owners = ["125523088429"]
filter {
name = "architecture"
values = ["arm64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "description"
values = ["Fedora CoreOS ${var.os_stream} *"]
}
}

@@ -1,17 +1,15 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=c21da0224984493e92dd2dc7bb3b755c564852fc"
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=e6a1c7bccfc45ab299b5f8149bc3840f99b30b2b"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = aws_route53_record.etcds.*.fqdn
asset_dir = var.asset_dir
networking = var.networking
network_mtu = var.network_mtu
pod_cidr = var.pod_cidr
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
enable_reporting = var.enable_reporting
enable_aggregation = var.enable_aggregation
daemonset_tolerations = var.daemonset_tolerations
components = var.components
}

@@ -0,0 +1,268 @@
---
variant: fcos
version: 1.5.0
systemd:
units:
- name: etcd-member.service
enabled: true
contents: |
[Unit]
Description=etcd (System Container)
Documentation=https://github.com/etcd-io/etcd
Wants=network-online.target
After=network-online.target
[Service]
Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.5.13
Type=exec
ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd
ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \
--log-driver k8s-file \
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
$${ETCD_IMAGE}
ExecStop=/usr/bin/podman stop etcd
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
- name: containerd.service
enabled: true
- name: docker.service
mask: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS and hostname
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/bin/sh -c 'while [ `hostname -s` == "localhost" ]; do sleep 1; done;'
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet (System Container)
Requires=afterburn.service
After=afterburn.service
Wants=rpc-statd.service
[Service]
Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.31.3
EnvironmentFile=/run/metadata/afterburn
ExecStartPre=/bin/mkdir -p /etc/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/podman rm kubelet
ExecStart=/usr/bin/podman run --name kubelet \
--log-driver k8s-file \
--privileged \
--pid host \
--network host \
--volume /etc/cni/net.d:/etc/cni/net.d:ro,z \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /etc/machine-id:/etc/machine-id:ro \
--volume /usr/lib/os-release:/etc/os-release:ro \
--volume /lib/modules:/lib/modules:ro \
--volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup \
--volume /etc/selinux:/etc/selinux \
--volume /sys/fs/selinux:/sys/fs/selinux \
--volume /var/lib/calico:/var/lib/calico:ro \
--volume /var/lib/containerd:/var/lib/containerd \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
$${KUBELET_IMAGE} \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--config=/etc/kubernetes/kubelet.yaml \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--node-labels=node.kubernetes.io/controller="true" \
--provider-id=aws:///$${AFTERBURN_AWS_AVAILABILITY_ZONE}/$${AFTERBURN_AWS_INSTANCE_ID} \
--register-with-taints=node-role.kubernetes.io/controller=:NoSchedule
ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootstrap.service
contents: |
[Unit]
Description=Kubernetes control plane
ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/podman rm bootstrap
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/pki:/etc/kubernetes/pki:ro,z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \
quay.io/poseidon/kubelet:v1.31.3
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
directories:
- path: /var/lib/etcd
mode: 0700
- path: /etc/kubernetes
- path: /opt/bootstrap
files:
- path: /etc/kubernetes/kubeconfig
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/kubernetes/kubelet.yaml
mode: 0644
contents:
inline: |
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
anonymous:
enabled: false
webhook:
enabled: true
x509:
clientCAFile: /etc/kubernetes/ca.crt
authorization:
mode: Webhook
cgroupDriver: systemd
clusterDNS:
- ${cluster_dns_service_ip}
clusterDomain: cluster.local
healthzPort: 0
rotateCertificates: true
shutdownGracePeriod: 45s
shutdownGracePeriodCriticalPods: 30s
staticPodPath: /etc/kubernetes/manifests
readOnlyPort: 0
resolvConf: /run/systemd/resolve/resolv.conf
volumePluginDir: /var/lib/kubelet/volumeplugins
- path: /opt/bootstrap/layout
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/{etcd,k8s} static-manifests manifests/{coredns,kube-proxy,network}
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/pki
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/pki/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/* /etc/kubernetes/pki/
mv tls/k8s/* /etc/kubernetes/pki/
mkdir -p /etc/kubernetes/manifests
mv static-manifests/* /etc/kubernetes/manifests/
mkdir -p /opt/bootstrap/assets
mv manifests /opt/bootstrap/assets/manifests
rm -rf assets auth static-manifests tls manifests
chcon -R -u system_u -t container_file_t /etc/kubernetes/pki
- path: /opt/bootstrap/apply
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/etc/kubernetes/pki/admin.conf
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/systemd/logind.conf.d/inhibitors.conf
contents:
inline: |
[Login]
InhibitDelayMaxSec=45s
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.*.rp_filter=0
- path: /etc/systemd/network/50-flannel.link
contents:
inline: |
[Match]
OriginalName=flannel*
[Link]
MACAddressPolicy=none
- path: /etc/systemd/system.conf.d/accounting.conf
contents:
inline: |
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
- path: /etc/etcd/etcd.env
mode: 0644
contents:
inline: |
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
- path: /etc/containerd/config.toml
overwrite: true
contents:
inline: |
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
subreaper = true
oom_score = -999
[grpc]
address = "/run/containerd/containerd.sock"
uid = 0
gid = 0
[plugins."io.containerd.grpc.v1.cri"]
enable_selinux = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
passwd:
users:
- name: core
ssh_authorized_keys:
- ${ssh_authorized_key}

@@ -20,24 +20,33 @@ resource "aws_instance" "controllers" {
tags = {
Name = "${var.cluster_name}-controller-${count.index}"
}
instance_type = var.controller_type
ami = data.aws_ami.fedora-coreos.image_id
user_data = data.ct_config.controller-ignitions.*.rendered[count.index]
ami = var.controller_arch == "arm64" ? data.aws_ami.fedora-coreos-arm[0].image_id : data.aws_ami.fedora-coreos.image_id
# storage
root_block_device {
volume_type = var.disk_type
volume_size = var.disk_size
iops = var.disk_iops
volume_type = var.controller_disk_type
volume_size = var.controller_disk_size
iops = var.controller_disk_iops
encrypted = true
tags = {
Name = "${var.cluster_name}-controller-${count.index}"
}
}
# network
associate_public_ip_address = true
subnet_id = aws_subnet.public.*.id[count.index]
subnet_id = element(aws_subnet.public.*.id, count.index)
vpc_security_group_ids = [aws_security_group.controller.id]
# boot
user_data = data.ct_config.controllers.*.rendered[count.index]
# cost
credit_specification {
cpu_credits = var.controller_cpu_credits
}
lifecycle {
ignore_changes = [
ami,
@@ -46,41 +55,21 @@ resource "aws_instance" "controllers" {
}
}
# Controller Ignition configs
data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
snippets = var.controller_snippets
}
# Controller Fedora CoreOS configs
data "template_file" "controller-configs" {
# Fedora CoreOS controllers
data "ct_config" "controllers" {
count = var.controller_count
template = file("${path.module}/fcc/controller.yaml")
vars = {
content = templatefile("${path.module}/butane/controller.yaml", {
# Cannot use cyclic dependencies on controllers or their DNS records
etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
kubeconfig = indent(10, module.bootkube.kubeconfig-kubelet)
etcd_initial_cluster = join(",", [
for i in range(var.controller_count) : "etcd${i}=https://${var.cluster_name}-etcd${i}.${var.dns_zone}:2380"
])
kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet)
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
}
})
strict = true
snippets = var.controller_snippets
}
data "template_file" "etcds" {
count = var.controller_count
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars = {
index = count.index
cluster_name = var.cluster_name
dns_zone = var.dns_zone
}
}
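
The change above replaces the per-controller `template_file` "etcds" data source with a single `for` expression that builds `etcd_initial_cluster`. As a rough illustration of the string that expression produces, here is a minimal Python sketch. The function name and the sample values (`cluster`, `example.com`, three controllers) are assumptions made for the example and are not part of the module.

```python
def etcd_initial_cluster(cluster_name: str, dns_zone: str, controller_count: int) -> str:
    """Mirror the Terraform for-expression: one name=peer-URL entry per controller."""
    members = [
        # Same shape as "etcd${i}=https://${var.cluster_name}-etcd${i}.${var.dns_zone}:2380"
        f"etcd{i}=https://{cluster_name}-etcd{i}.{dns_zone}:2380"
        for i in range(controller_count)
    ]
    # Terraform's join(",", [...]) corresponds to str.join here.
    return ",".join(members)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the resulting format.
    print(etcd_initial_cluster("cluster", "example.com", 3))
```

Because the entries depend only on the controller count, cluster name, and DNS zone, the value can be computed without referencing the controller instances or their DNS records, which is consistent with the "cannot use cyclic dependencies" comment in the diff.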

@@ -1,179 +0,0 @@
---
variant: fcos
version: 1.0.0
systemd:
units:
- name: etcd-member.service
enabled: true
contents: |
[Unit]
Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd
Wants=network-online.target network.target
After=network-online.target
[Service]
# https://github.com/opencontainers/runc/pull/1807
# Type=notify
# NotifyAccess=exec
Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.3.13
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
- name: docker.service
enabled: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet via Hyperkube (System Container)
Wants=rpc-statd.service
[Service]
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/podman rm kubelet
ExecStart=/usr/bin/podman run --name kubelet \
--privileged \
--pid host \
--network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \
--volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run:/var/run \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.15.2 /hyperkube kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=systemd \
--cgroups-per-qos=false \
--enforce-node-allocatable="" \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/master \
--node-labels=node-role.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootkube.service
contents: |
[Unit]
Description=Bootstrap a Kubernetes control plane
ConditionPathExists=!/opt/bootkube/init_bootkube.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootkube
ExecStart=/usr/bin/bash -c 'set -x && \
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-* && exec podman run --name bootkube --privileged \
--network host \
--volume /opt/bootkube/assets:/assets \
--volume /etc/kubernetes:/etc/kubernetes \
quay.io/coreos/bootkube:v0.14.0 \
/bootkube start --asset-dir=/assets'
ExecStartPost=/bin/touch /opt/bootkube/init_bootkube.done
storage:
directories:
- path: /etc/kubernetes
- path: /opt/bootkube
files:
- path: /etc/kubernetes/kubeconfig
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/systemd/system.conf.d/accounting.conf
contents:
inline: |
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
- path: /etc/etcd/etcd.env
mode: 0644
contents:
inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
passwd:
users:
- name: core
ssh_authorized_keys:
- ${ssh_authorized_key}

Some files were not shown because too many files have changed in this diff