mirror of
https://github.com/puppetmaster/typhoon.git
synced 2025-08-03 00:51:35 +02:00
Compare commits
133 Commits
Author | SHA1 | Date | |
---|---|---|---|
da2be86e8c | |||
65a2751f77 | |||
a04ef3919a | |||
851bc1a3f8 | |||
758c09fa5c | |||
b1cdd361ef | |||
7f7bc960a6 | |||
29108fd99d | |||
18d08de898 | |||
f3730b2bfa | |||
88aa9a46e5 | |||
efa90d8b44 | |||
46226a8015 | |||
270d1ce357 | |||
ab87b6cea3 | |||
d621512dd6 | |||
c59a9c66b1 | |||
21f2cef12f | |||
931e311786 | |||
2592a0aad4 | |||
6c5e287c29 | |||
2a4595eeee | |||
8e7e6b9f7f | |||
35f3b1b28c | |||
9fb1e1a0e2 | |||
b61d6373c5 | |||
42708f9a70 | |||
d54709f89c | |||
0e688ef05a | |||
9dcc255f8e | |||
9307e97c46 | |||
c112ee3829 | |||
45b556c08f | |||
da6aafe816 | |||
cce4537487 | |||
73126eb7f8 | |||
160ae34e71 | |||
06d40c5b44 | |||
98985e5acd | |||
ea6bf9c9fb | |||
486fdb6968 | |||
a44cf0edbd | |||
983c7aa012 | |||
3d9683b6e8 | |||
0da7757ef4 | |||
04c6613ff3 | |||
92600efd11 | |||
66c64b4e45 | |||
13f3745093 | |||
c4914c326b | |||
461fd46986 | |||
ceb5555222 | |||
86420fd507 | |||
5c383f4184 | |||
22fa051002 | |||
c8313751d7 | |||
195d902ab6 | |||
c19a68b59b | |||
de88fa5457 | |||
d9a0183f3f | |||
7e24c67608 | |||
a37aff7f35 | |||
03d23bfde7 | |||
2c10d24113 | |||
82a616c70b | |||
2fa7dac247 | |||
a41691b222 | |||
9034203d7a | |||
d42f6d6b5d | |||
2fa1840c30 | |||
8e0b8d7e40 | |||
a0cf527ccf | |||
65321acad2 | |||
064ce83f25 | |||
c3b0cdddf3 | |||
211ec94c75 | |||
8aca5a089e | |||
3e6e4ea339 | |||
103f1e16d7 | |||
50dd3e3b82 | |||
3dc755994b | |||
ddbfb2eee1 | |||
868265988b | |||
6adffcb778 | |||
bc967ddcd0 | |||
ef18f19ec4 | |||
f5efcc1ff8 | |||
996651c605 | |||
38fa7dff1a | |||
bbe295a3f1 | |||
d8db296932 | |||
388ac08492 | |||
527b5ca602 | |||
ecd6a9443b | |||
2523d64f95 | |||
fc455c8624 | |||
7a0a60708e | |||
51a5f64024 | |||
e1f2125f02 | |||
9329b775f6 | |||
e04cce1201 | |||
201a38bd90 | |||
fbdd946601 | |||
19102636a9 | |||
21e540159b | |||
43e65a4d13 | |||
e79088baa0 | |||
495e33e213 | |||
63f5a26a72 | |||
eea79e895d | |||
99c07661c6 | |||
521a1f0fee | |||
7345cb6419 | |||
a481d71d7d | |||
831a5c976c | |||
85e6783503 | |||
165396d6aa | |||
ce49a93d5d | |||
e623439eec | |||
9548572d98 | |||
f00ecde854 | |||
d85300f947 | |||
65f006e6cc | |||
8d3817e0ae | |||
5f5eec1175 | |||
5308fde3d3 | |||
9ab61d7bf5 | |||
6483f613c5 | |||
56c6bf431a | |||
63ab117205 | |||
1cd262e712 | |||
32bdda1b6c | |||
07d257aa7b |
2
.github/ISSUE_TEMPLATE.md
vendored
2
.github/ISSUE_TEMPLATE.md
vendored
@ -4,7 +4,7 @@
|
||||
|
||||
### Environment
|
||||
|
||||
* Platform: bare-metal, google-cloud, digital-ocean
|
||||
* Platform: aws, bare-metal, google-cloud, digital-ocean
|
||||
* OS: container-linux, fedora-cloud
|
||||
* Terraform: `terraform version`
|
||||
* Plugins: Provider plugin versions
|
||||
|
198
CHANGES.md
198
CHANGES.md
@ -4,6 +4,196 @@ Notable changes between versions.
|
||||
|
||||
## Latest
|
||||
|
||||
## v1.9.6
|
||||
|
||||
* Kubernetes [v1.9.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v196)
|
||||
* Update Calico from v3.0.3 to v3.0.4
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update heapster from v1.5.1 to v1.5.2
|
||||
|
||||
## v1.9.5
|
||||
|
||||
* Kubernetes [v1.9.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v195)
|
||||
* Fix `subPath` volume mounts regression ([kubernetes#61076](https://github.com/kubernetes/kubernetes/issues/61076))
|
||||
* Introduce [Container Linux Config snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) on cloud platforms ([#145](https://github.com/poseidon/typhoon/pull/145))
|
||||
* Validate and additively merge custom Container Linux Configs during `terraform plan`
|
||||
* Define files, systemd units, dropins, networkd configs, mounts, users, and more
|
||||
* Require updating `terraform-provider-ct` plugin from v0.2.0 to v0.2.1
|
||||
* Add `node-role.kubernetes.io/controller="true"` node label to controllers ([#160](https://github.com/poseidon/typhoon/pull/160))
|
||||
|
||||
#### AWS
|
||||
|
||||
* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
|
||||
|
||||
#### Google Cloud
|
||||
|
||||
* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
|
||||
* Relax `os_image` to optional. Default to "coreos-stable".
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update nginx-ingress from 0.11.0 to 0.12.0
|
||||
* Update Prometheus from 2.2.0 to 2.2.1
|
||||
|
||||
## v1.9.4
|
||||
|
||||
* Kubernetes [v1.9.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v194)
|
||||
* Secret, configMap, downward API, and projected volumes now read-only (breaking, [kubernetes#58720](https://github.com/kubernetes/kubernetes/pull/58720))
|
||||
* Regressed `subPath` volume mounts (regression, [kubernetes#61076](https://github.com/kubernetes/kubernetes/issues/61076))
|
||||
* Mitigated `subPath` [CVE-2017-1002101](https://github.com/kubernetes/kubernetes/issues/60813)
|
||||
* Introduce [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) for AWS and Google Cloud for joining heterogeneous workers to existing clusters.
|
||||
* Use new Network Load Balancers and cross zone load balancing on AWS
|
||||
* Allow flexvolume plugins to be used on any Typhoon cluster (not just bare-metal)
|
||||
* Upgrade etcd from v3.2.15 to v3.3.2
|
||||
* Update Calico from v3.0.2 to v3.0.3
|
||||
* Use kubernetes-incubator/bootkube v0.10.0
|
||||
* [Recommend](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action recommended)
|
||||
|
||||
#### AWS
|
||||
|
||||
* Promote AWS platform to stable
|
||||
* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) ([#150](https://github.com/poseidon/typhoon/pull/150))
|
||||
* Replace the apiserver elastic load balancer with a network load balancer ([#136](https://github.com/poseidon/typhoon/pull/136))
|
||||
* Replace the Ingress elastic load balancer with a network load balancer ([#141](https://github.com/poseidon/typhoon/pull/141))
|
||||
* AWS [NLBs](https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/) can handle millions of RPS with high throughput and low latency.
|
||||
* Require `terraform-provider-aws` 1.7.0 or higher
|
||||
* Enable NLB [cross-zone](https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/) load balancing ([#159](https://github.com/poseidon/typhoon/pull/159))
|
||||
* Requests are automatically evenly distributed to targets regardless of AZ
|
||||
* Require `terraform-provider-aws` 1.11.0 or higher
|
||||
* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
|
||||
* Fix controller and worker launch configs to ignore AMI changes ([#126](https://github.com/poseidon/typhoon/pull/126), [#158](https://github.com/poseidon/typhoon/pull/158))
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
|
||||
* Fix to pass `ssh_fingerprints` as a list to droplets ([#143](https://github.com/poseidon/typhoon/pull/143))
|
||||
|
||||
#### Google Cloud
|
||||
|
||||
* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) ([#148](https://github.com/poseidon/typhoon/pull/148))
|
||||
* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
|
||||
* Add `kubeconfig` variable to `controllers` and `workers` submodules ([#147](https://github.com/poseidon/typhoon/pull/147))
|
||||
* Remove `kubeconfig_*` variables from `controllers` and `workers` submodules ([#147](https://github.com/poseidon/typhoon/pull/147))
|
||||
* Allow initial experimentation with accelerators (i.e. GPUs) on workers ([#161](https://github.com/poseidon/typhoon/pull/161)) (unofficial)
|
||||
* Require `terraform-provider-google` v1.6.0
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update Prometheus from 2.1.0 to 2.2.0 ([#153](https://github.com/poseidon/typhoon/pull/153))
|
||||
* Scrape Prometheus itself to enable alerts about Prometheus itself
|
||||
* Adjust KubeletDown rule to fire when 10% of kubelets are down
|
||||
* Update heapster from v1.5.0 to v1.5.1 ([#131](https://github.com/poseidon/typhoon/pull/131))
|
||||
* Use separate service account
|
||||
* Update nginx-ingress from 0.10.2 to 0.11.0
|
||||
|
||||
## v1.9.3
|
||||
|
||||
* Kubernetes [v1.9.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v193)
|
||||
* Network improvements and fixes ([#104](https://github.com/poseidon/typhoon/pull/104))
|
||||
* Switch from Calico v2.6.6 to v3.0.2
|
||||
* Add Calico GlobalNetworkSet CRD
|
||||
* Update flannel from v0.9.0 to v0.10.0
|
||||
* Use separate service account for flannel
|
||||
* Update etcd from v3.2.14 to v3.2.15
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
* Use new Droplet [types](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) which offer more CPU/memory, at lower cost. ([#105](https://github.com/poseidon/typhoon/pull/105))
|
||||
* A small Digital Ocean cluster costs less than $25 a month!
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update Prometheus from v2.0.0 to v2.1.0 ([#113](https://github.com/poseidon/typhoon/pull/113))
|
||||
* Improve alerting rules
|
||||
* Relabel discovered kubelet, endpoint, service, and apiserver scrapes
|
||||
* Use separate service accounts
|
||||
* Update node-exporter and kube-state-metrics
|
||||
* Include Grafana dashboards for Kubernetes admins ([#113](https://github.com/poseidon/typhoon/pull/113))
|
||||
* Add grafana-watcher to load bundled upstream dashboards
|
||||
* Update nginx-ingress from 0.9.0 to 0.10.2
|
||||
* Update CLUO from v0.5.0 to v0.6.0
|
||||
* Switch manifests to use `apps/v1` Deployments and Daemonsets ([#120](https://github.com/poseidon/typhoon/pull/120))
|
||||
* Remove Kubernetes Dashboard manifests ([#121](https://github.com/poseidon/typhoon/pull/121))
|
||||
|
||||
## v1.9.2
|
||||
|
||||
* Kubernetes [v1.9.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192)
|
||||
* Add Terraform v0.11.x support
|
||||
* Add explicit "providers" section to modules for Terraform v0.11.x
|
||||
* Retain support for Terraform v0.10.4+
|
||||
* Add [migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-v011x) from Terraform v0.10.x to v0.11.x (**action required!**)
|
||||
* Update etcd from 3.2.13 to 3.2.14
|
||||
* Update calico from 2.6.5 to 2.6.6
|
||||
* Update kube-dns from v1.14.7 to v1.14.8
|
||||
* Use separate service account for kube-dns
|
||||
* Use kubernetes-incubator/bootkube v0.10.0
|
||||
|
||||
#### Bare-Metal
|
||||
|
||||
* Use per-node Container Linux install profiles ([#97](https://github.com/poseidon/typhoon/pull/97))
|
||||
* Allow Container Linux channel/version to be chosen per-cluster
|
||||
* Fix issue where cluster deletion could require `terraform apply` multiple times
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
* Relax `digitalocean` provider version constraint
|
||||
* Fix bug with `terraform plan` always showing a firewall diff to be applied ([#3](https://github.com/poseidon/typhoon/issues/3))
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (**important**)
|
||||
* Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes ([cluo#163](https://github.com/coreos/container-linux-update-operator/issues/163))
|
||||
* Update kube-state-metrics from v1.1.0 to v1.2.0
|
||||
* Fix RBAC cluster role for kube-state-metrics
|
||||
|
||||
## v1.9.1
|
||||
|
||||
* Kubernetes [v1.9.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v191)
|
||||
* Update kube-dns from 1.14.5 to v1.14.7
|
||||
* Update etcd from 3.2.0 to 3.2.13
|
||||
* Update Calico from v2.6.4 to v2.6.5
|
||||
* Enable portmap to fix hostPort with Calico
|
||||
* Use separate service account for controller-manager
|
||||
|
||||
## v1.8.6
|
||||
|
||||
* Kubernetes [v1.8.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v186)
|
||||
* Update Calico from v2.6.3 to v2.6.4
|
||||
|
||||
## v1.8.5
|
||||
|
||||
* Kubernetes [v1.8.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v185)
|
||||
* Recommend Container Linux [images](https://coreos.com/releases/) with Docker 17.09
|
||||
* Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
|
||||
of 1.12)
|
||||
* Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
|
||||
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
|
||||
* Add optional `cluster_domain_suffix` variable (#74)
|
||||
* Use kubernetes-incubator/bootkube v0.9.1
|
||||
|
||||
#### Bare-Metal
|
||||
|
||||
* Add kubelet `--volume-plugin-dir` flag to allow flexvolume providers ([#61](https://github.com/poseidon/typhoon/pull/61))
|
||||
|
||||
#### Addons
|
||||
|
||||
* Discourage deploying the Kubernetes Dashboard (security)
|
||||
|
||||
## v1.8.4
|
||||
|
||||
* Kubernetes v1.8.4
|
||||
* Calico related bug fixes
|
||||
* Update Calico from v2.6.1 to v2.6.3
|
||||
* Update flannel from v0.9.0 to v0.9.1
|
||||
* Service accounts for kube-proxy and pod-checkpointer
|
||||
* Use kubernetes-incubator/bootkube v0.9.0
|
||||
|
||||
## v1.8.3
|
||||
|
||||
* Kubernetes v1.8.3
|
||||
@ -86,7 +276,7 @@ Notable changes between versions.
|
||||
## v1.7.3
|
||||
|
||||
* Kubernetes v1.7.3
|
||||
* Use kubernete-incubator/bootkube v0.6.1
|
||||
* Use kubernetes-incubator/bootkube v0.6.1
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
@ -96,7 +286,7 @@ Notable changes between versions.
|
||||
## v1.7.1
|
||||
|
||||
* Kubernetes v1.7.1
|
||||
* Use kubernete-incubator/bootkube v0.6.0
|
||||
* Use kubernetes-incubator/bootkube v0.6.0
|
||||
* Add Bare-Metal Terraform module (stable)
|
||||
* Add Digital Ocean Terraform module (beta)
|
||||
|
||||
@ -109,12 +299,12 @@ Notable changes between versions.
|
||||
## v1.6.7
|
||||
|
||||
* Kubernetes v1.6.7
|
||||
* Use kubernete-incubator/bootkube v0.5.1
|
||||
* Use kubernetes-incubator/bootkube v0.5.1
|
||||
|
||||
## v1.6.6
|
||||
|
||||
* Kubernetes v1.6.6
|
||||
* Use kubernete-incubator/bootkube v0.4.5
|
||||
* Use kubernetes-incubator/bootkube v0.4.5
|
||||
* Disable locksmithd on hosts, in favor of [CLUO](https://github.com/coreos/container-linux-update-operator).
|
||||
|
||||
## v1.6.4
|
||||
|
@ -2,4 +2,4 @@
|
||||
|
||||
## Developer Certificate of Origin
|
||||
|
||||
By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DOC](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
|
||||
By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DCO](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
|
||||
|
39
README.md
39
README.md
@ -1,4 +1,4 @@
|
||||
# Typhoon []() <img align="right" src="https://storage.googleapis.com/dghubble/spin.png">
|
||||
# Typhoon []() <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,12 +9,13 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [preemption](https://typhoon.psdn.io/google-cloud/#preemption) (varies by platform)
|
||||
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
||||
## Modules
|
||||
|
||||
@ -22,7 +23,7 @@ Typhoon provides a Terraform Module for each supported operating system and plat
|
||||
|
||||
| Platform | Operating System | Terraform Module | Status |
|
||||
|---------------|------------------|------------------|--------|
|
||||
| AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | beta |
|
||||
| AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
|
||||
| Bare-Metal | Container Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
|
||||
| Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
|
||||
| Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | beta |
|
||||
@ -43,13 +44,21 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
|
||||
|
||||
```tf
|
||||
module "google-cloud-yavin" {
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.9.6"
|
||||
|
||||
providers = {
|
||||
google = "google.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
# Google Cloud
|
||||
region = "us-central1"
|
||||
dns_zone = "example.com"
|
||||
dns_zone_name = "example-zone"
|
||||
os_image = "coreos-stable-1465-6-0-v20170817"
|
||||
os_image = "coreos-stable"
|
||||
|
||||
cluster_name = "yavin"
|
||||
controller_count = 1
|
||||
@ -75,12 +84,12 @@ Apply complete! Resources: 37 added, 0 changed, 0 destroyed.
|
||||
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
|
||||
|
||||
```sh
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.8.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.9.6
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.9.6
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.9.6
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -115,11 +124,11 @@ Typhoon is strict about minimalism, maturity, and scope. These are not in scope:
|
||||
|
||||
Ask questions on the IRC #typhoon channel on [freenode.net](http://freenode.net/).
|
||||
|
||||
## Background
|
||||
## Motivation
|
||||
|
||||
Typhoon powers the author's cloud and colocation clusters. The project has evolved through operational experience and Kubernetes changes. Typhoon is shared under a free license to allow others to use the work freely and contribute to its upkeep.
|
||||
|
||||
Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of free (or enterprise) Kubernetes distros is healthy.
|
||||
Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of Kubernetes distributions is healthy.
|
||||
|
||||
## Social Contract
|
||||
|
||||
@ -127,4 +136,6 @@ Typhoon is not a product, trial, or free-tier. It is not run by a company, does
|
||||
|
||||
Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.
|
||||
|
||||
*Disclosure: The author works for CoreOS and previously wrote Matchbox and original Tectonic for bare-metal and AWS. This project is not associated with CoreOS.*
|
||||
## Donations
|
||||
|
||||
Typhoon does not accept money donations. Instead, we encourage you to donate to one of [these organizations](https://github.com/poseidon/typhoon/wiki/Donations) to show your appreciation.
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: reboot-coordinator
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: reboot-coordinator
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
name: container-linux-update-agent
|
||||
@ -8,6 +8,9 @@ spec:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: container-linux-update-agent
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -15,7 +18,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: update-agent
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.4.1
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.6.0
|
||||
command:
|
||||
- "/bin/update-agent"
|
||||
volumeMounts:
|
||||
|
@ -1,10 +1,13 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: container-linux-update-operator
|
||||
namespace: reboot-coordinator
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: container-linux-update-operator
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -12,7 +15,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: update-operator
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.4.1
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.6.0
|
||||
command:
|
||||
- "/bin/update-operator"
|
||||
env:
|
||||
|
@ -1,32 +0,0 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: kubernetes-dashboard
|
||||
namespace: kube-system
|
||||
spec:
|
||||
replicas: 1
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
name: kubernetes-dashboard
|
||||
phase: prod
|
||||
spec:
|
||||
containers:
|
||||
- name: kubernetes-dashboard
|
||||
image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 9090
|
||||
resources:
|
||||
limits:
|
||||
cpu: 100m
|
||||
memory: 300Mi
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /
|
||||
port: 9090
|
||||
initialDelaySeconds: 30
|
||||
timeoutSeconds: 30
|
@ -1,15 +0,0 @@
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: kubernetes-dashboard
|
||||
namespace: kube-system
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
name: kubernetes-dashboard
|
||||
phase: prod
|
||||
ports:
|
||||
- name: http
|
||||
protocol: TCP
|
||||
port: 80
|
||||
targetPort: 9090
|
7499
addons/grafana/config.yaml
Normal file
7499
addons/grafana/config.yaml
Normal file
@ -0,0 +1,7499 @@
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: grafana-dashboards
|
||||
namespace: monitoring
|
||||
data:
|
||||
deployment-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 1,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "200px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "cores",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "CPU",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 9,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "GB",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "80%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "Bps",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Network",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "100px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"metric": "kube_deployment_spec_replicas",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Desired Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Available Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Observed Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Metadata Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "350px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "current replicas",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "available",
|
||||
"refId": "B",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "unavailable",
|
||||
"refId": "C",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "updated",
|
||||
"refId": "D",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "desired",
|
||||
"refId": "E",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "Replicas",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "none",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Namespace",
|
||||
"multi": false,
|
||||
"name": "deployment_namespace",
|
||||
"options": [],
|
||||
"query": "label_values(kube_deployment_metadata_generation, namespace)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": null,
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Deployment",
|
||||
"multi": false,
|
||||
"name": "deployment_name",
|
||||
"options": [],
|
||||
"query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "deployment",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Deployment",
|
||||
"version": 1
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
etcd-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"label": "prometheus",
|
||||
"description": "",
|
||||
"type": "datasource",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus"
|
||||
}
|
||||
],
|
||||
"__requires": [
|
||||
{
|
||||
"type": "grafana",
|
||||
"id": "grafana",
|
||||
"name": "Grafana",
|
||||
"version": "4.5.2"
|
||||
},
|
||||
{
|
||||
"type": "panel",
|
||||
"id": "graph",
|
||||
"name": "Graph",
|
||||
"version": ""
|
||||
},
|
||||
{
|
||||
"type": "datasource",
|
||||
"id": "prometheus",
|
||||
"name": "Prometheus",
|
||||
"version": "1.0.0"
|
||||
},
|
||||
{
|
||||
"type": "panel",
|
||||
"id": "singlestat",
|
||||
"name": "Singlestat",
|
||||
"version": ""
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"description": "etcd sample Grafana dashboard with Prometheus",
|
||||
"editable": false,
|
||||
"gnetId": null,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"cacheTimeout": null,
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 28,
|
||||
"interval": null,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"nullText": null,
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"tableColumn": "",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(etcd_server_has_leader)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"metric": "etcd_server_has_leader",
|
||||
"refId": "A",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"thresholds": "",
|
||||
"title": "Up",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "200%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 23,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 5,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(grpc_server_started_total{grpc_type=\"unary\"}[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "RPC Rate",
|
||||
"metric": "grpc_server_started_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(grpc_server_handled_total{grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "RPC Failed Rate",
|
||||
"metric": "grpc_server_handled_total",
|
||||
"refId": "B",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "RPC Rate",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "ops",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 41,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Watch Streams",
|
||||
"metric": "grpc_server_handled_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Lease Streams",
|
||||
"metric": "grpc_server_handled_total",
|
||||
"refId": "B",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Active Streams",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"decimals": null,
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"grid": {},
|
||||
"id": 1,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} DB Size",
|
||||
"metric": "",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "DB Size",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"grid": {},
|
||||
"id": 3,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 1,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": false,
|
||||
"steppedLine": true,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} WAL fsync",
|
||||
"metric": "etcd_disk_wal_fsync_duration_seconds_bucket",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} DB fsync",
|
||||
"metric": "etcd_disk_backend_commit_duration_seconds_bucket",
|
||||
"refId": "B",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Disk Sync Duration",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "s",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 29,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "process_resident_memory_bytes",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Resident Memory",
|
||||
"metric": "process_resident_memory_bytes",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Memory",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "New row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 5,
|
||||
"id": 22,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(etcd_network_client_grpc_received_bytes_total[5m])",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Client Traffic In",
|
||||
"metric": "etcd_network_client_grpc_received_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Client Traffic In",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 5,
|
||||
"id": 21,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(etcd_network_client_grpc_sent_bytes_total[5m])",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Client Traffic Out",
|
||||
"metric": "etcd_network_client_grpc_sent_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Client Traffic Out",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 20,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Peer Traffic In",
|
||||
"metric": "etcd_network_peer_received_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Peer Traffic In",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"decimals": null,
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"grid": {},
|
||||
"id": 16,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Peer Traffic Out",
|
||||
"metric": "etcd_network_peer_sent_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Peer Traffic Out",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "New row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 40,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(etcd_server_proposals_failed_total[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Failure Rate",
|
||||
"metric": "etcd_server_proposals_failed_total",
|
||||
"refId": "A",
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"expr": "sum(etcd_server_proposals_pending)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Pending Total",
|
||||
"metric": "etcd_server_proposals_pending",
|
||||
"refId": "B",
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(etcd_server_proposals_committed_total[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Commit Rate",
|
||||
"metric": "etcd_server_proposals_committed_total",
|
||||
"refId": "C",
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(etcd_server_proposals_applied_total[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Apply Rate",
|
||||
"refId": "D",
|
||||
"step": 2
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Raft Proposals",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"decimals": 0,
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 19,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "changes(etcd_server_leader_changes_seen_total[1d])",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Total Leader Elections Per Day",
|
||||
"metric": "etcd_server_leader_changes_seen_total",
|
||||
"refId": "A",
|
||||
"step": 2
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Total Leader Elections Per Day",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "New row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-15m",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"now": true,
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "etcd",
|
||||
"version": 4
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-capacity-planning-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"gnetId": 22,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100",
|
||||
"hide": false,
|
||||
"intervalFactor": 10,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 50
|
||||
}
|
||||
],
|
||||
"title": "Idle CPU",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percent",
|
||||
"label": "cpu usage",
|
||||
"logBase": 1,
|
||||
"min": 0,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 9,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_load1)",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 1m",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_load5)",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 5m",
|
||||
"refId": "B",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_load15)",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 15m",
|
||||
"refId": "C",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "System Load",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percentunit",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 4,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory usage",
|
||||
"metric": "memo",
|
||||
"refId": "A",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_memory_Buffers)",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory buffers",
|
||||
"metric": "memo",
|
||||
"refId": "B",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_memory_Cached)",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory cached",
|
||||
"metric": "memo",
|
||||
"refId": "C",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_memory_MemFree)",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory free",
|
||||
"metric": "memo",
|
||||
"refId": "D",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
|
||||
"intervalFactor": 2,
|
||||
"metric": "",
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "246px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 6,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "read",
|
||||
"yaxis": 1
|
||||
},
|
||||
{
|
||||
"alias": "{instance=\"172.17.0.1:9100\"}",
|
||||
"yaxis": 2
|
||||
},
|
||||
{
|
||||
"alias": "io time",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_disk_bytes_read[5m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "read",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(node_disk_bytes_written[5m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "written",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(node_disk_io_time_ms[5m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "io time",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Disk I/O",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "ms",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percentunit",
|
||||
"gauge": {
|
||||
"maxValue": 1,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 12,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "0.75, 0.9",
|
||||
"title": "Disk Space Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 8,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_network_receive_bytes{device!~\"lo\"}[5m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Received",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 10,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_network_transmit_bytes{device!~\"lo\"}[5m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "B",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Transmitted",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "276px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 11,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 11,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_pod_info)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Current number of Pods",
|
||||
"refId": "A",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "sum(kube_node_status_capacity_pods)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Maximum capacity of pods",
|
||||
"refId": "B",
|
||||
"step": 10
|
||||
}
|
||||
],
|
||||
"title": "Cluster Pod Utilization",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Pod Utilization",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Capacity Planning",
|
||||
"version": 4
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-cluster-health-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": "10s",
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "254px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 1,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Control Plane Components Down",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "Everything UP and healthy",
|
||||
"value": "null"
|
||||
},
|
||||
{
|
||||
"op": "=",
|
||||
"text": "",
|
||||
"value": ""
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Alerts Firing",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "3, 5",
|
||||
"title": "Alerts Pending",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Crashlooping Pods",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Node Not Ready",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Node Disk Pressure",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Node Memory Pressure",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_spec_unschedulable)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Nodes Unschedulable",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Cluster Health",
|
||||
"version": 9
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-cluster-status-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "129px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 6,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Control Plane UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "UP",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "total"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 6,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "3, 5",
|
||||
"title": "Alerts Firing",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": true,
|
||||
"title": "Cluster Health",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "168px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 1,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "API Servers UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Controller Managers UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Schedulers UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Crashlooping Control Plane Pods",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": true,
|
||||
"title": "Control Plane Status",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "158px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(node_cpu{job=\"node-exporter\",mode=\"idle\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "CPU Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 9,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Filesystem Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 10,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Pod Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": true,
|
||||
"title": "Capacity Planning",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Cluster Status",
|
||||
"version": 3
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-control-plane-status-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 1,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "API Servers UP",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Controller Managers UP",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Schedulers UP",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "5, 10",
|
||||
"title": "API Server Request Error Rate",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 7,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "API Server Request Latency",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 5,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60
|
||||
}
|
||||
],
|
||||
"title": "End to End Scheduling Latency",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "dtdurations",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 6,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Error Rate",
|
||||
"refId": "A",
|
||||
"step": 60
|
||||
},
|
||||
{
|
||||
"expr": "sum by(instance) (rate(apiserver_request_count[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Request Rate",
|
||||
"refId": "B",
|
||||
"step": 60
|
||||
}
|
||||
],
|
||||
"title": "API Server Request Rates",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Control Plane Status",
|
||||
"version": 3
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-resource-requests-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "300px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Allocatable CPU Cores",
|
||||
"refId": "A",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested CPU Cores",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "CPU Cores",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": "CPU Cores",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 240
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "CPU Cores",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "CPU Cores",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "300px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Allocatable Memory",
|
||||
"refId": "A",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested Memory",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"label": "Memory",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 240
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Memory",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-3h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Resource Requests",
|
||||
"version": 2
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
nodes-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"description": "Dashboard to get an overview of one server",
|
||||
"editable": false,
|
||||
"gnetId": 22,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)",
|
||||
"hide": false,
|
||||
"intervalFactor": 10,
|
||||
"legendFormat": "{{cpu}}",
|
||||
"refId": "A",
|
||||
"step": 50
|
||||
}
|
||||
],
|
||||
"title": "Idle CPU",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percent",
|
||||
"label": "cpu usage",
|
||||
"logBase": 1,
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 9,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_load1{instance=\"$server\"}",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 1m",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "node_load5{instance=\"$server\"}",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 5m",
|
||||
"refId": "B",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "node_load15{instance=\"$server\"}",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 15m",
|
||||
"refId": "C",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "System Load",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percentunit",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 4,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory used",
|
||||
"metric": "",
|
||||
"refId": "C",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_Buffers{instance=\"$server\"}",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory buffers",
|
||||
"metric": "",
|
||||
"refId": "E",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_Cached{instance=\"$server\"}",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory cached",
|
||||
"metric": "",
|
||||
"refId": "F",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_MemFree{instance=\"$server\"}",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory free",
|
||||
"metric": "",
|
||||
"refId": "D",
|
||||
"step": 10
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 6,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "read",
|
||||
"yaxis": 1
|
||||
},
|
||||
{
|
||||
"alias": "{instance=\"172.17.0.1:9100\"}",
|
||||
"yaxis": 2
|
||||
},
|
||||
{
|
||||
"alias": "io time",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "read",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "written",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "io time",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Disk I/O",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "ms",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percentunit",
|
||||
"gauge": {
|
||||
"maxValue": 1,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"})",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "0.75, 0.9",
|
||||
"title": "Disk Space Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 8,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_network_receive_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{device}}",
|
||||
"refId": "A",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Received",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 10,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{device}}",
|
||||
"refId": "B",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Transmitted",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": null,
|
||||
"multi": false,
|
||||
"name": "server",
|
||||
"options": [],
|
||||
"query": "label_values(node_boot_time, instance)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Nodes",
|
||||
"version": 2
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
pods-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 1,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": true,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "Current: {{ container_name }}",
|
||||
"metric": "container_memory_usage_bytes",
|
||||
"refId": "A",
|
||||
"step": 15
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_requests_memory_bytes",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_limits_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Limit: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_limits_memory_bytes",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 2,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": true,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{ container_name }}",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_requests_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_requests_cpu_cores",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_limits_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Limit: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_limits_memory_bytes",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "CPU Usage",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": true,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{ pod_name }}",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "Network I/O",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": true,
|
||||
"label": "Namespace",
|
||||
"multi": false,
|
||||
"name": "namespace",
|
||||
"options": [],
|
||||
"query": "label_values(kube_pod_info, namespace)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Pod",
|
||||
"multi": false,
|
||||
"name": "pod",
|
||||
"options": [],
|
||||
"query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": true,
|
||||
"label": "Container",
|
||||
"multi": false,
|
||||
"name": "container",
|
||||
"options": [],
|
||||
"query": "label_values(kube_pod_container_info{namespace=\"$namespace\", pod=\"$pod\"}, container)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Pods",
|
||||
"version": 1
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
statefulset-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 1,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "200px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "cores",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "CPU",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 9,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "GB",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "80%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(container_memory_usage_bytes{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}) / 1024^3",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "Bps",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Network",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "100px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"metric": "kube_statefulset_replicas",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Desired Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Available Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_statefulset_status_observed_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Observed Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_statefulset_metadata_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Metadata Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "350px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "available",
|
||||
"refId": "B",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "desired",
|
||||
"refId": "E",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "Replicas",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "none",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Namespace",
|
||||
"multi": false,
|
||||
"name": "statefulset_namespace",
|
||||
"options": [],
|
||||
"query": "label_values(kube_statefulset_metadata_generation, namespace)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": null,
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "StatefulSet",
|
||||
"multi": false,
|
||||
"name": "statefulset_name",
|
||||
"options": [],
|
||||
"query": "label_values(kube_statefulset_metadata_generation{namespace=\"$statefulset_namespace\"}, statefulset)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "statefulset",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "StatefulSet",
|
||||
"version": 1
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
prometheus-datasource.json: |+
|
||||
{
|
||||
"access": "proxy",
|
||||
"basicAuth": false,
|
||||
"name": "prometheus",
|
||||
"type": "prometheus",
|
||||
"url": "http://prometheus.monitoring.svc"
|
||||
}
|
||||
---
|
@ -1,4 +1,4 @@
|
||||
apiVersion: apps/v1beta2
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: grafana
|
||||
@ -21,7 +21,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: grafana
|
||||
image: grafana/grafana:4.6.1
|
||||
image: grafana/grafana:4.6.3
|
||||
env:
|
||||
- name: GF_SERVER_HTTP_PORT
|
||||
value: "8080"
|
||||
@ -41,6 +41,22 @@ spec:
|
||||
limits:
|
||||
memory: 200Mi
|
||||
cpu: 200m
|
||||
- name: grafana-watcher
|
||||
image: quay.io/coreos/grafana-watcher:v0.0.8
|
||||
args:
|
||||
- '--watch-dir=/etc/grafana/dashboards'
|
||||
- '--grafana-url=http://localhost:8080'
|
||||
resources:
|
||||
requests:
|
||||
memory: "16Mi"
|
||||
cpu: "50m"
|
||||
limits:
|
||||
memory: "32Mi"
|
||||
cpu: "100m"
|
||||
volumeMounts:
|
||||
- name: dashboards
|
||||
mountPath: /etc/grafana/dashboards
|
||||
volumes:
|
||||
- name: grafana-storage
|
||||
emptyDir: {}
|
||||
- name: dashboards
|
||||
configMap:
|
||||
name: grafana-dashboards
|
||||
|
12
addons/heapster/cluster-role-binding.yaml
Normal file
12
addons/heapster/cluster-role-binding.yaml
Normal file
@ -0,0 +1,12 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
metadata:
|
||||
name: heapster
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
kind: ClusterRole
|
||||
name: system:heapster
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: heapster
|
||||
namespace: kube-system
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: heapster
|
||||
@ -14,12 +14,11 @@ spec:
|
||||
labels:
|
||||
name: heapster
|
||||
phase: prod
|
||||
annotations:
|
||||
scheduler.alpha.kubernetes.io/critical-pod: ''
|
||||
spec:
|
||||
serviceAccountName: heapster
|
||||
containers:
|
||||
- name: heapster
|
||||
image: gcr.io/google_containers/heapster-amd64:v1.4.3
|
||||
image: k8s.gcr.io/heapster-amd64:v1.5.2
|
||||
command:
|
||||
- /heapster
|
||||
- --source=kubernetes.summary_api:''
|
||||
@ -31,16 +30,18 @@ spec:
|
||||
initialDelaySeconds: 180
|
||||
timeoutSeconds: 5
|
||||
- name: heapster-nanny
|
||||
image: gcr.io/google_containers/addon-resizer:2.0
|
||||
image: k8s.gcr.io/addon-resizer:1.7
|
||||
command:
|
||||
- /pod_nanny
|
||||
- --cpu=80m
|
||||
- --extra-cpu=0.5m
|
||||
- --memory=140Mi
|
||||
- --extra-memory=4Mi
|
||||
- --threshold=5
|
||||
- --deployment=heapster
|
||||
- --container=heapster
|
||||
- --poll-period=300000
|
||||
- --estimator=exponential
|
||||
env:
|
||||
- name: MY_POD_NAME
|
||||
valueFrom:
|
||||
|
13
addons/heapster/role-binding.yaml
Normal file
13
addons/heapster/role-binding.yaml
Normal file
@ -0,0 +1,13 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
metadata:
|
||||
name: heapster
|
||||
namespace: kube-system
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
kind: Role
|
||||
name: system:pod-nanny
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: heapster
|
||||
namespace: kube-system
|
19
addons/heapster/role.yaml
Normal file
19
addons/heapster/role.yaml
Normal file
@ -0,0 +1,19 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
metadata:
|
||||
name: system:pod-nanny
|
||||
namespace: kube-system
|
||||
rules:
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- pods
|
||||
verbs:
|
||||
- get
|
||||
- apiGroups:
|
||||
- "extensions"
|
||||
resources:
|
||||
- deployments
|
||||
verbs:
|
||||
- get
|
||||
- update
|
5
addons/heapster/service-account.yaml
Normal file
5
addons/heapster/service-account.yaml
Normal file
@ -0,0 +1,5 @@
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: heapster
|
||||
namespace: kube-system
|
@ -1,10 +1,14 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: default-backend
|
||||
namespace: ingress
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: default-backend
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: nginx-ingress-controller
|
||||
@ -8,6 +8,10 @@ spec:
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: nginx-ingress-controller
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -19,7 +23,7 @@ spec:
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: nginx-ingress-controller
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
|
||||
args:
|
||||
- /nginx-ingress-controller
|
||||
- --default-backend-service=$(POD_NAMESPACE)/default-backend
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
name: nginx-ingress-controller
|
||||
@ -8,6 +8,10 @@ spec:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: nginx-ingress-controller
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -19,7 +23,7 @@ spec:
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: nginx-ingress-controller
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
|
||||
args:
|
||||
- /nginx-ingress-controller
|
||||
- --default-backend-service=$(POD_NAMESPACE)/default-backend
|
||||
|
@ -1,10 +1,14 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: default-backend
|
||||
namespace: ingress
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: default-backend
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,10 +1,14 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: default-backend
|
||||
namespace: ingress
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: default-backend
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: nginx-ingress-controller
|
||||
@ -8,6 +8,10 @@ spec:
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: nginx-ingress-controller
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -19,7 +23,7 @@ spec:
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: nginx-ingress-controller
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
|
||||
args:
|
||||
- /nginx-ingress-controller
|
||||
- --default-backend-service=$(POD_NAMESPACE)/default-backend
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -39,7 +39,7 @@ data:
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
# Using endpoints to discover kube-apiserver targets finds the pod IP
|
||||
# (host IP since apiserver is uses host network) which is not used in
|
||||
# (host IP since apiserver uses host network) which is not used in
|
||||
# the server certificate.
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
@ -51,6 +51,9 @@ data:
|
||||
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- replacement: apiserver
|
||||
action: replace
|
||||
target_label: job
|
||||
|
||||
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
|
||||
# metrics from a node by scraping kubelet (127.0.0.1:10255/metrics).
|
||||
@ -59,7 +62,7 @@ data:
|
||||
# Kubernetes apiserver. This means it will work if Prometheus is running out of
|
||||
# cluster, or can't connect to nodes for some other reason (e.g. because of
|
||||
# firewalling).
|
||||
- job_name: 'kubernetes-nodes'
|
||||
- job_name: 'kubelet'
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
|
||||
@ -149,7 +152,7 @@ data:
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
target_label: job
|
||||
|
||||
# Example scrape config for probing services via the Blackbox Exporter.
|
||||
#
|
||||
@ -181,7 +184,7 @@ data:
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
target_label: job
|
||||
|
||||
# Example scrape config for pods
|
||||
#
|
||||
|
@ -1,22 +1,24 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: prometheus
|
||||
namespace: monitoring
|
||||
spec:
|
||||
replicas: 1
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: prometheus
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
name: prometheus
|
||||
phase: prod
|
||||
spec:
|
||||
serviceAccountName: prometheus
|
||||
containers:
|
||||
- name: prometheus
|
||||
image: quay.io/prometheus/prometheus:v2.0.0
|
||||
image: quay.io/prometheus/prometheus:v2.2.1
|
||||
args:
|
||||
- '--config.file=/etc/prometheus/prometheus.yaml'
|
||||
ports:
|
||||
|
@ -12,7 +12,9 @@ rules:
|
||||
- replicationcontrollers
|
||||
- limitranges
|
||||
- persistentvolumeclaims
|
||||
- persistentvolumes
|
||||
- namespaces
|
||||
- endpoints
|
||||
verbs: ["list", "watch"]
|
||||
- apiGroups: ["extensions"]
|
||||
resources:
|
||||
@ -29,3 +31,7 @@ rules:
|
||||
- cronjobs
|
||||
- jobs
|
||||
verbs: ["list", "watch"]
|
||||
- apiGroups: ["autoscaling"]
|
||||
resources:
|
||||
- horizontalpodautoscalers
|
||||
verbs: ["list", "watch"]
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: apps/v1beta2
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: kube-state-metrics
|
||||
@ -22,7 +22,7 @@ spec:
|
||||
serviceAccountName: kube-state-metrics
|
||||
containers:
|
||||
- name: kube-state-metrics
|
||||
image: quay.io/coreos/kube-state-metrics:v1.1.0
|
||||
image: quay.io/coreos/kube-state-metrics:v1.2.0
|
||||
ports:
|
||||
- name: metrics
|
||||
containerPort: 8080
|
||||
@ -33,7 +33,7 @@ spec:
|
||||
initialDelaySeconds: 5
|
||||
timeoutSeconds: 5
|
||||
- name: addon-resizer
|
||||
image: gcr.io/google_containers/addon-resizer:1.0
|
||||
image: gcr.io/google_containers/addon-resizer:1.7
|
||||
resources:
|
||||
limits:
|
||||
cpu: 100m
|
||||
|
@ -15,5 +15,5 @@ spec:
|
||||
ports:
|
||||
- name: metrics
|
||||
protocol: TCP
|
||||
port: 80
|
||||
port: 8080
|
||||
targetPort: 8080
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: apps/v1beta2
|
||||
apiVersion: apps/v1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
name: node-exporter
|
||||
@ -18,11 +18,15 @@ spec:
|
||||
name: node-exporter
|
||||
phase: prod
|
||||
spec:
|
||||
serviceAccountName: node-exporter
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 65534
|
||||
hostNetwork: true
|
||||
hostPID: true
|
||||
containers:
|
||||
- name: node-exporter
|
||||
image: quay.io/prometheus/node-exporter:v0.15.0
|
||||
image: quay.io/prometheus/node-exporter:v0.15.2
|
||||
args:
|
||||
- "--path.procfs=/host/proc"
|
||||
- "--path.sysfs=/host/sys"
|
||||
@ -45,9 +49,8 @@ spec:
|
||||
mountPath: /host/sys
|
||||
readOnly: true
|
||||
tolerations:
|
||||
- key: node-role.kubernetes.io/master
|
||||
- effect: NoSchedule
|
||||
operator: Exists
|
||||
effect: NoSchedule
|
||||
volumes:
|
||||
- name: proc
|
||||
hostPath:
|
||||
|
@ -0,0 +1,5 @@
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: node-exporter
|
||||
namespace: monitoring
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
metadata:
|
||||
name: prometheus
|
||||
@ -8,5 +8,5 @@ roleRef:
|
||||
name: prometheus
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: default
|
||||
name: prometheus
|
||||
namespace: monitoring
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: prometheus
|
||||
|
@ -4,10 +4,9 @@ metadata:
|
||||
name: prometheus-rules
|
||||
namespace: monitoring
|
||||
data:
|
||||
# Rules adapted from those provided by coreos/prometheus-operator and SoundCloud
|
||||
alertmanager.rules.yaml: |+
|
||||
alertmanager.rules.yaml: |
|
||||
groups:
|
||||
- name: ./alertmanager.rules
|
||||
- name: alertmanager.rules
|
||||
rules:
|
||||
- alert: AlertmanagerConfigInconsistent
|
||||
expr: count_values("config_hash", alertmanager_config_hash) BY (service) / ON(service)
|
||||
@ -19,7 +18,6 @@ data:
|
||||
annotations:
|
||||
description: The configuration of the instances of the Alertmanager cluster
|
||||
`{{$labels.service}}` are out of sync.
|
||||
summary: Alertmanager configurations are inconsistent
|
||||
- alert: AlertmanagerDownOrMissing
|
||||
expr: label_replace(prometheus_operator_alertmanager_spec_replicas, "job", "alertmanager-$1",
|
||||
"alertmanager", "(.*)") / ON(job) GROUP_RIGHT() sum(up) BY (job) != 1
|
||||
@ -29,8 +27,7 @@ data:
|
||||
annotations:
|
||||
description: An unexpected number of Alertmanagers are scraped or Alertmanagers
|
||||
disappeared from discovery.
|
||||
summary: Alertmanager down or not discovered
|
||||
- alert: FailedReload
|
||||
- alert: AlertmanagerFailedReload
|
||||
expr: alertmanager_config_last_reload_successful == 0
|
||||
for: 10m
|
||||
labels:
|
||||
@ -38,8 +35,7 @@ data:
|
||||
annotations:
|
||||
description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
|
||||
}}/{{ $labels.pod}}.
|
||||
summary: Alertmanager configuration reload has failed
|
||||
etcd3.rules.yaml: |+
|
||||
etcd3.rules.yaml: |
|
||||
groups:
|
||||
- name: ./etcd3.rules
|
||||
rules:
|
||||
@ -68,8 +64,8 @@ data:
|
||||
changes within the last hour
|
||||
summary: a high number of leader changes within the etcd cluster are happening
|
||||
- alert: HighNumberOfFailedGRPCRequests
|
||||
expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
|
||||
/ sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.01
|
||||
expr: sum(rate(grpc_server_handled_total{grpc_code!="OK",job="etcd"}[5m])) BY (grpc_service, grpc_method)
|
||||
/ sum(rate(grpc_server_handled_total{job="etcd"}[5m])) BY (grpc_service, grpc_method) > 0.01
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
@ -78,8 +74,8 @@ data:
|
||||
on etcd instance {{ $labels.instance }}'
|
||||
summary: a high number of gRPC requests are failing
|
||||
- alert: HighNumberOfFailedGRPCRequests
|
||||
expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
|
||||
/ sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.05
|
||||
expr: sum(rate(grpc_server_handled_total{grpc_code!="OK",job="etcd"}[5m])) BY (grpc_service, grpc_method)
|
||||
/ sum(rate(grpc_server_handled_total{job="etcd"}[5m])) BY (grpc_service, grpc_method) > 0.05
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
@ -88,7 +84,7 @@ data:
|
||||
on etcd instance {{ $labels.instance }}'
|
||||
summary: a high number of gRPC requests are failing
|
||||
- alert: GRPCRequestsSlow
|
||||
expr: histogram_quantile(0.99, rate(etcd_grpc_unary_requests_duration_seconds_bucket[5m]))
|
||||
expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job="etcd",grpc_type="unary"}[5m])) by (grpc_service, grpc_method, le))
|
||||
> 0.15
|
||||
for: 10m
|
||||
labels:
|
||||
@ -128,7 +124,7 @@ data:
|
||||
}} are slow
|
||||
summary: slow HTTP requests
|
||||
- alert: EtcdMemberCommunicationSlow
|
||||
expr: histogram_quantile(0.99, rate(etcd_network_member_round_trip_time_seconds_bucket[5m]))
|
||||
expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))
|
||||
> 0.15
|
||||
for: 10m
|
||||
labels:
|
||||
@ -163,9 +159,9 @@ data:
|
||||
annotations:
|
||||
description: etcd instance {{ $labels.instance }} commit durations are high
|
||||
summary: high commit durations
|
||||
general.rules.yaml: |+
|
||||
general.rules.yaml: |
|
||||
groups:
|
||||
- name: ./general.rules
|
||||
- name: general.rules
|
||||
rules:
|
||||
- alert: TargetDown
|
||||
expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
|
||||
@ -173,66 +169,34 @@ data:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $value }}% or more of {{ $labels.job }} targets are down.'
|
||||
description: '{{ $value }}% of {{ $labels.job }} targets are down.'
|
||||
summary: Targets are down
|
||||
- alert: TooManyOpenFileDescriptors
|
||||
expr: 100 * (process_open_fds / process_max_fds) > 95
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
|
||||
$labels.instance }}) is using {{ $value }}% of the available file/socket descriptors.'
|
||||
summary: too many open file descriptors
|
||||
- record: instance:fd_utilization
|
||||
- record: fd_utilization
|
||||
expr: process_open_fds / process_max_fds
|
||||
- alert: FdExhaustionClose
|
||||
expr: predict_linear(instance:fd_utilization[1h], 3600 * 4) > 1
|
||||
expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
|
||||
$labels.instance }}) instance will exhaust in file/socket descriptors soon'
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
|
||||
will exhaust in file/socket descriptors within the next 4 hours'
|
||||
summary: file descriptors soon exhausted
|
||||
- alert: FdExhaustionClose
|
||||
expr: predict_linear(instance:fd_utilization[10m], 3600) > 1
|
||||
expr: predict_linear(fd_utilization[10m], 3600) > 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
|
||||
$labels.instance }}) instance will exhaust in file/socket descriptors soon'
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
|
||||
will exhaust in file/socket descriptors within the next hour'
|
||||
summary: file descriptors soon exhausted
|
||||
kube-apiserver.rules.yaml: |+
|
||||
kube-controller-manager.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kube-apiserver.rules
|
||||
rules:
|
||||
- alert: K8SApiserverDown
|
||||
expr: absent(up{job="kubernetes-apiservers"} == 1)
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: Prometheus failed to scrape API server(s), or all API servers have
|
||||
disappeared from service discovery.
|
||||
summary: API server unreachable
|
||||
- alert: K8SApiServerLatency
|
||||
expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"})
|
||||
WITHOUT (instance, resource)) / 1e+06 > 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: 99th percentile Latency for {{ $labels.verb }} requests to the
|
||||
kube-apiserver is higher than 1s.
|
||||
summary: Kubernetes apiserver latency is high
|
||||
kube-controller-manager.rules.yaml: |+
|
||||
groups:
|
||||
- name: ./kube-controller-manager.rules
|
||||
- name: kube-controller-manager.rules
|
||||
rules:
|
||||
- alert: K8SControllerManagerDown
|
||||
expr: absent(up{kubernetes_name="kube-controller-manager"} == 1)
|
||||
expr: absent(up{job="kube-controller-manager"} == 1)
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
@ -240,12 +204,57 @@ data:
|
||||
description: There is no running K8S controller manager. Deployments and replication
|
||||
controllers are not making progress.
|
||||
summary: Controller manager is down
|
||||
kube-scheduler.rules.yaml: |+
|
||||
kube-scheduler.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kube-scheduler.rules
|
||||
- name: kube-scheduler.rules
|
||||
rules:
|
||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- alert: K8SSchedulerDown
|
||||
expr: absent(up{kubernetes_name="kube-scheduler"} == 1)
|
||||
expr: absent(up{job="kube-scheduler"} == 1)
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
@ -253,9 +262,69 @@ data:
|
||||
description: There is no running K8S scheduler. New pods are not being assigned
|
||||
to nodes.
|
||||
summary: Scheduler is down
|
||||
kubelet.rules.yaml: |+
|
||||
kube-state-metrics.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kubelet.rules
|
||||
- name: kube-state-metrics.rules
|
||||
rules:
|
||||
- alert: DeploymentGenerationMismatch
|
||||
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Observed deployment generation does not match expected one for
|
||||
deployment {{$labels.namespaces}}/{{$labels.deployment}}
|
||||
summary: Deployment is outdated
|
||||
- alert: DeploymentReplicasNotUpdated
|
||||
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
|
||||
or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
|
||||
unless (kube_deployment_spec_paused == 1)
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
|
||||
summary: Deployment replicas are outdated
|
||||
- alert: DaemonSetRolloutStuck
|
||||
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled
|
||||
* 100 < 100
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Only {{$value}}% of desired pods scheduled and ready for daemon
|
||||
set {{$labels.namespaces}}/{{$labels.daemonset}}
|
||||
summary: DaemonSet is missing pods
|
||||
- alert: K8SDaemonSetsNotScheduled
|
||||
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
|
||||
> 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: A number of daemonsets are not scheduled.
|
||||
summary: Daemonsets are not scheduled correctly
|
||||
- alert: DaemonSetsMissScheduled
|
||||
expr: kube_daemonset_status_number_misscheduled > 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: A number of daemonsets are running where they are not supposed
|
||||
to run.
|
||||
summary: Daemonsets are not scheduled correctly
|
||||
- alert: PodFrequentlyRestarting
|
||||
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Pod {{$labels.namespaces}}/{{$labels.pod}} is was restarted {{$value}}
|
||||
times within the last hour
|
||||
summary: Pod is restarting frequently
|
||||
kubelet.rules.yaml: |
|
||||
groups:
|
||||
- name: kubelet.rules
|
||||
rules:
|
||||
- alert: K8SNodeNotReady
|
||||
expr: kube_node_status_condition{condition="Ready",status="true"} == 0
|
||||
@ -274,20 +343,17 @@ data:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $value }} Kubernetes nodes (more than 10% are in the NotReady
|
||||
state).'
|
||||
summary: Many Kubernetes nodes are Not Ready
|
||||
description: '{{ $value }}% of Kubernetes nodes are not ready'
|
||||
- alert: K8SKubeletDown
|
||||
expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) > 0.03
|
||||
expr: count(up{job="kubelet"} == 0) / count(up{job="kubelet"}) * 100 > 3
|
||||
for: 1h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus failed to scrape {{ $value }}% of kubelets.
|
||||
summary: Many Kubelets cannot be scraped
|
||||
- alert: K8SKubeletDown
|
||||
expr: absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"})
|
||||
> 0.1
|
||||
expr: (absent(up{job="kubelet"} == 1) or count(up{job="kubelet"} == 0) / count(up{job="kubelet"}))
|
||||
* 100 > 10
|
||||
for: 1h
|
||||
labels:
|
||||
severity: critical
|
||||
@ -297,176 +363,236 @@ data:
|
||||
summary: Many Kubelets cannot be scraped
|
||||
- alert: K8SKubeletTooManyPods
|
||||
expr: kubelet_running_pod_count > 100
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Kubelet {{$labels.instance}} is running {{$value}} pods, close
|
||||
to the limit of 110
|
||||
summary: Kubelet is close to pod limit
|
||||
kubernetes.rules.yaml: |+
|
||||
kubernetes.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kubernetes.rules
|
||||
- name: kubernetes.rules
|
||||
rules:
|
||||
- record: cluster_namespace_controller_pod_container:spec_memory_limit_bytes
|
||||
expr: sum(label_replace(container_spec_memory_limit_bytes{container_name!=""},
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name)
|
||||
- record: cluster_namespace_controller_pod_container:spec_cpu_shares
|
||||
expr: sum(label_replace(container_spec_cpu_shares{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:cpu_usage:rate
|
||||
expr: sum(label_replace(irate(container_cpu_usage_seconds_total{container_name!=""}[5m]),
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_usage:bytes
|
||||
expr: sum(label_replace(container_memory_usage_bytes{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_working_set:bytes
|
||||
expr: sum(label_replace(container_memory_working_set_bytes{container_name!=""},
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_rss:bytes
|
||||
expr: sum(label_replace(container_memory_rss{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_cache:bytes
|
||||
expr: sum(label_replace(container_memory_cache{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:disk_usage:bytes
|
||||
expr: sum(label_replace(container_disk_usage_bytes{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_pagefaults:rate
|
||||
expr: sum(label_replace(irate(container_memory_failures_total{container_name!=""}[5m]),
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name, scope, type)
|
||||
- record: cluster_namespace_controller_pod_container:memory_oom:rate
|
||||
expr: sum(label_replace(irate(container_memory_failcnt{container_name!=""}[5m]),
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name, scope, type)
|
||||
- record: cluster:memory_allocation:percent
|
||||
expr: 100 * sum(container_spec_memory_limit_bytes{pod_name!=""}) BY (cluster)
|
||||
/ sum(machine_memory_bytes) BY (cluster)
|
||||
- record: cluster:memory_used:percent
|
||||
expr: 100 * sum(container_memory_usage_bytes{pod_name!=""}) BY (cluster) / sum(machine_memory_bytes)
|
||||
BY (cluster)
|
||||
- record: cluster:cpu_allocation:percent
|
||||
expr: 100 * sum(container_spec_cpu_shares{pod_name!=""}) BY (cluster) / sum(container_spec_cpu_shares{id="/"}
|
||||
* ON(cluster, instance) machine_cpu_cores) BY (cluster)
|
||||
- record: cluster:node_cpu_use:percent
|
||||
expr: 100 * sum(rate(node_cpu{mode!="idle"}[5m])) BY (cluster) / sum(machine_cpu_cores)
|
||||
BY (cluster)
|
||||
- record: cluster_resource_verb:apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket) BY (le,
|
||||
cluster, job, resource, verb)) / 1e+06
|
||||
- record: pod_name:container_memory_usage_bytes:sum
|
||||
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
|
||||
(pod_name)
|
||||
- record: pod_name:container_spec_cpu_shares:sum
|
||||
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name)
|
||||
- record: pod_name:container_cpu_usage:sum
|
||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
|
||||
BY (pod_name)
|
||||
- record: pod_name:container_fs_usage_bytes:sum
|
||||
expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name)
|
||||
- record: namespace:container_memory_usage_bytes:sum
|
||||
expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace)
|
||||
- record: namespace:container_spec_cpu_shares:sum
|
||||
expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace)
|
||||
- record: namespace:container_cpu_usage:sum
|
||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
|
||||
BY (namespace)
|
||||
- record: cluster:memory_usage:ratio
|
||||
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
|
||||
(cluster) / sum(machine_memory_bytes) BY (cluster)
|
||||
- record: cluster:container_spec_cpu_shares:ratio
|
||||
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000
|
||||
/ sum(machine_cpu_cores)
|
||||
- record: cluster:container_cpu_usage:ratio
|
||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
|
||||
/ sum(machine_cpu_cores)
|
||||
- record: apiserver_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) /
|
||||
1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster_resource_verb:apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(apiserver_request_latencies_bucket) BY (le,
|
||||
cluster, job, resource, verb)) / 1e+06
|
||||
- record: apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) /
|
||||
1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster_resource_verb:apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(apiserver_request_latencies_bucket) BY (le,
|
||||
cluster, job, resource, verb)) / 1e+06
|
||||
- record: apiserver_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) /
|
||||
1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
- alert: APIServerLatencyHigh
|
||||
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
|
||||
> 1
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: warning
|
||||
annotations:
|
||||
description: the API server has a 99th percentile latency of {{ $value }} seconds
|
||||
for {{$labels.verb}} {{$labels.resource}}
|
||||
- alert: APIServerLatencyHigh
|
||||
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
|
||||
> 4
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: critical
|
||||
annotations:
|
||||
description: the API server has a 99th percentile latency of {{ $value }} seconds
|
||||
for {{$labels.verb}} {{$labels.resource}}
|
||||
- alert: APIServerErrorsHigh
|
||||
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
|
||||
* 100 > 2
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: warning
|
||||
annotations:
|
||||
description: API server returns errors for {{ $value }}% of requests
|
||||
- alert: APIServerErrorsHigh
|
||||
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
|
||||
* 100 > 5
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: critical
|
||||
annotations:
|
||||
description: API server returns errors for {{ $value }}% of requests
|
||||
- alert: K8SApiserverDown
|
||||
expr: absent(up{job="apiserver"} == 1)
|
||||
for: 20m
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: critical
|
||||
annotations:
|
||||
description: No API servers are reachable or all have disappeared from service
|
||||
discovery
|
||||
|
||||
- alert: K8sCertificateExpirationNotice
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_binding_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Kubernetes API Certificate is expiring soon (less than 7 days)
|
||||
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0
|
||||
|
||||
- alert: K8sCertificateExpirationNotice
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_binding_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_binding_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
node.rules.yaml: |+
|
||||
severity: critical
|
||||
annotations:
|
||||
description: Kubernetes API Certificate is expiring in less than 1 day
|
||||
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0
|
||||
node.rules.yaml: |
|
||||
groups:
|
||||
- name: ./node.rules
|
||||
- name: node.rules
|
||||
rules:
|
||||
- record: instance:node_cpu:rate:sum
|
||||
expr: sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m]))
|
||||
BY (instance)
|
||||
- record: instance:node_filesystem_usage:sum
|
||||
expr: sum((node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}))
|
||||
BY (instance)
|
||||
- record: instance:node_network_receive_bytes:rate:sum
|
||||
expr: sum(rate(node_network_receive_bytes[3m])) BY (instance)
|
||||
- record: instance:node_network_transmit_bytes:rate:sum
|
||||
expr: sum(rate(node_network_transmit_bytes[3m])) BY (instance)
|
||||
- record: instance:node_cpu:ratio
|
||||
expr: sum(rate(node_cpu{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance)
|
||||
GROUP_LEFT() count(sum(node_cpu) BY (instance, cpu)) BY (instance)
|
||||
- record: cluster:node_cpu:sum_rate5m
|
||||
expr: sum(rate(node_cpu{mode!="idle"}[5m]))
|
||||
- record: cluster:node_cpu:ratio
|
||||
expr: cluster:node_cpu:rate5m / count(sum(node_cpu) BY (instance, cpu))
|
||||
- alert: NodeExporterDown
|
||||
expr: absent(up{kubernetes_name="node-exporter"} == 1)
|
||||
expr: absent(up{job="node-exporter"} == 1)
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus could not scrape a node-exporter for more than 10m,
|
||||
or node-exporters have disappeared from discovery.
|
||||
summary: node-exporter cannot be scraped
|
||||
- alert: K8SNodeOutOfDisk
|
||||
expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
|
||||
or node-exporters have disappeared from discovery
|
||||
- alert: NodeDiskRunningFull
|
||||
expr: predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
|
||||
for: 30m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: device {{$labels.device}} on node {{$labels.instance}} is running
|
||||
full within the next 24 hours (mounted at {{$labels.mountpoint}})
|
||||
- alert: NodeDiskRunningFull
|
||||
expr: predict_linear(node_filesystem_free[30m], 3600 * 2) < 0
|
||||
for: 10m
|
||||
labels:
|
||||
service: k8s
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $labels.node }} has run out of disk space.'
|
||||
summary: Node ran out of disk space.
|
||||
- alert: K8SNodeMemoryPressure
|
||||
expr: kube_node_status_condition{condition="MemoryPressure",status="true"} ==
|
||||
1
|
||||
labels:
|
||||
service: k8s
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $labels.node }} is under memory pressure.'
|
||||
summary: Node is under memory pressure.
|
||||
- alert: K8SNodeDiskPressure
|
||||
expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
|
||||
labels:
|
||||
service: k8s
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $labels.node }} is under disk pressure.'
|
||||
summary: Node is under disk pressure.
|
||||
prometheus.rules.yaml: |+
|
||||
description: device {{$labels.device}} on node {{$labels.instance}} is running
|
||||
full within the next 2 hours (mounted at {{$labels.mountpoint}})
|
||||
prometheus.rules.yaml: |
|
||||
groups:
|
||||
- name: ./prometheus.rules
|
||||
- name: prometheus.rules
|
||||
rules:
|
||||
- alert: FailedReload
|
||||
- alert: PrometheusConfigReloadFailed
|
||||
expr: prometheus_config_last_reload_successful == 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Reloading Prometheus' configuration has failed for {{ $labels.namespace
|
||||
}}/{{ $labels.pod}}.
|
||||
summary: Prometheus configuration reload has failed
|
||||
description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}
|
||||
- alert: PrometheusNotificationQueueRunningFull
|
||||
expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{
|
||||
$labels.pod}}
|
||||
- alert: PrometheusErrorSendingAlerts
|
||||
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
|
||||
> 0.01
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
|
||||
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
|
||||
- alert: PrometheusErrorSendingAlerts
|
||||
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
|
||||
> 0.03
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
|
||||
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
|
||||
- alert: PrometheusNotConnectedToAlertmanagers
|
||||
expr: prometheus_notifications_alertmanagers_discovered < 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
|
||||
to any Alertmanagers
|
||||
- alert: PrometheusTSDBReloadsFailing
|
||||
expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0
|
||||
for: 12h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
|
||||
reload failures over the last four hours.'
|
||||
summary: Prometheus has issues reloading data blocks from disk
|
||||
- alert: PrometheusTSDBCompactionsFailing
|
||||
expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0
|
||||
for: 12h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
|
||||
compaction failures over the last four hours.'
|
||||
summary: Prometheus has issues compacting sample blocks
|
||||
- alert: PrometheusTSDBWALCorruptions
|
||||
expr: tsdb_wal_corruptions_total > 0
|
||||
for: 4h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead
|
||||
log (WAL).'
|
||||
summary: Prometheus write-ahead log is corrupted
|
||||
- alert: PrometheusNotIngestingSamples
|
||||
expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) <= 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples."
|
||||
summary: "Prometheus isn't ingesting samples"
|
||||
|
5
addons/prometheus/service-account.yaml
Normal file
5
addons/prometheus/service-account.yaml
Normal file
@ -0,0 +1,5 @@
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: prometheus
|
||||
namespace: monitoring
|
@ -3,6 +3,8 @@ kind: Service
|
||||
metadata:
|
||||
name: prometheus
|
||||
namespace: monitoring
|
||||
annotations:
|
||||
prometheus.io/scrape: 'true'
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,12 +9,13 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/)
|
||||
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
||||
## Docs
|
||||
|
||||
|
69
aws/container-linux/kubernetes/apiserver.tf
Normal file
69
aws/container-linux/kubernetes/apiserver.tf
Normal file
@ -0,0 +1,69 @@
|
||||
# kube-apiserver Network Load Balancer DNS Record
|
||||
resource "aws_route53_record" "apiserver" {
|
||||
zone_id = "${var.dns_zone_id}"
|
||||
|
||||
name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
|
||||
type = "A"
|
||||
|
||||
# AWS recommends their special "alias" records for ELBs
|
||||
alias {
|
||||
name = "${aws_lb.apiserver.dns_name}"
|
||||
zone_id = "${aws_lb.apiserver.zone_id}"
|
||||
evaluate_target_health = true
|
||||
}
|
||||
}
|
||||
|
||||
# Network Load Balancer for apiservers
|
||||
resource "aws_lb" "apiserver" {
|
||||
name = "${var.cluster_name}-apiserver"
|
||||
load_balancer_type = "network"
|
||||
internal = false
|
||||
|
||||
subnets = ["${aws_subnet.public.*.id}"]
|
||||
|
||||
enable_cross_zone_load_balancing = true
|
||||
}
|
||||
|
||||
# Forward HTTP traffic to controllers
|
||||
resource "aws_lb_listener" "apiserver-https" {
|
||||
load_balancer_arn = "${aws_lb.apiserver.arn}"
|
||||
protocol = "TCP"
|
||||
port = "443"
|
||||
|
||||
default_action {
|
||||
type = "forward"
|
||||
target_group_arn = "${aws_lb_target_group.controllers.arn}"
|
||||
}
|
||||
}
|
||||
|
||||
# Target group of controllers
|
||||
resource "aws_lb_target_group" "controllers" {
|
||||
name = "${var.cluster_name}-controllers"
|
||||
vpc_id = "${aws_vpc.network.id}"
|
||||
target_type = "instance"
|
||||
|
||||
protocol = "TCP"
|
||||
port = 443
|
||||
|
||||
# Kubelet HTTP health check
|
||||
health_check {
|
||||
protocol = "TCP"
|
||||
port = 443
|
||||
|
||||
# NLBs required to use same healthy and unhealthy thresholds
|
||||
healthy_threshold = 3
|
||||
unhealthy_threshold = 3
|
||||
|
||||
# Interval between health checks required to be 10 or 30
|
||||
interval = 10
|
||||
}
|
||||
}
|
||||
|
||||
# Attach controller instances to apiserver NLB
|
||||
resource "aws_lb_target_group_attachment" "controllers" {
|
||||
count = "${var.controller_count}"
|
||||
|
||||
target_group_arn = "${aws_lb_target_group.controllers.arn}"
|
||||
target_id = "${element(aws_instance.controllers.*.id, count.index)}"
|
||||
port = 443
|
||||
}
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=88b361207d42ec3121930a4add6b64ba7cf18360"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.3.2"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
||||
@ -41,11 +41,12 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -65,6 +66,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -72,15 +74,17 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--kubeconfig=/etc/kubernetes/kubeconfig \
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/master \
|
||||
--node-labels=node-role.kubernetes.io/controller="true" \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
@ -106,29 +110,14 @@ storage:
|
||||
mode: 0644
|
||||
contents:
|
||||
inline: |
|
||||
apiVersion: v1
|
||||
kind: Config
|
||||
clusters:
|
||||
- name: local
|
||||
cluster:
|
||||
server: ${kubeconfig_server}
|
||||
certificate-authority-data: ${kubeconfig_ca_cert}
|
||||
users:
|
||||
- name: kubelet
|
||||
user:
|
||||
client-certificate-data: ${kubeconfig_kubelet_cert}
|
||||
client-key-data: ${kubeconfig_kubelet_key}
|
||||
contexts:
|
||||
- context:
|
||||
cluster: local
|
||||
user: kubelet
|
||||
${kubeconfig}
|
||||
- path: /etc/kubernetes/kubelet.env
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -147,11 +136,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.11.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -36,6 +36,10 @@ resource "aws_instance" "controllers" {
|
||||
associate_public_ip_address = true
|
||||
subnet_id = "${element(aws_subnet.public.*.id, count.index)}"
|
||||
vpc_security_group_ids = ["${aws_security_group.controller.id}"]
|
||||
|
||||
lifecycle {
|
||||
ignore_changes = ["ami"]
|
||||
}
|
||||
}
|
||||
|
||||
# Controller Container Linux Config
|
||||
@ -52,12 +56,10 @@ data "template_file" "controller_config" {
|
||||
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
|
||||
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
|
||||
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
|
||||
kubeconfig_server = "${module.bootkube.server}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
kubeconfig = "${indent(10, module.bootkube.kubeconfig)}"
|
||||
}
|
||||
}
|
||||
|
||||
@ -76,186 +78,5 @@ data "ct_config" "controller_ign" {
|
||||
count = "${var.controller_count}"
|
||||
content = "${element(data.template_file.controller_config.*.rendered, count.index)}"
|
||||
pretty_print = false
|
||||
}
|
||||
|
||||
# Security Group (instance firewall)
|
||||
|
||||
resource "aws_security_group" "controller" {
|
||||
name = "${var.cluster_name}-controller"
|
||||
description = "${var.cluster_name} controller security group"
|
||||
|
||||
vpc_id = "${aws_vpc.network.id}"
|
||||
|
||||
tags = "${map("Name", "${var.cluster_name}-controller")}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-icmp" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "icmp"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ssh" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 22
|
||||
to_port = 22
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-apiserver" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-etcd" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 2379
|
||||
to_port = 2380
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-flannel" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-flannel-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-node-exporter" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 9100
|
||||
to_port = 9100
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-kubelet-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10250
|
||||
to_port = 10250
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-kubelet-read" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-kubelet-read-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-bgp" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-bgp-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip-legacy" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip-legacy-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-egress" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "egress"
|
||||
protocol = "-1"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
ipv6_cidr_blocks = ["::/0"]
|
||||
snippets = ["${var.controller_clc_snippets}"]
|
||||
}
|
||||
|
@ -1,43 +0,0 @@
|
||||
# kube-apiserver Network Load Balancer DNS Record
|
||||
resource "aws_route53_record" "apiserver" {
|
||||
zone_id = "${var.dns_zone_id}"
|
||||
|
||||
name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
|
||||
type = "A"
|
||||
|
||||
# AWS recommends their special "alias" records for ELBs
|
||||
alias {
|
||||
name = "${aws_elb.apiserver.dns_name}"
|
||||
zone_id = "${aws_elb.apiserver.zone_id}"
|
||||
evaluate_target_health = true
|
||||
}
|
||||
}
|
||||
|
||||
# Controller Network Load Balancer
|
||||
resource "aws_elb" "apiserver" {
|
||||
name = "${var.cluster_name}-apiserver"
|
||||
subnets = ["${aws_subnet.public.*.id}"]
|
||||
security_groups = ["${aws_security_group.controller.id}"]
|
||||
|
||||
listener {
|
||||
lb_port = 443
|
||||
lb_protocol = "tcp"
|
||||
instance_port = 443
|
||||
instance_protocol = "tcp"
|
||||
}
|
||||
|
||||
instances = ["${aws_instance.controllers.*.id}"]
|
||||
|
||||
# Kubelet HTTP health check
|
||||
health_check {
|
||||
target = "SSL:443"
|
||||
healthy_threshold = 2
|
||||
unhealthy_threshold = 4
|
||||
timeout = 5
|
||||
interval = 6
|
||||
}
|
||||
|
||||
idle_timeout = 3600
|
||||
connection_draining = true
|
||||
connection_draining_timeout = 300
|
||||
}
|
@ -1,32 +0,0 @@
|
||||
# Ingress Network Load Balancer
|
||||
resource "aws_elb" "ingress" {
|
||||
name = "${var.cluster_name}-ingress"
|
||||
subnets = ["${aws_subnet.public.*.id}"]
|
||||
security_groups = ["${aws_security_group.worker.id}"]
|
||||
|
||||
listener {
|
||||
lb_port = 80
|
||||
lb_protocol = "tcp"
|
||||
instance_port = 80
|
||||
instance_protocol = "tcp"
|
||||
}
|
||||
|
||||
listener {
|
||||
lb_port = 443
|
||||
lb_protocol = "tcp"
|
||||
instance_port = 443
|
||||
instance_protocol = "tcp"
|
||||
}
|
||||
|
||||
# Ingress Controller HTTP health check
|
||||
health_check {
|
||||
target = "HTTP:10254/healthz"
|
||||
healthy_threshold = 2
|
||||
unhealthy_threshold = 4
|
||||
timeout = 5
|
||||
interval = 6
|
||||
}
|
||||
|
||||
connection_draining = true
|
||||
connection_draining_timeout = 300
|
||||
}
|
@ -1,4 +1,25 @@
|
||||
output "ingress_dns_name" {
|
||||
value = "${aws_elb.ingress.dns_name}"
|
||||
description = "DNS name of the ELB for distributing traffic to Ingress controllers"
|
||||
value = "${module.workers.ingress_dns_name}"
|
||||
description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
|
||||
}
|
||||
|
||||
# Outputs for worker pools
|
||||
|
||||
output "vpc_id" {
|
||||
value = "${aws_vpc.network.id}"
|
||||
description = "ID of the VPC for creating worker instances"
|
||||
}
|
||||
|
||||
output "subnet_ids" {
|
||||
value = ["${aws_subnet.public.*.id}"]
|
||||
description = "List of subnet IDs for creating worker instances"
|
||||
}
|
||||
|
||||
output "worker_security_groups" {
|
||||
value = ["${aws_security_group.worker.id}"]
|
||||
description = "List of worker security group IDs"
|
||||
}
|
||||
|
||||
output "kubeconfig" {
|
||||
value = "${module.bootkube.kubeconfig}"
|
||||
}
|
||||
|
@ -5,7 +5,7 @@ terraform {
|
||||
}
|
||||
|
||||
provider "aws" {
|
||||
version = "~> 1.0"
|
||||
version = "~> 1.11"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
|
385
aws/container-linux/kubernetes/security.tf
Normal file
385
aws/container-linux/kubernetes/security.tf
Normal file
@ -0,0 +1,385 @@
|
||||
# Security Groups (instance firewalls)
|
||||
|
||||
# Controller security group
|
||||
|
||||
resource "aws_security_group" "controller" {
|
||||
name = "${var.cluster_name}-controller"
|
||||
description = "${var.cluster_name} controller security group"
|
||||
|
||||
vpc_id = "${aws_vpc.network.id}"
|
||||
|
||||
tags = "${map("Name", "${var.cluster_name}-controller")}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-icmp" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "icmp"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ssh" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 22
|
||||
to_port = 22
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-apiserver" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-etcd" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 2379
|
||||
to_port = 2380
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-flannel" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-flannel-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-node-exporter" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 9100
|
||||
to_port = 9100
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-kubelet-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10250
|
||||
to_port = 10250
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-kubelet-read" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-kubelet-read-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-bgp" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-bgp-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip-legacy" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.worker.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-ipip-legacy-self" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "controller-egress" {
|
||||
security_group_id = "${aws_security_group.controller.id}"
|
||||
|
||||
type = "egress"
|
||||
protocol = "-1"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
ipv6_cidr_blocks = ["::/0"]
|
||||
}
|
||||
|
||||
# Worker security group
|
||||
|
||||
resource "aws_security_group" "worker" {
|
||||
name = "${var.cluster_name}-worker"
|
||||
description = "${var.cluster_name} worker security group"
|
||||
|
||||
vpc_id = "${aws_vpc.network.id}"
|
||||
|
||||
tags = "${map("Name", "${var.cluster_name}-worker")}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-icmp" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "icmp"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ssh" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 22
|
||||
to_port = 22
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-http" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 80
|
||||
to_port = 80
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-https" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-flannel" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-flannel-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-node-exporter" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 9100
|
||||
to_port = 9100
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "ingress-health" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10254
|
||||
to_port = 10254
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10250
|
||||
to_port = 10250
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10250
|
||||
to_port = 10250
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet-read" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet-read-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-bgp" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-bgp-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip-legacy" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip-legacy-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-egress" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "egress"
|
||||
protocol = "-1"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
ipv6_cidr_blocks = ["::/0"]
|
||||
}
|
@ -60,6 +60,18 @@ variable "worker_type" {
|
||||
description = "Worker EC2 instance type"
|
||||
}
|
||||
|
||||
variable "controller_clc_snippets" {
|
||||
type = "list"
|
||||
description = "Controller Container Linux Config snippets"
|
||||
default = []
|
||||
}
|
||||
|
||||
variable "worker_clc_snippets" {
|
||||
type = "list"
|
||||
description = "Worker Container Linux Config snippets"
|
||||
default = []
|
||||
}
|
||||
|
||||
# bootkube assets
|
||||
|
||||
variable "asset_dir" {
|
||||
@ -94,3 +106,9 @@ EOD
|
||||
type = "string"
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -1,274 +1,20 @@
|
||||
# Workers AutoScaling Group
|
||||
resource "aws_autoscaling_group" "workers" {
|
||||
name = "${var.cluster_name}-worker ${aws_launch_configuration.worker.name}"
|
||||
load_balancers = ["${aws_elb.ingress.id}"]
|
||||
module "workers" {
|
||||
source = "workers"
|
||||
name = "${var.cluster_name}"
|
||||
|
||||
# count
|
||||
desired_capacity = "${var.worker_count}"
|
||||
min_size = "${var.worker_count}"
|
||||
max_size = "${var.worker_count + 2}"
|
||||
default_cooldown = 30
|
||||
health_check_grace_period = 30
|
||||
|
||||
# network
|
||||
vpc_zone_identifier = ["${aws_subnet.public.*.id}"]
|
||||
|
||||
# template
|
||||
launch_configuration = "${aws_launch_configuration.worker.name}"
|
||||
|
||||
lifecycle {
|
||||
# override the default destroy and replace update behavior
|
||||
create_before_destroy = true
|
||||
ignore_changes = ["image_id"]
|
||||
}
|
||||
|
||||
tags = [{
|
||||
key = "Name"
|
||||
value = "${var.cluster_name}-worker"
|
||||
propagate_at_launch = true
|
||||
}]
|
||||
}
|
||||
|
||||
# Worker template
|
||||
resource "aws_launch_configuration" "worker" {
|
||||
image_id = "${data.aws_ami.coreos.image_id}"
|
||||
instance_type = "${var.worker_type}"
|
||||
|
||||
user_data = "${data.ct_config.worker_ign.rendered}"
|
||||
|
||||
# storage
|
||||
root_block_device {
|
||||
volume_type = "standard"
|
||||
volume_size = "${var.disk_size}"
|
||||
}
|
||||
|
||||
# network
|
||||
# AWS
|
||||
vpc_id = "${aws_vpc.network.id}"
|
||||
subnet_ids = ["${aws_subnet.public.*.id}"]
|
||||
security_groups = ["${aws_security_group.worker.id}"]
|
||||
count = "${var.worker_count}"
|
||||
instance_type = "${var.worker_type}"
|
||||
os_channel = "${var.os_channel}"
|
||||
disk_size = "${var.disk_size}"
|
||||
|
||||
lifecycle {
|
||||
// Override the default destroy and replace update behavior
|
||||
create_before_destroy = true
|
||||
}
|
||||
}
|
||||
|
||||
# Worker Container Linux Config
|
||||
data "template_file" "worker_config" {
|
||||
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
|
||||
|
||||
vars = {
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
|
||||
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
|
||||
kubeconfig_server = "${module.bootkube.server}"
|
||||
}
|
||||
}
|
||||
|
||||
data "ct_config" "worker_ign" {
|
||||
content = "${data.template_file.worker_config.rendered}"
|
||||
pretty_print = false
|
||||
}
|
||||
|
||||
# Security Group (instance firewall)
|
||||
|
||||
resource "aws_security_group" "worker" {
|
||||
name = "${var.cluster_name}-worker"
|
||||
description = "${var.cluster_name} worker security group"
|
||||
|
||||
vpc_id = "${aws_vpc.network.id}"
|
||||
|
||||
tags = "${map("Name", "${var.cluster_name}-worker")}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-icmp" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "icmp"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ssh" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 22
|
||||
to_port = 22
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-http" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 80
|
||||
to_port = 80
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-https" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-flannel" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-flannel-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "udp"
|
||||
from_port = 8472
|
||||
to_port = 8472
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-node-exporter" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 9100
|
||||
to_port = 9100
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10250
|
||||
to_port = 10250
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10250
|
||||
to_port = 10250
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet-read" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet-read-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10255
|
||||
to_port = 10255
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "ingress-health-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 10254
|
||||
to_port = 10254
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-bgp" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-bgp-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 179
|
||||
to_port = 179
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 4
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip-legacy" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
source_security_group_id = "${aws_security_group.controller.id}"
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-ipip-legacy-self" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = 94
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-egress" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "egress"
|
||||
protocol = "-1"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
ipv6_cidr_blocks = ["::/0"]
|
||||
# configuration
|
||||
kubeconfig = "${module.bootkube.kubeconfig}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
clc_snippets = "${var.worker_clc_snippets}"
|
||||
}
|
||||
|
19
aws/container-linux/kubernetes/workers/ami.tf
Normal file
19
aws/container-linux/kubernetes/workers/ami.tf
Normal file
@ -0,0 +1,19 @@
|
||||
data "aws_ami" "coreos" {
|
||||
most_recent = true
|
||||
owners = ["595879546273"]
|
||||
|
||||
filter {
|
||||
name = "architecture"
|
||||
values = ["x86_64"]
|
||||
}
|
||||
|
||||
filter {
|
||||
name = "virtualization-type"
|
||||
values = ["hvm"]
|
||||
}
|
||||
|
||||
filter {
|
||||
name = "name"
|
||||
values = ["CoreOS-${var.os_channel}-*"]
|
||||
}
|
||||
}
|
@ -22,7 +22,7 @@ systemd:
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -42,6 +42,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -49,14 +50,15 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--kubeconfig=/etc/kubernetes/kubeconfig \
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/node \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
@ -81,29 +83,14 @@ storage:
|
||||
mode: 0644
|
||||
contents:
|
||||
inline: |
|
||||
apiVersion: v1
|
||||
kind: Config
|
||||
clusters:
|
||||
- name: local
|
||||
cluster:
|
||||
server: ${kubeconfig_server}
|
||||
certificate-authority-data: ${kubeconfig_ca_cert}
|
||||
users:
|
||||
- name: kubelet
|
||||
user:
|
||||
client-certificate-data: ${kubeconfig_kubelet_cert}
|
||||
client-key-data: ${kubeconfig_kubelet_key}
|
||||
contexts:
|
||||
- context:
|
||||
cluster: local
|
||||
user: kubelet
|
||||
${kubeconfig}
|
||||
- path: /etc/kubernetes/kubelet.env
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -121,7 +108,7 @@ storage:
|
||||
--volume config,kind=host,source=/etc/kubernetes \
|
||||
--mount volume=config,target=/etc/kubernetes \
|
||||
--insecure-options=image \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.8.3 \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.9.6 \
|
||||
--net=host \
|
||||
--dns=host \
|
||||
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
|
82
aws/container-linux/kubernetes/workers/ingress.tf
Normal file
82
aws/container-linux/kubernetes/workers/ingress.tf
Normal file
@ -0,0 +1,82 @@
|
||||
# Network Load Balancer for Ingress
|
||||
resource "aws_lb" "ingress" {
|
||||
name = "${var.name}-ingress"
|
||||
load_balancer_type = "network"
|
||||
internal = false
|
||||
|
||||
subnets = ["${var.subnet_ids}"]
|
||||
|
||||
enable_cross_zone_load_balancing = true
|
||||
}
|
||||
|
||||
# Forward HTTP traffic to workers
|
||||
resource "aws_lb_listener" "ingress-http" {
|
||||
load_balancer_arn = "${aws_lb.ingress.arn}"
|
||||
protocol = "TCP"
|
||||
port = 80
|
||||
|
||||
default_action {
|
||||
type = "forward"
|
||||
target_group_arn = "${aws_lb_target_group.workers-http.arn}"
|
||||
}
|
||||
}
|
||||
|
||||
# Forward HTTPS traffic to workers
|
||||
resource "aws_lb_listener" "ingress-https" {
|
||||
load_balancer_arn = "${aws_lb.ingress.arn}"
|
||||
protocol = "TCP"
|
||||
port = 443
|
||||
|
||||
default_action {
|
||||
type = "forward"
|
||||
target_group_arn = "${aws_lb_target_group.workers-https.arn}"
|
||||
}
|
||||
}
|
||||
|
||||
# Network Load Balancer target groups of instances
|
||||
|
||||
resource "aws_lb_target_group" "workers-http" {
|
||||
name = "${var.name}-workers-http"
|
||||
vpc_id = "${var.vpc_id}"
|
||||
target_type = "instance"
|
||||
|
||||
protocol = "TCP"
|
||||
port = 80
|
||||
|
||||
# Ingress Controller HTTP health check
|
||||
health_check {
|
||||
protocol = "HTTP"
|
||||
port = 10254
|
||||
path = "/healthz"
|
||||
|
||||
# NLBs required to use same healthy and unhealthy thresholds
|
||||
healthy_threshold = 3
|
||||
unhealthy_threshold = 3
|
||||
|
||||
# Interval between health checks required to be 10 or 30
|
||||
interval = 10
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_lb_target_group" "workers-https" {
|
||||
name = "${var.name}-workers-https"
|
||||
vpc_id = "${var.vpc_id}"
|
||||
target_type = "instance"
|
||||
|
||||
protocol = "TCP"
|
||||
port = 443
|
||||
|
||||
# Ingress Controller HTTP health check
|
||||
health_check {
|
||||
protocol = "HTTP"
|
||||
port = 10254
|
||||
path = "/healthz"
|
||||
|
||||
# NLBs required to use same healthy and unhealthy thresholds
|
||||
healthy_threshold = 3
|
||||
unhealthy_threshold = 3
|
||||
|
||||
# Interval between health checks required to be 10 or 30
|
||||
interval = 10
|
||||
}
|
||||
}
|
4
aws/container-linux/kubernetes/workers/outputs.tf
Normal file
4
aws/container-linux/kubernetes/workers/outputs.tf
Normal file
@ -0,0 +1,4 @@
|
||||
output "ingress_dns_name" {
|
||||
value = "${aws_lb.ingress.dns_name}"
|
||||
description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
|
||||
}
|
79
aws/container-linux/kubernetes/workers/variables.tf
Normal file
79
aws/container-linux/kubernetes/workers/variables.tf
Normal file
@ -0,0 +1,79 @@
|
||||
variable "name" {
|
||||
type = "string"
|
||||
description = "Unique name instance group"
|
||||
}
|
||||
|
||||
variable "vpc_id" {
|
||||
type = "string"
|
||||
description = "ID of the VPC for creating instances"
|
||||
}
|
||||
|
||||
variable "subnet_ids" {
|
||||
type = "list"
|
||||
description = "List of subnet IDs for creating instances"
|
||||
}
|
||||
|
||||
variable "security_groups" {
|
||||
type = "list"
|
||||
description = "List of security group IDs"
|
||||
}
|
||||
|
||||
# instances
|
||||
|
||||
variable "count" {
|
||||
type = "string"
|
||||
default = "1"
|
||||
description = "Number of instances"
|
||||
}
|
||||
|
||||
variable "instance_type" {
|
||||
type = "string"
|
||||
default = "t2.small"
|
||||
description = "EC2 instance type"
|
||||
}
|
||||
|
||||
variable "os_channel" {
|
||||
type = "string"
|
||||
default = "stable"
|
||||
description = "Container Linux AMI channel (stable, beta, alpha)"
|
||||
}
|
||||
|
||||
variable "disk_size" {
|
||||
type = "string"
|
||||
default = "40"
|
||||
description = "Size of the disk in GB"
|
||||
}
|
||||
|
||||
# configuration
|
||||
|
||||
variable "kubeconfig" {
|
||||
type = "string"
|
||||
description = "Generated Kubelet kubeconfig"
|
||||
}
|
||||
|
||||
variable "ssh_authorized_key" {
|
||||
type = "string"
|
||||
description = "SSH public key for user 'core'"
|
||||
}
|
||||
|
||||
variable "service_cidr" {
|
||||
description = <<EOD
|
||||
CIDR IPv4 range to assign Kubernetes services.
|
||||
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for kube-dns.
|
||||
EOD
|
||||
|
||||
type = "string"
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
||||
variable "clc_snippets" {
|
||||
type = "list"
|
||||
description = "Container Linux Config snippets"
|
||||
default = []
|
||||
}
|
75
aws/container-linux/kubernetes/workers/workers.tf
Normal file
75
aws/container-linux/kubernetes/workers/workers.tf
Normal file
@ -0,0 +1,75 @@
|
||||
# Workers AutoScaling Group
|
||||
resource "aws_autoscaling_group" "workers" {
|
||||
name = "${var.name}-worker ${aws_launch_configuration.worker.name}"
|
||||
|
||||
# count
|
||||
desired_capacity = "${var.count}"
|
||||
min_size = "${var.count}"
|
||||
max_size = "${var.count + 2}"
|
||||
default_cooldown = 30
|
||||
health_check_grace_period = 30
|
||||
|
||||
# network
|
||||
vpc_zone_identifier = ["${var.subnet_ids}"]
|
||||
|
||||
# template
|
||||
launch_configuration = "${aws_launch_configuration.worker.name}"
|
||||
|
||||
# target groups to which instances should be added
|
||||
target_group_arns = [
|
||||
"${aws_lb_target_group.workers-http.id}",
|
||||
"${aws_lb_target_group.workers-https.id}",
|
||||
]
|
||||
|
||||
lifecycle {
|
||||
# override the default destroy and replace update behavior
|
||||
create_before_destroy = true
|
||||
}
|
||||
|
||||
tags = [{
|
||||
key = "Name"
|
||||
value = "${var.name}-worker"
|
||||
propagate_at_launch = true
|
||||
}]
|
||||
}
|
||||
|
||||
# Worker template
|
||||
resource "aws_launch_configuration" "worker" {
|
||||
image_id = "${data.aws_ami.coreos.image_id}"
|
||||
instance_type = "${var.instance_type}"
|
||||
|
||||
user_data = "${data.ct_config.worker_ign.rendered}"
|
||||
|
||||
# storage
|
||||
root_block_device {
|
||||
volume_type = "standard"
|
||||
volume_size = "${var.disk_size}"
|
||||
}
|
||||
|
||||
# network
|
||||
security_groups = ["${var.security_groups}"]
|
||||
|
||||
lifecycle {
|
||||
// Override the default destroy and replace update behavior
|
||||
create_before_destroy = true
|
||||
ignore_changes = ["image_id"]
|
||||
}
|
||||
}
|
||||
|
||||
# Worker Container Linux Config
|
||||
data "template_file" "worker_config" {
|
||||
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
|
||||
|
||||
vars = {
|
||||
kubeconfig = "${indent(10, var.kubeconfig)}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
}
|
||||
|
||||
data "ct_config" "worker_ign" {
|
||||
content = "${data.template_file.worker_config.rendered}"
|
||||
pretty_print = false
|
||||
snippets = ["${var.clc_snippets}"]
|
||||
}
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,12 +9,12 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
||||
## Docs
|
||||
|
||||
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=88b361207d42ec3121930a4add6b64ba7cf18360"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${var.k8s_domain_name}"]
|
||||
etcd_servers = ["${var.controller_domains}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${var.k8s_domain_name}"]
|
||||
etcd_servers = ["${var.controller_domains}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.3.2"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
|
||||
@ -50,10 +50,11 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -73,6 +74,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -80,7 +82,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=${domain_name} \
|
||||
@ -88,8 +90,10 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/master \
|
||||
--node-labels=node-role.kubernetes.io/controller="true" \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
@ -114,7 +118,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/hostname
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
@ -139,11 +143,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.11.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -30,7 +30,7 @@ systemd:
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -50,6 +50,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -57,7 +58,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=${domain_name} \
|
||||
@ -65,7 +66,8 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/node \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
@ -80,7 +82,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/hostname
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
|
@ -3,7 +3,7 @@ resource "matchbox_group" "container-linux-install" {
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
|
||||
name = "${format("container-linux-install-%s", element(concat(var.controller_names, var.worker_names), count.index))}"
|
||||
profile = "${var.cached_install == "true" ? matchbox_profile.cached-container-linux-install.name : matchbox_profile.container-linux-install.name}"
|
||||
profile = "${var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)}"
|
||||
|
||||
selector {
|
||||
mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
|
||||
|
@ -1,6 +1,8 @@
|
||||
// Container Linux Install profile (from release.core-os.net)
|
||||
resource "matchbox_profile" "container-linux-install" {
|
||||
name = "container-linux-install"
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
name = "${format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
|
||||
|
||||
kernel = "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
|
||||
|
||||
initrd = [
|
||||
@ -8,6 +10,7 @@ resource "matchbox_profile" "container-linux-install" {
|
||||
]
|
||||
|
||||
args = [
|
||||
"initrd=coreos_production_pxe_image.cpio.gz",
|
||||
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
|
||||
"coreos.first_boot=yes",
|
||||
"console=tty0",
|
||||
@ -15,10 +18,12 @@ resource "matchbox_profile" "container-linux-install" {
|
||||
"${var.kernel_args}",
|
||||
]
|
||||
|
||||
container_linux_config = "${data.template_file.container-linux-install-config.rendered}"
|
||||
container_linux_config = "${element(data.template_file.container-linux-install-configs.*.rendered, count.index)}"
|
||||
}
|
||||
|
||||
data "template_file" "container-linux-install-config" {
|
||||
data "template_file" "container-linux-install-configs" {
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
|
||||
template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
@ -36,7 +41,9 @@ data "template_file" "container-linux-install-config" {
|
||||
// Container Linux Install profile (from matchbox /assets cache)
|
||||
// Note: Admin must have downloaded container_linux_version into matchbox assets.
|
||||
resource "matchbox_profile" "cached-container-linux-install" {
|
||||
name = "cached-container-linux-install"
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
name = "${format("%s-cached-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
|
||||
|
||||
kernel = "/assets/coreos/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
|
||||
|
||||
initrd = [
|
||||
@ -44,6 +51,7 @@ resource "matchbox_profile" "cached-container-linux-install" {
|
||||
]
|
||||
|
||||
args = [
|
||||
"initrd=coreos_production_pxe_image.cpio.gz",
|
||||
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
|
||||
"coreos.first_boot=yes",
|
||||
"console=tty0",
|
||||
@ -51,10 +59,12 @@ resource "matchbox_profile" "cached-container-linux-install" {
|
||||
"${var.kernel_args}",
|
||||
]
|
||||
|
||||
container_linux_config = "${data.template_file.cached-container-linux-install-config.rendered}"
|
||||
container_linux_config = "${element(data.template_file.cached-container-linux-install-configs.*.rendered, count.index)}"
|
||||
}
|
||||
|
||||
data "template_file" "cached-container-linux-install-config" {
|
||||
data "template_file" "cached-container-linux-install-configs" {
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
|
||||
template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
@ -82,11 +92,12 @@ data "template_file" "controller-configs" {
|
||||
template = "${file("${path.module}/cl/controller.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
domain_name = "${element(var.controller_domains, count.index)}"
|
||||
etcd_name = "${element(var.controller_names, count.index)}"
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
domain_name = "${element(var.controller_domains, count.index)}"
|
||||
etcd_name = "${element(var.controller_names, count.index)}"
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
|
||||
# Terraform evaluates both sides regardless and element cannot be used on 0 length lists
|
||||
networkd_content = "${length(var.controller_networkds) == 0 ? "" : element(concat(var.controller_networkds, list("")), count.index)}"
|
||||
@ -106,9 +117,10 @@ data "template_file" "worker-configs" {
|
||||
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
domain_name = "${element(var.worker_domains, count.index)}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
domain_name = "${element(var.worker_domains, count.index)}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
|
||||
# Terraform evaluates both sides regardless and element cannot be used on 0 length lists
|
||||
networkd_content = "${length(var.worker_networkds) == 0 ? "" : element(concat(var.worker_networkds, list("")), count.index)}"
|
||||
|
@ -24,7 +24,7 @@ variable "ssh_authorized_key" {
|
||||
}
|
||||
|
||||
# Machines
|
||||
# Terraform's crude "type system" does properly support lists of maps so we do this.
|
||||
# Terraform's crude "type system" does not properly support lists of maps so we do this.
|
||||
|
||||
variable "controller_names" {
|
||||
type = "list"
|
||||
@ -92,6 +92,12 @@ EOD
|
||||
|
||||
# optional
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
||||
variable "cached_install" {
|
||||
type = "string"
|
||||
default = "false"
|
||||
|
@ -30,7 +30,7 @@ systemd:
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -50,6 +50,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -57,7 +58,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns={{.k8s_dns_service_ip}} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain={{.cluster_domain_suffix}} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override={{.domain_name}} \
|
||||
@ -65,7 +66,8 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/node \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
@ -96,7 +98,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/hostname
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
|
@ -12,10 +12,8 @@ resource "matchbox_group" "workers" {
|
||||
domain_name = "${element(var.worker_domains, count.index)}"
|
||||
etcd_endpoints = "${join(",", formatlist("%s:2379", var.controller_domains))}"
|
||||
|
||||
# TODO
|
||||
etcd_on_host = "true"
|
||||
k8s_etcd_service_ip = "10.3.0.15"
|
||||
k8s_dns_service_ip = "${var.kube_dns_service_ip}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
k8s_dns_service_ip = "${var.kube_dns_service_ip}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
}
|
||||
}
|
||||
|
@ -8,6 +8,7 @@ resource "matchbox_profile" "bootkube-worker-pxe" {
|
||||
]
|
||||
|
||||
args = [
|
||||
"initrd=coreos_production_pxe_image.cpio.gz",
|
||||
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
|
||||
"coreos.first_boot=yes",
|
||||
"console=tty0",
|
||||
|
@ -64,3 +64,9 @@ variable "kernel_args" {
|
||||
"root=/dev/sda1",
|
||||
]
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,12 +9,12 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
||||
## Docs
|
||||
|
||||
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=88b361207d42ec3121930a4add6b64ba7cf18360"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = 1440
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = 1440
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.3.2"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
||||
@ -50,10 +50,11 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Requires=coreos-metadata.service
|
||||
After=coreos-metadata.service
|
||||
Wants=rpc-statd.service
|
||||
@ -76,6 +77,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -83,7 +85,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
|
||||
@ -91,8 +93,10 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/master \
|
||||
--node-labels=node-role.kubernetes.io/controller="true" \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
@ -119,7 +123,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -138,11 +142,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.11.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -30,7 +30,7 @@ systemd:
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Requires=coreos-metadata.service
|
||||
After=coreos-metadata.service
|
||||
Wants=rpc-statd.service
|
||||
@ -53,6 +53,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -60,7 +61,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
|
||||
@ -68,7 +69,8 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/node \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
@ -94,7 +96,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.6
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -112,7 +114,7 @@ storage:
|
||||
--volume config,kind=host,source=/etc/kubernetes \
|
||||
--mount volume=config,target=/etc/kubernetes \
|
||||
--insecure-options=image \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.8.3 \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.9.6 \
|
||||
--net=host \
|
||||
--dns=host \
|
||||
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
|
||||
|
@ -45,7 +45,7 @@ resource "digitalocean_droplet" "controllers" {
|
||||
private_networking = true
|
||||
|
||||
user_data = "${element(data.ct_config.controller_ign.*.rendered, count.index)}"
|
||||
ssh_keys = "${var.ssh_fingerprints}"
|
||||
ssh_keys = ["${var.ssh_fingerprints}"]
|
||||
|
||||
tags = [
|
||||
"${digitalocean_tag.controllers.id}",
|
||||
@ -69,8 +69,9 @@ data "template_file" "controller_config" {
|
||||
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
|
||||
|
||||
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
}
|
||||
|
||||
@ -89,4 +90,6 @@ data "ct_config" "controller_ign" {
|
||||
count = "${var.controller_count}"
|
||||
content = "${element(data.template_file.controller_config.*.rendered, count.index)}"
|
||||
pretty_print = false
|
||||
|
||||
snippets = ["${var.controller_clc_snippets}"]
|
||||
}
|
||||
|
@ -22,12 +22,12 @@ resource "digitalocean_firewall" "rules" {
|
||||
},
|
||||
{
|
||||
protocol = "udp"
|
||||
port_range = "all"
|
||||
port_range = "1-65535"
|
||||
source_tags = ["${digitalocean_tag.controllers.name}", "${digitalocean_tag.workers.name}"]
|
||||
},
|
||||
{
|
||||
protocol = "tcp"
|
||||
port_range = "all"
|
||||
port_range = "1-65535"
|
||||
source_tags = ["${digitalocean_tag.controllers.name}", "${digitalocean_tag.workers.name}"]
|
||||
},
|
||||
]
|
||||
@ -35,17 +35,18 @@ resource "digitalocean_firewall" "rules" {
|
||||
# allow all outbound traffic
|
||||
outbound_rule = [
|
||||
{
|
||||
protocol = "icmp"
|
||||
protocol = "tcp"
|
||||
port_range = "1-65535"
|
||||
destination_addresses = ["0.0.0.0/0", "::/0"]
|
||||
},
|
||||
{
|
||||
protocol = "udp"
|
||||
port_range = "all"
|
||||
port_range = "1-65535"
|
||||
destination_addresses = ["0.0.0.0/0", "::/0"]
|
||||
},
|
||||
{
|
||||
protocol = "tcp"
|
||||
port_range = "all"
|
||||
protocol = "icmp"
|
||||
port_range = "1-65535"
|
||||
destination_addresses = ["0.0.0.0/0", "::/0"]
|
||||
},
|
||||
]
|
||||
|
@ -5,7 +5,7 @@ terraform {
|
||||
}
|
||||
|
||||
provider "digitalocean" {
|
||||
version = "0.1.2"
|
||||
version = "~> 0.1.2"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
|
@ -27,8 +27,8 @@ variable "controller_count" {
|
||||
|
||||
variable "controller_type" {
|
||||
type = "string"
|
||||
default = "2gb"
|
||||
description = "Digital Ocean droplet size (e.g. 2gb (min), 4gb, 8gb)."
|
||||
default = "s-2vcpu-2gb"
|
||||
description = "Digital Ocean droplet size (e.g. s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb)."
|
||||
}
|
||||
|
||||
variable "worker_count" {
|
||||
@ -39,8 +39,8 @@ variable "worker_count" {
|
||||
|
||||
variable "worker_type" {
|
||||
type = "string"
|
||||
default = "512mb"
|
||||
description = "Digital Ocean droplet size (e.g. 512mb, 1gb, 2gb, 4gb)"
|
||||
default = "s-1vcpu-1gb"
|
||||
description = "Digital Ocean droplet size (e.g. s-1vcpu-1gb, s-1vcpu-2gb, s-2vcpu-2gb)"
|
||||
}
|
||||
|
||||
variable "ssh_fingerprints" {
|
||||
@ -48,6 +48,18 @@ variable "ssh_fingerprints" {
|
||||
description = "SSH public key fingerprints. (e.g. see `ssh-add -l -E md5`)"
|
||||
}
|
||||
|
||||
variable "controller_clc_snippets" {
|
||||
type = "list"
|
||||
description = "Controller Container Linux Config snippets"
|
||||
default = []
|
||||
}
|
||||
|
||||
variable "worker_clc_snippets" {
|
||||
type = "list"
|
||||
description = "Worker Container Linux Config snippets"
|
||||
default = []
|
||||
}
|
||||
|
||||
# bootkube assets
|
||||
|
||||
variable "asset_dir" {
|
||||
@ -76,3 +88,9 @@ EOD
|
||||
type = "string"
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -26,7 +26,7 @@ resource "digitalocean_droplet" "workers" {
|
||||
private_networking = true
|
||||
|
||||
user_data = "${data.ct_config.worker_ign.rendered}"
|
||||
ssh_keys = "${var.ssh_fingerprints}"
|
||||
ssh_keys = ["${var.ssh_fingerprints}"]
|
||||
|
||||
tags = [
|
||||
"${digitalocean_tag.workers.id}",
|
||||
@ -43,12 +43,13 @@ data "template_file" "worker_config" {
|
||||
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
|
||||
|
||||
vars = {
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
}
|
||||
|
||||
data "ct_config" "worker_ign" {
|
||||
content = "${data.template_file.worker_config.rendered}"
|
||||
pretty_print = false
|
||||
snippets = ["${var.worker_clc_snippets}"]
|
||||
}
|
||||
|
@ -12,13 +12,13 @@ kubectl apply -f addons/cluo -R
|
||||
|
||||
## Usage
|
||||
|
||||
`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indiates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
|
||||
`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indicates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
|
||||
|
||||
```
|
||||
$ kubectl get nodes --show-labels
|
||||
...
|
||||
container-linux-update.v1.coreos.com/group=stable
|
||||
container-linux-update.v1.coreos.com/version=1465.6.0
|
||||
container-linux-update.v1.coreos.com/version=1632.3.0
|
||||
```
|
||||
|
||||
`update-operator` ensures one node reboots at a time and that pods are drained prior to reboot.
|
||||
|
@ -1,24 +0,0 @@
|
||||
# Kubernetes Dashboard
|
||||
|
||||
The Kubernetes [Dashboard](https://github.com/kubernetes/dashboard) provides a web UI to manage a Kubernetes cluster for those who prefer an alternative to `kubectl`.
|
||||
|
||||
## Create
|
||||
|
||||
Create the dashboard deployment and service.
|
||||
|
||||
```
|
||||
kubectl apply -f addons/dashboard -R
|
||||
```
|
||||
|
||||
## Access
|
||||
|
||||
Use `kubectl` to authenticate to the apiserver and create a local port forward to the remote port on the dashboard pod.
|
||||
|
||||
```sh
|
||||
kubectl get pods -n kube-system
|
||||
kubectl port-forward POD [LOCAL_PORT:]REMOTE_PORT
|
||||
kubectl port-forward kubernetes-dashboard-id 9090 -n kube-system
|
||||
```
|
||||
|
||||
!!! tip
|
||||
If you'd like to expose the Dashboard via Ingress and add authentication, use a suitable OAuth2 proxy sidecar and pick your favorite OAuth2 provider.
|
20
docs/addons/grafana.md
Normal file
20
docs/addons/grafana.md
Normal file
@ -0,0 +1,20 @@
|
||||
## Grafana
|
||||
|
||||
Grafana can be used to build dashboards and visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
|
||||
|
||||
```
|
||||
kubectl apply -f addons/grafana -R
|
||||
```
|
||||
|
||||
Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
|
||||
|
||||
```
|
||||
kubectl port-forward grafana-POD-ID 8080 -n monitoring
|
||||
```
|
||||
|
||||
Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards.
|
||||
|
||||

|
||||

|
||||

|
||||
|
@ -1,6 +1,6 @@
|
||||
# Heapster
|
||||
|
||||
[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashbard graphs.
|
||||
[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashboard graphs.
|
||||
|
||||
## Create
|
||||
|
||||
|
@ -6,6 +6,5 @@ Every Typhoon cluster is verified to work well with several post-install addons.
|
||||
* Nginx [Ingress Controller](ingress.md)
|
||||
* [Heapster](heapster.md)
|
||||
* [Prometheus](prometheus.md)
|
||||
* [Grafana](prometheus.md#grafana)
|
||||
* Kubernetes [Dashboard](dashboard.md)
|
||||
* [Grafana](grafana.md)
|
||||
|
||||
|
@ -20,7 +20,7 @@ On Kubernetes clusters, Prometheus is run as a Deployment, configured with a Con
|
||||
kubectl apply -f addons/prometheus -R
|
||||
```
|
||||
|
||||
The ConfigMap configures Prometheus to target apiserver endpoints, node metrics, cAdvisor metrics, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.
|
||||
The ConfigMap configures Prometheus to discover apiservers, kubelets, cAdvisor, services, endpoints, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.
|
||||
|
||||
### Exporters
|
||||
|
||||
@ -32,7 +32,7 @@ Exporters expose metrics for 3rd-party systems that don't natively expose Promet
|
||||
|
||||
### Queries and Alerts
|
||||
|
||||
Prometheus provides a simplistic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.
|
||||
Prometheus provides a basic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.
|
||||
|
||||
```
|
||||
kubectl get pods -n monitoring
|
||||
@ -47,21 +47,4 @@ Visit [127.0.0.1:9090](http://127.0.0.1:9090) to query [expressions](http://127.
|
||||
<br/>
|
||||

|
||||
|
||||
## Grafana
|
||||
|
||||
Grafana can be used to build dashboards and rich visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
|
||||
|
||||
```
|
||||
kubectl apply -f addons/grafana -R
|
||||
```
|
||||
|
||||
Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
|
||||
|
||||
```
|
||||
kubectl port-forward grafana-POD-ID 8080 -n monitoring
|
||||
```
|
||||
|
||||
Visit [127.0.0.1:8080](http://127.0.0.1:8080), add the prometheus data-source (http://prometheus.monitoring.svc.cluster.local), and import your desired dashboard (e.g. 315).
|
||||
|
||||

|
||||
|
||||
Use [Grafana](/addons/grafana.md) to view or build dashboards that use Prometheus as the datasource.
|
||||
|
@ -1,6 +1,130 @@
|
||||
# Customization
|
||||
|
||||
To customize clusters in ways that aren't supported by input variables, fork the repo and make changes to the Terraform module. Stay tuned for improvements to this strategy since it is beneficial to stay close to this upstream.
|
||||
Typhoon provides minimal Kubernetes clusters with defaults we recommend for production. Terraform variables provide easy to use and supported customizations for clusters. Advanced options are available for customizing the architecture or hosts.
|
||||
|
||||
## Variables
|
||||
|
||||
Typhoon modules accept Terraform input variables for customizing clusters in meritorious ways (e.g. `worker_count`, etc). Variables are carefully considered to provide essentials, while limiting complexity and test matrix burden. See each platform's tutorial for options.
|
||||
|
||||
## Addons
|
||||
|
||||
Clusters are kept to a minimal Kubernetes control plane by offering components like Nginx Ingress Controller, Prometheus, Grafana, and Heapster as optional post-install [addons](https://github.com/poseidon/typhoon/tree/master/addons). Customize addons by modifying a copy of our addon manifests.
|
||||
|
||||
## Hosts
|
||||
|
||||
### Container Linux
|
||||
|
||||
!!! danger
|
||||
Container Linux Configs provide powerful host customization abilities. You are responsible for the additional configs defined for hosts.
|
||||
|
||||
Container Linux Configs (CLCs) declare how a Container Linux instance's disk should be provisioned on first boot from disk. CLCs define disk partitions, filesystems, files, systemd units, dropins, networkd configs, mount units, raid arrays, and users. Typhoon creates controller and worker instances with base Container Linux Configs to create a minimal, secure Kubernetes cluster on each platform.
|
||||
|
||||
Typhoon AWS, Google Cloud, and Digital Ocean give users the ability to provide CLC *snippets* - valid Container Linux Configs that are validated and additively merged into the Typhoon base config during `terraform plan`. This allows advanced host customizations and experimentation.
|
||||
|
||||
#### Examples
|
||||
|
||||
Container Linux [docs](https://coreos.com/os/docs/latest/clc-examples.html) show many simple config examples. Ensure a file `/opt/hello` is created with permissions 0644.
|
||||
|
||||
```
|
||||
# custom-files
|
||||
storage:
|
||||
files:
|
||||
- path: /opt/hello
|
||||
filesystem: root
|
||||
contents:
|
||||
inline: |
|
||||
Hello World
|
||||
mode: 0644
|
||||
```
|
||||
|
||||
Ensure a systemd unit `hello.service` is created and a dropin `50-etcd-cluster.conf` is added for `etcd-member.service`.
|
||||
|
||||
```
|
||||
# custom-units
|
||||
systemd:
|
||||
units:
|
||||
- name: hello.service
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Hello World
|
||||
[Service]
|
||||
Type=oneshot
|
||||
ExecStart=/usr/bin/echo Hello World!
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
- name: etcd-member.service
|
||||
enable: true
|
||||
dropins:
|
||||
- name: 50-etcd-cluster.conf
|
||||
contents: |
|
||||
Environment="ETCD_LOG_PACKAGE_LEVELS=etcdserver=WARNING,security=DEBUG"
|
||||
```
|
||||
|
||||
#### Specification
|
||||
|
||||
View the Container Linux Config [format](https://coreos.com/os/docs/1576.4.0/configuration.html) to read about each field.
|
||||
|
||||
#### Usage
|
||||
|
||||
Write Container Linux Configs *snippets* as files in the repository where you keep Terraform configs for clusters (perhaps in a `clc` or `snippets` subdirectory). You may organize snippets in multiple files as desired, provided they are each valid.
|
||||
|
||||
Define an [AWS](https://typhoon.psdn.io/aws/#cluster), [Google Cloud](https://typhoon.psdn.io/google-cloud/#cluster), or [Digital Ocean](https://typhoon.psdn.io/digital-ocean/#cluster) cluster and fill in the optional `controller_clc_snippets` or `worker_clc_snippets` fields.
|
||||
|
||||
```
|
||||
module "digital-ocean-nemo" {
|
||||
...
|
||||
|
||||
controller_count = 1
|
||||
worker_count = 2
|
||||
controller_clc_snippets = [
|
||||
"${file("./custom-files")}",
|
||||
"${file("./custom-units")}",
|
||||
]
|
||||
worker_clc_snippets = [
|
||||
"${file("./custom-files")}",
|
||||
"${file("./custom-units")}",
|
||||
]
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
Plan the resources to be created.
|
||||
|
||||
```
|
||||
$ terraform plan
|
||||
Plan: 54 to add, 0 to change, 0 to destroy.
|
||||
```
|
||||
|
||||
Most syntax errors in CLCs can be caught during planning. For example, mangle the indentation in one of the CLC files:
|
||||
|
||||
```
|
||||
$ terraform plan
|
||||
...
|
||||
error parsing Container Linux Config: error: yaml: line 3: did not find expected '-' indicator
|
||||
```
|
||||
|
||||
Undo the mangle. Apply the changes to create the cluster per the tutorial.
|
||||
|
||||
```
|
||||
$ terraform apply
|
||||
```
|
||||
|
||||
Container Linux Configs (and the CoreOS Ignition system) create immutable infrastructure. Disk provisioning is performed only on first boot from disk. That means if you change a snippet used by an instance, Terraform will (correctly) try to destroy and recreate that instance. Be careful!
|
||||
|
||||
!!! danger
|
||||
Destroying and recreating controller instances is destructive! etcd runs on controller instances and stores data there. Do not modify controller snippets. See [blue/green](https://typhoon.psdn.io/topics/maintenance/#upgrades) clusters.
|
||||
|
||||
## Architecture
|
||||
|
||||
To customize clusters in ways that aren't supported by input variables, fork Typhoon and maintain a repository with customizations. Reference the repository by changing the username.
|
||||
|
||||
```
|
||||
module "digital-ocean-nemo" {
|
||||
source = "git::https://github.com/USERNAME/typhoon//digital-ocean/container-linux/kubernetes?ref=myspecialcase"
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
To customize lower-level Kubernetes control plane bootstrapping, see the [poseidon/bootkube-terraform](https://github.com/poseidon/bootkube-terraform) Terraform module.
|
||||
|
||||
|
6
docs/advanced/overview.md
Normal file
6
docs/advanced/overview.md
Normal file
@ -0,0 +1,6 @@
|
||||
# Advanced
|
||||
|
||||
Typhoon clusters offer several advanced features for skilled users.
|
||||
|
||||
* [Customization](customization.md)
|
||||
* [Worker Pools](worker-pools.md)
|
149
docs/advanced/worker-pools.md
Normal file
149
docs/advanced/worker-pools.md
Normal file
@ -0,0 +1,149 @@
|
||||
# Worker Pools
|
||||
|
||||
Typhoon AWS and Google Cloud allow additional groups of workers to be defined and joined to a cluster. For example, add worker pools of instances with different types, disk sizes, Container Linux channels, or preemptibility modes.
|
||||
|
||||
Internal Terraform Modules:
|
||||
|
||||
* `aws/container-linux/kubernetes/workers`
|
||||
* `google-cloud/container-linux/kubernetes/workers`
|
||||
|
||||
## AWS
|
||||
|
||||
Create a cluster following the AWS [tutorial](../aws.md#cluster). Define a worker pool using the AWS internal `workers` module.
|
||||
|
||||
```tf
|
||||
module "tempest-worker-pool" {
|
||||
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.9.6"
|
||||
|
||||
providers = {
|
||||
aws = "aws.default"
|
||||
}
|
||||
|
||||
# AWS
|
||||
vpc_id = "${module.aws-tempest.vpc_id}"
|
||||
subnet_ids = "${module.aws-tempest.subnet_ids}"
|
||||
security_groups = "${module.aws-tempest.worker_security_groups}"
|
||||
|
||||
# configuration
|
||||
name = "tempest-worker-pool"
|
||||
kubeconfig = "${module.aws-tempest.kubeconfig}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
|
||||
count = 2
|
||||
instance_type = "m5.large"
|
||||
os_channel = "beta"
|
||||
}
|
||||
```
|
||||
|
||||
Apply the change.
|
||||
|
||||
```
|
||||
terraform apply
|
||||
```
|
||||
|
||||
Verify an auto-scaling group of workers join the cluster within a few minutes.
|
||||
|
||||
### Variables
|
||||
|
||||
The AWS internal `workers` module supports a number of [variables](https://github.com/poseidon/typhoon/blob/master/aws/container-linux/kubernetes/workers/variables.tf).
|
||||
|
||||
#### Required
|
||||
|
||||
| Name | Description | Example |
|
||||
|:-----|:------------|:--------|
|
||||
| vpc_id | Must be set to `vpc_id` output by cluster | "${module.cluster.vpc_id}" |
|
||||
| subnet_ids | Must be set to `subnet_ids` output by cluster | "${module.cluster.subnet_ids}" |
|
||||
| security_groups | Must be set to `worker_security_groups` output by cluster | "${module.cluster.worker_security_groups}" |
|
||||
| name | Unique name (distinct from cluster name) | "tempest-m5s" |
|
||||
| kubeconfig | Must be set to `kubeconfig` output by cluster | "${module.cluster.kubeconfig}" |
|
||||
| ssh_authorized_key | SSH public key for ~/.ssh_authorized_keys | "ssh-rsa AAAAB3NZ..." |
|
||||
|
||||
#### Optional
|
||||
|
||||
| Name | Description | Default | Example |
|
||||
|:-----|:------------|:--------|:--------|
|
||||
| count | Number of instances | 1 | 3 |
|
||||
| instance_type | EC2 instance type | "t2.small" | "t2.medium" |
|
||||
| os_channel | Container Linux AMI channel | stable| "beta", "alpha" |
|
||||
| disk_size | Size of the disk in GB | 40 | 100 |
|
||||
| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | Must match `cluster_domain_suffix` of cluster | "cluster.local" | "k8s.example.com" |
|
||||
|
||||
Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).
|
||||
|
||||
## Google Cloud
|
||||
|
||||
Create a cluster following the Google Cloud [tutorial](../google-cloud.md#cluster). Define a worker pool using the Google Cloud internal `workers` module.
|
||||
|
||||
```tf
|
||||
module "yavin-worker-pool" {
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.9.6"
|
||||
|
||||
providers = {
|
||||
google = "google.default"
|
||||
}
|
||||
|
||||
# Google Cloud
|
||||
region = "us-central1"
|
||||
network = "${module.google-cloud-yavin.network_name}"
|
||||
cluster_name = "yavin"
|
||||
|
||||
# configuration
|
||||
name = "yavin-16x"
|
||||
kubeconfig = "${module.google-cloud-yavin.kubeconfig}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
|
||||
count = 2
|
||||
machine_type = "n1-standard-16"
|
||||
os_image = "coreos-beta"
|
||||
preemptible = true
|
||||
}
|
||||
```
|
||||
|
||||
Apply the change.
|
||||
|
||||
```
|
||||
terraform apply
|
||||
```
|
||||
|
||||
Verify a managed instance group of workers joins the cluster within a few minutes.
|
||||
|
||||
```
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.9.6
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.9.6
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.9.6
|
||||
yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.9.6
|
||||
yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.9.6
|
||||
```
|
||||
|
||||
### Variables
|
||||
|
||||
The Google Cloud internal `workers` module supports a number of [variables](https://github.com/poseidon/typhoon/blob/master/google-cloud/container-linux/kubernetes/workers/variables.tf).
|
||||
|
||||
#### Required
|
||||
|
||||
| Name | Description | Example |
|
||||
|:-----|:------------|:--------|
|
||||
| region | Must be set to `region` of cluster | "us-central1" |
|
||||
| network | Must be set to `network_name` output by cluster | "${module.cluster.network_name}" |
|
||||
| name | Unique name (distinct from cluster name) | "yavin-16x" |
|
||||
| cluster_name | Must be set to `cluster_name` of cluster | "yavin" |
|
||||
| kubeconfig | Must be set to `kubeconfig` output by cluster | "${module.cluster.kubeconfig}" |
|
||||
| ssh_authorized_key | SSH public key for ~/.ssh_authorized_keys | "ssh-rsa AAAAB3NZ..." |
|
||||
|
||||
#### Optional
|
||||
|
||||
| Name | Description | Default | Example |
|
||||
|:-----|:------------|:--------|:--------|
|
||||
| count | Number of instances | 1 | 3 |
|
||||
| machine_type | Compute instance machine type | "n1-standard-1" | See below |
|
||||
| os_image | OS image for compute instances | "coreos-stable" | "coreos-alpha", "coreos-beta" |
|
||||
| disk_size | Size of the disk in GB | 40 | 100 |
|
||||
| preemptible | If true, Compute Engine will terminate instances randomly within 24 hours | false | true |
|
||||
| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | Must match `cluster_domain_suffix` of cluster | "cluster.local" | "k8s.example.com" |
|
||||
|
||||
Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).
|
||||
|
72
docs/aws.md
72
docs/aws.md
@ -1,6 +1,6 @@
|
||||
# AWS
|
||||
|
||||
In this tutorial, we'll create a Kubernetes v1.8.3 cluster on AWS.
|
||||
In this tutorial, we'll create a Kubernetes v1.9.6 cluster on AWS.
|
||||
|
||||
We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a VPC, gateway, subnets, auto-scaling groups of controllers and workers, network load balancers for controllers and workers, and security groups will be created.
|
||||
|
||||
@ -10,23 +10,23 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube
|
||||
|
||||
* AWS Account and IAM credentials
|
||||
* AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)
|
||||
* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
* Terraform v0.11.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
|
||||
## Terraform Setup
|
||||
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.10.1 on your system.
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.
|
||||
|
||||
```sh
|
||||
$ terraform version
|
||||
Terraform v0.10.7
|
||||
Terraform v0.11.1
|
||||
```
|
||||
|
||||
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system.
|
||||
|
||||
```sh
|
||||
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.2.0/terraform-provider-ct-v0.2.0-linux-amd64.tar.gz
|
||||
tar xzf terraform-provider-ct-v0.2.0-linux-amd64.tar.gz
|
||||
sudo mv terraform-provider-ct-v0.2.0-linux-amd64/terraform-provider-ct /usr/local/bin/
|
||||
wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.2.1/terraform-provider-ct-v0.2.1-linux-amd64.tar.gz
|
||||
tar xzf terraform-provider-ct-v0.2.1-linux-amd64.tar.gz
|
||||
sudo mv terraform-provider-ct-v0.2.1-linux-amd64/terraform-provider-ct /usr/local/bin/
|
||||
```
|
||||
|
||||
Add the plugin to your `~/.terraformrc`.
|
||||
@ -57,9 +57,32 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
|
||||
|
||||
```tf
|
||||
provider "aws" {
|
||||
version = "~> 1.5.0"
|
||||
alias = "default"
|
||||
|
||||
region = "eu-central-1"
|
||||
shared_credentials_file = "/home/user/.config/aws/credentials"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "null" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "template" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "tls" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
Additional configuration options are described in the `aws` provider [docs](https://www.terraform.io/docs/providers/aws/).
|
||||
@ -73,8 +96,16 @@ Define a Kubernetes cluster using the module `aws/container-linux/kubernetes`.
|
||||
|
||||
```tf
|
||||
module "aws-tempest" {
|
||||
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes"
|
||||
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.9.6"
|
||||
|
||||
providers = {
|
||||
aws = "aws.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
cluster_name = "tempest"
|
||||
|
||||
# AWS
|
||||
@ -103,7 +134,7 @@ ssh-add -L
|
||||
```
|
||||
|
||||
!!! warning
|
||||
`terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
`terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
|
||||
## Apply
|
||||
|
||||
@ -119,7 +150,7 @@ Get or update Terraform modules.
|
||||
$ terraform get # downloads missing modules
|
||||
$ terraform get --update # updates all modules
|
||||
Get: git::https://github.com/poseidon/typhoon (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.11.0 (update)
|
||||
```
|
||||
|
||||
Plan the resources to be created.
|
||||
@ -148,12 +179,12 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
|
||||
[Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
|
||||
|
||||
```
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
ip-10-0-12-221 Ready 34m v1.8.3
|
||||
ip-10-0-19-112 Ready 34m v1.8.3
|
||||
ip-10-0-4-22 Ready 34m v1.8.3
|
||||
ip-10-0-12-221 Ready 34m v1.9.6
|
||||
ip-10-0-19-112 Ready 34m v1.9.6
|
||||
ip-10-0-4-22 Ready 34m v1.9.6
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -179,10 +210,10 @@ kube-system pod-checkpointer-4kxtl-ip-10-0-12-221 1/1 Running 0
|
||||
|
||||
## Going Further
|
||||
|
||||
Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
|
||||
Learn about [maintenance](topics/maintenance.md) and [addons](addons/overview.md).
|
||||
|
||||
!!! note
|
||||
On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
|
||||
On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
|
||||
|
||||
## Variables
|
||||
|
||||
@ -194,14 +225,13 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
|
||||
| dns_zone | AWS Route53 DNS zone | "aws.example.com" |
|
||||
| dns_zone_id | AWS Route53 DNS zone id | "Z3PAABBCFAKEC0" |
|
||||
| ssh_authorized_key | SSH public key for ~/.ssh_authorized_keys | "ssh-rsa AAAAB3NZ..." |
|
||||
| os_channel | Container Linux AMI channel | stable, beta, alpha |
|
||||
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/tempest" |
|
||||
|
||||
#### DNS Zone
|
||||
|
||||
Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a network load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `tempest.aws.example.com`.
|
||||
|
||||
You'll need a registered domain name or subdomain registered in a AWS Route53 DNS zone. You can set this up once and create many clusters with unqiue names.
|
||||
You'll need a registered domain name or subdomain registered in a AWS Route53 DNS zone. You can set this up once and create many clusters with unique names.
|
||||
|
||||
```tf
|
||||
resource "aws_route53_zone" "zone-for-clusters" {
|
||||
@ -222,12 +252,16 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
|
||||
| controller_type | Controller EC2 instance type | "t2.small" | "t2.medium" |
|
||||
| worker_count | Number of workers | 1 | 3 |
|
||||
| worker_type | Worker EC2 instance type | "t2.small" | "t2.medium" |
|
||||
| os_channel | Container Linux AMI channel | stable | stable, beta, alpha |
|
||||
| disk_size | Size of the EBS volume in GB | "40" | "100" |
|
||||
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
|
||||
| network_mtu | CNI interface MTU (calico only) | 1480 | 8981 |
|
||||
| host_cidr | CIDR range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |
|
||||
| pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
|
||||
| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |
|
||||
| controller_clc_snippets | Controller Container Linux Config snippets | [] | |
|
||||
| worker_clc_snippets | Worker Container Linux Config snippets | [] | |
|
||||
|
||||
Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).
|
||||
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user