Compare commits
72 Commits
Author | SHA1 | Date | |
---|---|---|---|
a37aff7f35 | |||
03d23bfde7 | |||
2c10d24113 | |||
82a616c70b | |||
2fa7dac247 | |||
a41691b222 | |||
9034203d7a | |||
d42f6d6b5d | |||
2fa1840c30 | |||
8e0b8d7e40 | |||
a0cf527ccf | |||
65321acad2 | |||
064ce83f25 | |||
c3b0cdddf3 | |||
211ec94c75 | |||
8aca5a089e | |||
3e6e4ea339 | |||
103f1e16d7 | |||
50dd3e3b82 | |||
3dc755994b | |||
ddbfb2eee1 | |||
868265988b | |||
6adffcb778 | |||
bc967ddcd0 | |||
ef18f19ec4 | |||
f5efcc1ff8 | |||
996651c605 | |||
38fa7dff1a | |||
bbe295a3f1 | |||
d8db296932 | |||
388ac08492 | |||
527b5ca602 | |||
ecd6a9443b | |||
2523d64f95 | |||
fc455c8624 | |||
7a0a60708e | |||
51a5f64024 | |||
e1f2125f02 | |||
9329b775f6 | |||
e04cce1201 | |||
201a38bd90 | |||
fbdd946601 | |||
19102636a9 | |||
21e540159b | |||
43e65a4d13 | |||
e79088baa0 | |||
495e33e213 | |||
63f5a26a72 | |||
eea79e895d | |||
99c07661c6 | |||
521a1f0fee | |||
7345cb6419 | |||
a481d71d7d | |||
831a5c976c | |||
85e6783503 | |||
165396d6aa | |||
ce49a93d5d | |||
e623439eec | |||
9548572d98 | |||
f00ecde854 | |||
d85300f947 | |||
65f006e6cc | |||
8d3817e0ae | |||
5f5eec1175 | |||
5308fde3d3 | |||
9ab61d7bf5 | |||
6483f613c5 | |||
56c6bf431a | |||
63ab117205 | |||
1cd262e712 | |||
32bdda1b6c | |||
07d257aa7b |
2
.github/ISSUE_TEMPLATE.md
vendored
@ -4,7 +4,7 @@
|
||||
|
||||
### Environment
|
||||
|
||||
* Platform: bare-metal, google-cloud, digital-ocean
|
||||
* Platform: aws, bare-metal, google-cloud, digital-ocean
|
||||
* OS: container-linux, fedora-cloud
|
||||
* Terraform: `terraform version`
|
||||
* Plugins: Provider plugin versions
|
||||
|
110
CHANGES.md
@ -4,6 +4,108 @@ Notable changes between versions.
|
||||
|
||||
## Latest
|
||||
|
||||
## v1.9.3
|
||||
|
||||
* Kubernetes [v1.9.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v193)
|
||||
* Network improvements and fixes ([#104](https://github.com/poseidon/typhoon/pull/104))
|
||||
* Switch from Calico v2.6.6 to v3.0.2
|
||||
* Add Calico GlobalNetworkSet CRD
|
||||
* Update flannel from v0.9.0 to v0.10.0
|
||||
* Use separate service account for flannel
|
||||
* Update etcd from v3.2.14 to v3.2.15
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update Prometheus from v2.0.0 to v2.1.0 ([#113](https://github.com/poseidon/typhoon/pull/113))
|
||||
* Improve alerting rules
|
||||
* Relabel discovered kubelet, endpoint, service, and apiserver scrapes
|
||||
* Use separate service accounts
|
||||
* Update node-exporter and kube-state-metrics
|
||||
* Include Grafana dashboards for Kubernetes admins ([#113](https://github.com/poseidon/typhoon/pull/113))
|
||||
* Add grafana-watcher to load bundled upstream dashboards
|
||||
* Update nginx-ingress from 0.9.0 to 0.10.2
|
||||
* Update CLUO from v0.5.0 to v0.6.0
|
||||
* Switch manifests to use `apps/v1` Deployments and Daemonsets ([#120](https://github.com/poseidon/typhoon/pull/120))
|
||||
* Remove Kubernetes Dashboard manifests ([#121](https://github.com/poseidon/typhoon/pull/121))
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
* Use new Droplet [types](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) which offer more CPU/memory, at lower cost. ([#105](https://github.com/poseidon/typhoon/pull/105))
|
||||
* A small Digital Ocean cluster costs less than $25 a month!
|
||||
|
||||
## v1.9.2
|
||||
|
||||
* Kubernetes [v1.9.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192)
|
||||
* Add Terraform v0.11.x support
|
||||
* Add explicit "providers" section to modules for Terraform v0.11.x
|
||||
* Retain support for Terraform v0.10.4+
|
||||
* Add [migration guide](https://github.com/poseidon/typhoon/blob/master/docs/topics/maintenance.md) from Terraform v0.10.x to v0.11.x (**action required!**)
|
||||
* Update etcd from 3.2.13 to 3.2.14
|
||||
* Update calico from 2.6.5 to 2.6.6
|
||||
* Update kube-dns from v1.14.7 to v1.14.8
|
||||
* Use separate service account for kube-dns
|
||||
* Use kubernetes-incubator/bootkube v0.10.0
|
||||
|
||||
#### Addons
|
||||
|
||||
* Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (**important**)
|
||||
* Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes ([cluo#163](https://github.com/coreos/container-linux-update-operator/issues/163))
|
||||
* Update kube-state-metrics from v1.1.0 to v1.2.0
|
||||
* Fix RBAC cluster role for kube-state-metrics
|
||||
|
||||
#### Bare-Metal
|
||||
|
||||
* Use per-node Container Linux install profiles ([#97](https://github.com/poseidon/typhoon/pull/97))
|
||||
* Allow Container Linux channel/version to be chosen per-cluster
|
||||
* Fix issue where cluster deletion could require `terraform apply` multiple times
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
* Relax `digitalocean` provider version constraint
|
||||
* Fix bug with `terraform plan` always showing a firewall diff to be applied ([#3](https://github.com/poseidon/typhoon/issues/3))
|
||||
|
||||
## v1.9.1
|
||||
|
||||
* Kubernetes [v1.9.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v191)
|
||||
* Update kube-dns from 1.14.5 to v1.14.7
|
||||
* Update etcd from 3.2.0 to 3.2.13
|
||||
* Update Calico from v2.6.4 to v2.6.5
|
||||
* Enable portmap to fix hostPort with Calico
|
||||
* Use separate service account for controller-manager
|
||||
|
||||
## v1.8.6
|
||||
|
||||
* Kubernetes [v1.8.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v186)
|
||||
* Update Calico from v2.6.3 to v2.6.4
|
||||
|
||||
## v1.8.5
|
||||
|
||||
* Kubernetes [v1.8.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v185)
|
||||
* Recommend Container Linux [images](https://coreos.com/releases/) with Docker 17.09
|
||||
* Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
|
||||
of 1.12)
|
||||
* Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
|
||||
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
|
||||
* Add optional `cluster_domain_suffix` variable (#74)
|
||||
* Use kubernetes-incubator/bootkube v0.9.1
|
||||
|
||||
#### Bare-Metal
|
||||
|
||||
* Add kubelet `--volume-plugin-dir` flag to allow flexvolume providers ([#61](https://github.com/poseidon/typhoon/pull/61))
|
||||
|
||||
#### Addons
|
||||
|
||||
* Discourage deploying the Kubernetes Dashboard (security)
|
||||
|
||||
## v1.8.4
|
||||
|
||||
* Kubernetes v1.8.4
|
||||
* Calico related bug fixes
|
||||
* Update Calico from v2.6.1 to v2.6.3
|
||||
* Update flannel from v0.9.0 to v0.9.1
|
||||
* Service accounts for kube-proxy and pod-checkpointer
|
||||
* Use kubernetes-incubator/bootkube v0.9.0
|
||||
|
||||
## v1.8.3
|
||||
|
||||
* Kubernetes v1.8.3
|
||||
@ -86,7 +188,7 @@ Notable changes between versions.
|
||||
## v1.7.3
|
||||
|
||||
* Kubernetes v1.7.3
|
||||
* Use kubernete-incubator/bootkube v0.6.1
|
||||
* Use kubernetes-incubator/bootkube v0.6.1
|
||||
|
||||
#### Digital Ocean
|
||||
|
||||
@ -96,7 +198,7 @@ Notable changes between versions.
|
||||
## v1.7.1
|
||||
|
||||
* Kubernetes v1.7.1
|
||||
* Use kubernete-incubator/bootkube v0.6.0
|
||||
* Use kubernetes-incubator/bootkube v0.6.0
|
||||
* Add Bare-Metal Terraform module (stable)
|
||||
* Add Digital Ocean Terraform module (beta)
|
||||
|
||||
@ -109,12 +211,12 @@ Notable changes between versions.
|
||||
## v1.6.7
|
||||
|
||||
* Kubernetes v1.6.7
|
||||
* Use kubernete-incubator/bootkube v0.5.1
|
||||
* Use kubernetes-incubator/bootkube v0.5.1
|
||||
|
||||
## v1.6.6
|
||||
|
||||
* Kubernetes v1.6.6
|
||||
* Use kubernete-incubator/bootkube v0.4.5
|
||||
* Use kubernetes-incubator/bootkube v0.4.5
|
||||
* Disable locksmithd on hosts, in favor of [CLUO](https://github.com/coreos/container-linux-update-operator).
|
||||
|
||||
## v1.6.4
|
||||
|
@ -2,4 +2,4 @@
|
||||
|
||||
## Developer Certificate of Origin
|
||||
|
||||
By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DOC](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
|
||||
By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DCO](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
|
||||
|
26
README.md
@ -1,4 +1,4 @@
|
||||
# Typhoon []() <img align="right" src="https://storage.googleapis.com/dghubble/spin.png">
|
||||
# Typhoon []() <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
@ -44,12 +44,20 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
|
||||
```tf
|
||||
module "google-cloud-yavin" {
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
|
||||
|
||||
providers = {
|
||||
google = "google.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
# Google Cloud
|
||||
region = "us-central1"
|
||||
dns_zone = "example.com"
|
||||
dns_zone_name = "example-zone"
|
||||
os_image = "coreos-stable-1465-6-0-v20170817"
|
||||
os_image = "coreos-stable-1576-5-0-v20180105"
|
||||
|
||||
cluster_name = "yavin"
|
||||
controller_count = 1
|
||||
@ -75,12 +83,12 @@ Apply complete! Resources: 37 added, 0 changed, 0 destroyed.
|
||||
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
|
||||
|
||||
```sh
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.8.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.9.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.9.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.9.3
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -127,4 +135,4 @@ Typhoon is not a product, trial, or free-tier. It is not run by a company, does
|
||||
|
||||
Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.
|
||||
|
||||
*Disclosure: The author works for CoreOS and previously wrote Matchbox and original Tectonic for bare-metal and AWS. This project is not associated with CoreOS.*
|
||||
*Disclosure: The author works for Red Hat (prev CoreOS), but Typhoon is unassociated and maintained independently.*
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: reboot-coordinator
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: reboot-coordinator
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
name: container-linux-update-agent
|
||||
@ -8,6 +8,9 @@ spec:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: container-linux-update-agent
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -15,7 +18,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: update-agent
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.4.1
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.6.0
|
||||
command:
|
||||
- "/bin/update-agent"
|
||||
volumeMounts:
|
||||
|
@ -1,10 +1,13 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: container-linux-update-operator
|
||||
namespace: reboot-coordinator
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: container-linux-update-operator
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -12,7 +15,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: update-operator
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.4.1
|
||||
image: quay.io/coreos/container-linux-update-operator:v0.6.0
|
||||
command:
|
||||
- "/bin/update-operator"
|
||||
env:
|
||||
|
@ -1,32 +0,0 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: kubernetes-dashboard
|
||||
namespace: kube-system
|
||||
spec:
|
||||
replicas: 1
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
name: kubernetes-dashboard
|
||||
phase: prod
|
||||
spec:
|
||||
containers:
|
||||
- name: kubernetes-dashboard
|
||||
image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 9090
|
||||
resources:
|
||||
limits:
|
||||
cpu: 100m
|
||||
memory: 300Mi
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /
|
||||
port: 9090
|
||||
initialDelaySeconds: 30
|
||||
timeoutSeconds: 30
|
@ -1,15 +0,0 @@
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: kubernetes-dashboard
|
||||
namespace: kube-system
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
name: kubernetes-dashboard
|
||||
phase: prod
|
||||
ports:
|
||||
- name: http
|
||||
protocol: TCP
|
||||
port: 80
|
||||
targetPort: 9090
|
7499
addons/grafana/config.yaml
Normal file
@ -0,0 +1,7499 @@
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: grafana-dashboards
|
||||
namespace: monitoring
|
||||
data:
|
||||
deployment-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 1,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "200px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "cores",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "CPU",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 9,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "GB",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "80%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "Bps",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Network",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "100px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"metric": "kube_deployment_spec_replicas",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Desired Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Available Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Observed Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Metadata Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "350px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "current replicas",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "available",
|
||||
"refId": "B",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "unavailable",
|
||||
"refId": "C",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "updated",
|
||||
"refId": "D",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "desired",
|
||||
"refId": "E",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "Replicas",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "none",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Namespace",
|
||||
"multi": false,
|
||||
"name": "deployment_namespace",
|
||||
"options": [],
|
||||
"query": "label_values(kube_deployment_metadata_generation, namespace)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": null,
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Deployment",
|
||||
"multi": false,
|
||||
"name": "deployment_name",
|
||||
"options": [],
|
||||
"query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "deployment",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Deployment",
|
||||
"version": 1
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
etcd-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"label": "prometheus",
|
||||
"description": "",
|
||||
"type": "datasource",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus"
|
||||
}
|
||||
],
|
||||
"__requires": [
|
||||
{
|
||||
"type": "grafana",
|
||||
"id": "grafana",
|
||||
"name": "Grafana",
|
||||
"version": "4.5.2"
|
||||
},
|
||||
{
|
||||
"type": "panel",
|
||||
"id": "graph",
|
||||
"name": "Graph",
|
||||
"version": ""
|
||||
},
|
||||
{
|
||||
"type": "datasource",
|
||||
"id": "prometheus",
|
||||
"name": "Prometheus",
|
||||
"version": "1.0.0"
|
||||
},
|
||||
{
|
||||
"type": "panel",
|
||||
"id": "singlestat",
|
||||
"name": "Singlestat",
|
||||
"version": ""
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"description": "etcd sample Grafana dashboard with Prometheus",
|
||||
"editable": false,
|
||||
"gnetId": null,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"cacheTimeout": null,
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 28,
|
||||
"interval": null,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"nullText": null,
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"tableColumn": "",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(etcd_server_has_leader)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"metric": "etcd_server_has_leader",
|
||||
"refId": "A",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"thresholds": "",
|
||||
"title": "Up",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "200%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 23,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 5,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(grpc_server_started_total{grpc_type=\"unary\"}[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "RPC Rate",
|
||||
"metric": "grpc_server_started_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(grpc_server_handled_total{grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "RPC Failed Rate",
|
||||
"metric": "grpc_server_handled_total",
|
||||
"refId": "B",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "RPC Rate",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "ops",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 41,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Watch Streams",
|
||||
"metric": "grpc_server_handled_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Lease Streams",
|
||||
"metric": "grpc_server_handled_total",
|
||||
"refId": "B",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Active Streams",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"decimals": null,
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"grid": {},
|
||||
"id": 1,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "etcd_debugging_mvcc_db_total_size_in_bytes",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} DB Size",
|
||||
"metric": "",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "DB Size",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"grid": {},
|
||||
"id": 3,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 1,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": false,
|
||||
"steppedLine": true,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} WAL fsync",
|
||||
"metric": "etcd_disk_wal_fsync_duration_seconds_bucket",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} DB fsync",
|
||||
"metric": "etcd_disk_backend_commit_duration_seconds_bucket",
|
||||
"refId": "B",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Disk Sync Duration",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "s",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 29,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 4,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "process_resident_memory_bytes",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Resident Memory",
|
||||
"metric": "process_resident_memory_bytes",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Memory",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "New row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 5,
|
||||
"id": 22,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(etcd_network_client_grpc_received_bytes_total[5m])",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Client Traffic In",
|
||||
"metric": "etcd_network_client_grpc_received_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Client Traffic In",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 5,
|
||||
"id": 21,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(etcd_network_client_grpc_sent_bytes_total[5m])",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Client Traffic Out",
|
||||
"metric": "etcd_network_client_grpc_sent_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Client Traffic Out",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 20,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Peer Traffic In",
|
||||
"metric": "etcd_network_peer_received_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Peer Traffic In",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"decimals": null,
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"grid": {},
|
||||
"id": 16,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 3,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Peer Traffic Out",
|
||||
"metric": "etcd_network_peer_sent_bytes_total",
|
||||
"refId": "A",
|
||||
"step": 4
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Peer Traffic Out",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "Bps",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "New row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 40,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(etcd_server_proposals_failed_total[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Failure Rate",
|
||||
"metric": "etcd_server_proposals_failed_total",
|
||||
"refId": "A",
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"expr": "sum(etcd_server_proposals_pending)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Pending Total",
|
||||
"metric": "etcd_server_proposals_pending",
|
||||
"refId": "B",
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(etcd_server_proposals_committed_total[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Commit Rate",
|
||||
"metric": "etcd_server_proposals_committed_total",
|
||||
"refId": "C",
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(etcd_server_proposals_applied_total[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Proposal Apply Rate",
|
||||
"refId": "D",
|
||||
"step": 2
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Raft Proposals",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"decimals": 0,
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 0,
|
||||
"id": 19,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "changes(etcd_server_leader_changes_seen_total[1d])",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{instance}} Total Leader Elections Per Day",
|
||||
"metric": "etcd_server_leader_changes_seen_total",
|
||||
"refId": "A",
|
||||
"step": 2
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeShift": null,
|
||||
"title": "Total Leader Elections Per Day",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": null,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"repeat": null,
|
||||
"repeatIteration": null,
|
||||
"repeatRowId": null,
|
||||
"showTitle": false,
|
||||
"title": "New row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-15m",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"now": true,
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "etcd",
|
||||
"version": 4
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-capacity-planning-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"gnetId": 22,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100",
|
||||
"hide": false,
|
||||
"intervalFactor": 10,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 50
|
||||
}
|
||||
],
|
||||
"title": "Idle CPU",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percent",
|
||||
"label": "cpu usage",
|
||||
"logBase": 1,
|
||||
"min": 0,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 9,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_load1)",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 1m",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_load5)",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 5m",
|
||||
"refId": "B",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_load15)",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 15m",
|
||||
"refId": "C",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "System Load",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percentunit",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 4,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory usage",
|
||||
"metric": "memo",
|
||||
"refId": "A",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_memory_Buffers)",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory buffers",
|
||||
"metric": "memo",
|
||||
"refId": "B",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_memory_Cached)",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory cached",
|
||||
"metric": "memo",
|
||||
"refId": "C",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(node_memory_MemFree)",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory free",
|
||||
"metric": "memo",
|
||||
"refId": "D",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
|
||||
"intervalFactor": 2,
|
||||
"metric": "",
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "246px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 6,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "read",
|
||||
"yaxis": 1
|
||||
},
|
||||
{
|
||||
"alias": "{instance=\"172.17.0.1:9100\"}",
|
||||
"yaxis": 2
|
||||
},
|
||||
{
|
||||
"alias": "io time",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_disk_bytes_read[5m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "read",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(node_disk_bytes_written[5m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "written",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "sum(rate(node_disk_io_time_ms[5m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "io time",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Disk I/O",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "ms",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percentunit",
|
||||
"gauge": {
|
||||
"maxValue": 1,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 12,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "0.75, 0.9",
|
||||
"title": "Disk Space Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 8,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_network_receive_bytes{device!~\"lo\"}[5m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Received",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 10,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(node_network_transmit_bytes{device!~\"lo\"}[5m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "B",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Transmitted",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "276px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 11,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 11,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_pod_info)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Current number of Pods",
|
||||
"refId": "A",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "sum(kube_node_status_capacity_pods)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Maximum capacity of pods",
|
||||
"refId": "B",
|
||||
"step": 10
|
||||
}
|
||||
],
|
||||
"title": "Cluster Pod Utilization",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Pod Utilization",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Capacity Planning",
|
||||
"version": 4
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-cluster-health-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": "10s",
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "254px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 1,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Control Plane Components Down",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "Everything UP and healthy",
|
||||
"value": "null"
|
||||
},
|
||||
{
|
||||
"op": "=",
|
||||
"text": "",
|
||||
"value": ""
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Alerts Firing",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "3, 5",
|
||||
"title": "Alerts Pending",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Crashlooping Pods",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Node Not Ready",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Node Disk Pressure",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Node Memory Pressure",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_node_spec_unschedulable)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Nodes Unschedulable",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Cluster Health",
|
||||
"version": 9
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-cluster-status-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "129px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 6,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Control Plane UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "UP",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "total"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 6,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "3, 5",
|
||||
"title": "Alerts Firing",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": true,
|
||||
"title": "Cluster Health",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "168px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 1,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "API Servers UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Controller Managers UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Schedulers UP",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": true,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "1, 3",
|
||||
"title": "Crashlooping Control Plane Pods",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": true,
|
||||
"title": "Control Plane Status",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "158px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(node_cpu{job=\"node-exporter\",mode=\"idle\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "CPU Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 9,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Filesystem Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 10,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Pod Utilization",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": true,
|
||||
"title": "Capacity Planning",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Cluster Status",
|
||||
"version": 3
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-control-plane-status-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 1,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "API Servers UP",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Controller Managers UP",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "50, 80",
|
||||
"title": "Schedulers UP",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"thresholds": "5, 10",
|
||||
"title": "API Server Request Error Rate",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "0",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 7,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "API Server Request Latency",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 5,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60
|
||||
}
|
||||
],
|
||||
"title": "End to End Scheduling Latency",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "dtdurations",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 6,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Error Rate",
|
||||
"refId": "A",
|
||||
"step": 60
|
||||
},
|
||||
{
|
||||
"expr": "sum by(instance) (rate(apiserver_request_count[5m]))",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Request Rate",
|
||||
"refId": "B",
|
||||
"step": 60
|
||||
}
|
||||
],
|
||||
"title": "API Server Request Rates",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Control Plane Status",
|
||||
"version": 3
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
kubernetes-resource-requests-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "300px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Allocatable CPU Cores",
|
||||
"refId": "A",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested CPU Cores",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "CPU Cores",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": "CPU Cores",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 240
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "CPU Cores",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "CPU Cores",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "300px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Allocatable Memory",
|
||||
"refId": "A",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested Memory",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"label": "Memory",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 4,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "",
|
||||
"refId": "A",
|
||||
"step": 240
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Memory",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-3h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Kubernetes Resource Requests",
|
||||
"version": 2
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
nodes-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"description": "Dashboard to get an overview of one server",
|
||||
"editable": false,
|
||||
"gnetId": 22,
|
||||
"graphTooltip": 0,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)",
|
||||
"hide": false,
|
||||
"intervalFactor": 10,
|
||||
"legendFormat": "{{cpu}}",
|
||||
"refId": "A",
|
||||
"step": 50
|
||||
}
|
||||
],
|
||||
"title": "Idle CPU",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percent",
|
||||
"label": "cpu usage",
|
||||
"logBase": 1,
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 9,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_load1{instance=\"$server\"}",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 1m",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "node_load5{instance=\"$server\"}",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 5m",
|
||||
"refId": "B",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "node_load15{instance=\"$server\"}",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "load 15m",
|
||||
"refId": "C",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "System Load",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "percentunit",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 4,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": true,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory used",
|
||||
"metric": "",
|
||||
"refId": "C",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_Buffers{instance=\"$server\"}",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory buffers",
|
||||
"metric": "",
|
||||
"refId": "E",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_Cached{instance=\"$server\"}",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory cached",
|
||||
"metric": "",
|
||||
"refId": "F",
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_MemFree{instance=\"$server\"}",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "memory free",
|
||||
"metric": "",
|
||||
"refId": "D",
|
||||
"step": 10
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percent",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "80, 90",
|
||||
"title": "Memory Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 6,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "read",
|
||||
"yaxis": 1
|
||||
},
|
||||
{
|
||||
"alias": "{instance=\"172.17.0.1:9100\"}",
|
||||
"yaxis": 2
|
||||
},
|
||||
{
|
||||
"alias": "io time",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 9,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))",
|
||||
"hide": false,
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "read",
|
||||
"refId": "A",
|
||||
"step": 20,
|
||||
"target": ""
|
||||
},
|
||||
{
|
||||
"expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "written",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))",
|
||||
"intervalFactor": 4,
|
||||
"legendFormat": "io time",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Disk I/O",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "ms",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(50, 172, 45, 0.97)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(245, 54, 54, 0.9)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "percentunit",
|
||||
"gauge": {
|
||||
"maxValue": 1,
|
||||
"minValue": 0,
|
||||
"show": true,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"hideTimeOverride": false,
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"})",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 60,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"thresholds": "0.75, 0.9",
|
||||
"title": "Disk Space Usage",
|
||||
"transparent": false,
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "current"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 8,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_network_receive_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{device}}",
|
||||
"refId": "A",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Received",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 10,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "transmitted",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"span": 6,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
|
||||
"hide": false,
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{device}}",
|
||||
"refId": "B",
|
||||
"step": 10,
|
||||
"target": ""
|
||||
}
|
||||
],
|
||||
"title": "Network Transmitted",
|
||||
"tooltip": {
|
||||
"msResolution": false,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": null,
|
||||
"multi": false,
|
||||
"name": "server",
|
||||
"options": [],
|
||||
"query": "label_values(node_boot_time, instance)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Nodes",
|
||||
"version": 2
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
pods-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 1,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"refresh": false,
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": true,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "Current: {{ container_name }}",
|
||||
"metric": "container_memory_usage_bytes",
|
||||
"refId": "A",
|
||||
"step": 15
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_requests_memory_bytes",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_limits_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Limit: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_limits_memory_bytes",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 2,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": true,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{ container_name }}",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_requests_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Requested: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_requests_cpu_cores",
|
||||
"refId": "B",
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"expr": "kube_pod_container_resource_limits_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
|
||||
"interval": "10s",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Limit: {{ container }}",
|
||||
"metric": "kube_pod_container_resource_limits_memory_bytes",
|
||||
"refId": "C",
|
||||
"step": 20
|
||||
}
|
||||
],
|
||||
"title": "CPU Usage",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "250px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 3,
|
||||
"isNew": false,
|
||||
"legend": {
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": true,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{ pod_name }}",
|
||||
"refId": "A",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "Network I/O",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "bytes",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "New Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": true,
|
||||
"label": "Namespace",
|
||||
"multi": false,
|
||||
"name": "namespace",
|
||||
"options": [],
|
||||
"query": "label_values(kube_pod_info, namespace)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Pod",
|
||||
"multi": false,
|
||||
"name": "pod",
|
||||
"options": [],
|
||||
"query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": true,
|
||||
"label": "Container",
|
||||
"multi": false,
|
||||
"name": "container",
|
||||
"options": [],
|
||||
"query": "label_values(kube_pod_container_info{namespace=\"$namespace\", pod=\"$pod\"}, container)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Pods",
|
||||
"version": 1
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
statefulset-dashboard.json: |+
|
||||
{
|
||||
"dashboard":
|
||||
{
|
||||
"__inputs": [
|
||||
{
|
||||
"description": "",
|
||||
"label": "prometheus",
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"pluginName": "Prometheus",
|
||||
"type": "datasource"
|
||||
}
|
||||
],
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": false,
|
||||
"graphTooltip": 1,
|
||||
"hideControls": false,
|
||||
"links": [],
|
||||
"rows": [
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "200px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 8,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "cores",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "CPU",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 9,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "GB",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "80%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(container_memory_usage_bytes{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}) / 1024^3",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "110%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "Bps",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 7,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfix": "",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 4,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Network",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "100px",
|
||||
"panels": [
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": false
|
||||
},
|
||||
"id": 5,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"metric": "kube_statefulset_replicas",
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Desired Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Available Replicas",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 3,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_statefulset_status_observed_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Observed Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
},
|
||||
{
|
||||
"colorBackground": false,
|
||||
"colorValue": false,
|
||||
"colors": [
|
||||
"rgba(245, 54, 54, 0.9)",
|
||||
"rgba(237, 129, 40, 0.89)",
|
||||
"rgba(50, 172, 45, 0.97)"
|
||||
],
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"format": "none",
|
||||
"gauge": {
|
||||
"maxValue": 100,
|
||||
"minValue": 0,
|
||||
"show": false,
|
||||
"thresholdLabels": false,
|
||||
"thresholdMarkers": true
|
||||
},
|
||||
"id": 2,
|
||||
"links": [],
|
||||
"mappingType": 1,
|
||||
"mappingTypes": [
|
||||
{
|
||||
"name": "value to text",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"name": "range to text",
|
||||
"value": 2
|
||||
}
|
||||
],
|
||||
"maxDataPoints": 100,
|
||||
"nullPointMode": "connected",
|
||||
"postfixFontSize": "50%",
|
||||
"prefix": "",
|
||||
"prefixFontSize": "50%",
|
||||
"rangeMaps": [
|
||||
{
|
||||
"from": "null",
|
||||
"text": "N/A",
|
||||
"to": "null"
|
||||
}
|
||||
],
|
||||
"span": 3,
|
||||
"sparkline": {
|
||||
"fillColor": "rgba(31, 118, 189, 0.18)",
|
||||
"full": false,
|
||||
"lineColor": "rgb(31, 120, 193)",
|
||||
"show": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(kube_statefulset_metadata_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"refId": "A",
|
||||
"step": 600
|
||||
}
|
||||
],
|
||||
"title": "Metadata Generation",
|
||||
"type": "singlestat",
|
||||
"valueFontSize": "80%",
|
||||
"valueMaps": [
|
||||
{
|
||||
"op": "=",
|
||||
"text": "N/A",
|
||||
"value": "null"
|
||||
}
|
||||
],
|
||||
"valueName": "avg"
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
},
|
||||
{
|
||||
"collapse": false,
|
||||
"editable": false,
|
||||
"height": "350px",
|
||||
"panels": [
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"editable": false,
|
||||
"error": false,
|
||||
"fill": 1,
|
||||
"grid": {
|
||||
"threshold1Color": "rgba(216, 200, 27, 0.27)",
|
||||
"threshold2Color": "rgba(234, 112, 112, 0.22)"
|
||||
},
|
||||
"id": 1,
|
||||
"isNew": true,
|
||||
"legend": {
|
||||
"alignAsTable": false,
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"hideEmpty": false,
|
||||
"hideZero": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"rightSide": false,
|
||||
"show": true,
|
||||
"total": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 2,
|
||||
"links": [],
|
||||
"nullPointMode": "connected",
|
||||
"percentage": false,
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"span": 12,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "available",
|
||||
"refId": "B",
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "desired",
|
||||
"refId": "E",
|
||||
"step": 30
|
||||
}
|
||||
],
|
||||
"title": "Replicas",
|
||||
"tooltip": {
|
||||
"msResolution": true,
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "cumulative"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"mode": "time",
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "none",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": "",
|
||||
"logBase": 1,
|
||||
"show": false
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"showTitle": false,
|
||||
"title": "Dashboard Row",
|
||||
"titleSize": "h6"
|
||||
}
|
||||
],
|
||||
"schemaVersion": 14,
|
||||
"sharedCrosshair": false,
|
||||
"style": "dark",
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"allValue": ".*",
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Namespace",
|
||||
"multi": false,
|
||||
"name": "statefulset_namespace",
|
||||
"options": [],
|
||||
"query": "label_values(kube_statefulset_metadata_generation, namespace)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": null,
|
||||
"tags": [],
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
},
|
||||
{
|
||||
"allValue": null,
|
||||
"current": {},
|
||||
"datasource": "${DS_PROMETHEUS}",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "StatefulSet",
|
||||
"multi": false,
|
||||
"name": "statefulset_name",
|
||||
"options": [],
|
||||
"query": "label_values(kube_statefulset_metadata_generation{namespace=\"$statefulset_namespace\"}, statefulset)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 0,
|
||||
"tagValuesQuery": "",
|
||||
"tags": [],
|
||||
"tagsQuery": "statefulset",
|
||||
"type": "query",
|
||||
"useTags": false
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-6h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
"refresh_intervals": [
|
||||
"5s",
|
||||
"10s",
|
||||
"30s",
|
||||
"1m",
|
||||
"5m",
|
||||
"15m",
|
||||
"30m",
|
||||
"1h",
|
||||
"2h",
|
||||
"1d"
|
||||
],
|
||||
"time_options": [
|
||||
"5m",
|
||||
"15m",
|
||||
"1h",
|
||||
"6h",
|
||||
"12h",
|
||||
"24h",
|
||||
"2d",
|
||||
"7d",
|
||||
"30d"
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "StatefulSet",
|
||||
"version": 1
|
||||
}
|
||||
,
|
||||
"inputs": [
|
||||
{
|
||||
"name": "DS_PROMETHEUS",
|
||||
"pluginId": "prometheus",
|
||||
"type": "datasource",
|
||||
"value": "prometheus"
|
||||
}
|
||||
],
|
||||
"overwrite": true
|
||||
}
|
||||
prometheus-datasource.json: |+
|
||||
{
|
||||
"access": "proxy",
|
||||
"basicAuth": false,
|
||||
"name": "prometheus",
|
||||
"type": "prometheus",
|
||||
"url": "http://prometheus.monitoring.svc"
|
||||
}
|
||||
---
|
@ -1,4 +1,4 @@
|
||||
apiVersion: apps/v1beta2
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: grafana
|
||||
@ -21,7 +21,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: grafana
|
||||
image: grafana/grafana:4.6.1
|
||||
image: grafana/grafana:4.6.3
|
||||
env:
|
||||
- name: GF_SERVER_HTTP_PORT
|
||||
value: "8080"
|
||||
@ -41,6 +41,22 @@ spec:
|
||||
limits:
|
||||
memory: 200Mi
|
||||
cpu: 200m
|
||||
- name: grafana-watcher
|
||||
image: quay.io/coreos/grafana-watcher:v0.0.8
|
||||
args:
|
||||
- '--watch-dir=/etc/grafana/dashboards'
|
||||
- '--grafana-url=http://localhost:8080'
|
||||
resources:
|
||||
requests:
|
||||
memory: "16Mi"
|
||||
cpu: "50m"
|
||||
limits:
|
||||
memory: "32Mi"
|
||||
cpu: "100m"
|
||||
volumeMounts:
|
||||
- name: dashboards
|
||||
mountPath: /etc/grafana/dashboards
|
||||
volumes:
|
||||
- name: grafana-storage
|
||||
emptyDir: {}
|
||||
- name: dashboards
|
||||
configMap:
|
||||
name: grafana-dashboards
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: heapster
|
||||
@ -19,7 +19,7 @@ spec:
|
||||
spec:
|
||||
containers:
|
||||
- name: heapster
|
||||
image: gcr.io/google_containers/heapster-amd64:v1.4.3
|
||||
image: gcr.io/google_containers/heapster-amd64:v1.5.0
|
||||
command:
|
||||
- /heapster
|
||||
- --source=kubernetes.summary_api:''
|
||||
@ -31,16 +31,18 @@ spec:
|
||||
initialDelaySeconds: 180
|
||||
timeoutSeconds: 5
|
||||
- name: heapster-nanny
|
||||
image: gcr.io/google_containers/addon-resizer:2.0
|
||||
image: gcr.io/google_containers/addon-resizer:1.7
|
||||
command:
|
||||
- /pod_nanny
|
||||
- --cpu=80m
|
||||
- --extra-cpu=0.5m
|
||||
- --memory=140Mi
|
||||
- --extra-memory=4Mi
|
||||
- --threshold=5
|
||||
- --deployment=heapster
|
||||
- --container=heapster
|
||||
- --poll-period=300000
|
||||
- --estimator=exponential
|
||||
env:
|
||||
- name: MY_POD_NAME
|
||||
valueFrom:
|
||||
|
@ -1,10 +1,14 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: default-backend
|
||||
namespace: ingress
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: default-backend
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: nginx-ingress-controller
|
||||
@ -8,6 +8,10 @@ spec:
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: nginx-ingress-controller
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -19,7 +23,7 @@ spec:
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: nginx-ingress-controller
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.10.2
|
||||
args:
|
||||
- /nginx-ingress-controller
|
||||
- --default-backend-service=$(POD_NAMESPACE)/default-backend
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
name: nginx-ingress-controller
|
||||
@ -8,6 +8,10 @@ spec:
|
||||
type: RollingUpdate
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: nginx-ingress-controller
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -19,7 +23,7 @@ spec:
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: nginx-ingress-controller
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.10.2
|
||||
args:
|
||||
- /nginx-ingress-controller
|
||||
- --default-backend-service=$(POD_NAMESPACE)/default-backend
|
||||
|
@ -1,10 +1,14 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: default-backend
|
||||
namespace: ingress
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: default-backend
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,10 +1,14 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: default-backend
|
||||
namespace: ingress
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: default-backend
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: nginx-ingress-controller
|
||||
@ -8,6 +8,10 @@ spec:
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: nginx-ingess-controller
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
@ -19,7 +23,7 @@ spec:
|
||||
hostNetwork: true
|
||||
containers:
|
||||
- name: nginx-ingress-controller
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
|
||||
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.10.2
|
||||
args:
|
||||
- /nginx-ingress-controller
|
||||
- --default-backend-service=$(POD_NAMESPACE)/default-backend
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
roleRef:
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -1,5 +1,5 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: ingress
|
||||
namespace: ingress
|
||||
|
@ -39,7 +39,7 @@ data:
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
# Using endpoints to discover kube-apiserver targets finds the pod IP
|
||||
# (host IP since apiserver is uses host network) which is not used in
|
||||
# (host IP since apiserver uses host network) which is not used in
|
||||
# the server certificate.
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
@ -51,6 +51,9 @@ data:
|
||||
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- replacement: apiserver
|
||||
action: replace
|
||||
target_label: job
|
||||
|
||||
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
|
||||
# metrics from a node by scraping kubelet (127.0.0.1:10255/metrics).
|
||||
@ -59,7 +62,7 @@ data:
|
||||
# Kubernetes apiserver. This means it will work if Prometheus is running out of
|
||||
# cluster, or can't connect to nodes for some other reason (e.g. because of
|
||||
# firewalling).
|
||||
- job_name: 'kubernetes-nodes'
|
||||
- job_name: 'kubelet'
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
|
||||
@ -149,7 +152,7 @@ data:
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
target_label: job
|
||||
|
||||
# Example scrape config for probing services via the Blackbox Exporter.
|
||||
#
|
||||
@ -181,7 +184,7 @@ data:
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
target_label: job
|
||||
|
||||
# Example scrape config for pods
|
||||
#
|
||||
|
@ -1,22 +1,24 @@
|
||||
apiVersion: extensions/v1beta1
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: prometheus
|
||||
namespace: monitoring
|
||||
spec:
|
||||
replicas: 1
|
||||
strategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
name: prometheus
|
||||
phase: prod
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
name: prometheus
|
||||
phase: prod
|
||||
spec:
|
||||
serviceAccountName: prometheus
|
||||
containers:
|
||||
- name: prometheus
|
||||
image: quay.io/prometheus/prometheus:v2.0.0
|
||||
image: quay.io/prometheus/prometheus:v2.1.0
|
||||
args:
|
||||
- '--config.file=/etc/prometheus/prometheus.yaml'
|
||||
ports:
|
||||
|
@ -12,7 +12,9 @@ rules:
|
||||
- replicationcontrollers
|
||||
- limitranges
|
||||
- persistentvolumeclaims
|
||||
- persistentvolumes
|
||||
- namespaces
|
||||
- endpoints
|
||||
verbs: ["list", "watch"]
|
||||
- apiGroups: ["extensions"]
|
||||
resources:
|
||||
@ -29,3 +31,7 @@ rules:
|
||||
- cronjobs
|
||||
- jobs
|
||||
verbs: ["list", "watch"]
|
||||
- apiGroups: ["autoscaling"]
|
||||
resources:
|
||||
- horizontalpodautoscalers
|
||||
verbs: ["list", "watch"]
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: apps/v1beta2
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: kube-state-metrics
|
||||
@ -22,7 +22,7 @@ spec:
|
||||
serviceAccountName: kube-state-metrics
|
||||
containers:
|
||||
- name: kube-state-metrics
|
||||
image: quay.io/coreos/kube-state-metrics:v1.1.0
|
||||
image: quay.io/coreos/kube-state-metrics:v1.2.0
|
||||
ports:
|
||||
- name: metrics
|
||||
containerPort: 8080
|
||||
@ -54,8 +54,8 @@ spec:
|
||||
- /pod_nanny
|
||||
- --container=kube-state-metrics
|
||||
- --cpu=100m
|
||||
- --extra-cpu=1m
|
||||
- --memory=100Mi
|
||||
- --extra-memory=2Mi
|
||||
- --extra-cpu=2m
|
||||
- --memory=150Mi
|
||||
- --extra-memory=30Mi
|
||||
- --threshold=5
|
||||
- --deployment=kube-state-metrics
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: apps/v1beta2
|
||||
apiVersion: apps/v1
|
||||
kind: DaemonSet
|
||||
metadata:
|
||||
name: node-exporter
|
||||
@ -18,11 +18,15 @@ spec:
|
||||
name: node-exporter
|
||||
phase: prod
|
||||
spec:
|
||||
serviceAccountName: node-exporter
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 65534
|
||||
hostNetwork: true
|
||||
hostPID: true
|
||||
containers:
|
||||
- name: node-exporter
|
||||
image: quay.io/prometheus/node-exporter:v0.15.0
|
||||
image: quay.io/prometheus/node-exporter:v0.15.2
|
||||
args:
|
||||
- "--path.procfs=/host/proc"
|
||||
- "--path.sysfs=/host/sys"
|
||||
@ -45,9 +49,8 @@ spec:
|
||||
mountPath: /host/sys
|
||||
readOnly: true
|
||||
tolerations:
|
||||
- key: node-role.kubernetes.io/master
|
||||
- effect: NoSchedule
|
||||
operator: Exists
|
||||
effect: NoSchedule
|
||||
volumes:
|
||||
- name: proc
|
||||
hostPath:
|
||||
|
@ -0,0 +1,5 @@
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: node-exporter
|
||||
namespace: monitoring
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
metadata:
|
||||
name: prometheus
|
||||
@ -8,5 +8,5 @@ roleRef:
|
||||
name: prometheus
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: default
|
||||
name: prometheus
|
||||
namespace: monitoring
|
||||
|
@ -1,4 +1,4 @@
|
||||
apiVersion: rbac.authorization.k8s.io/v1beta1
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: prometheus
|
||||
|
@ -4,10 +4,9 @@ metadata:
|
||||
name: prometheus-rules
|
||||
namespace: monitoring
|
||||
data:
|
||||
# Rules adapted from those provided by coreos/prometheus-operator and SoundCloud
|
||||
alertmanager.rules.yaml: |+
|
||||
alertmanager.rules.yaml: |
|
||||
groups:
|
||||
- name: ./alertmanager.rules
|
||||
- name: alertmanager.rules
|
||||
rules:
|
||||
- alert: AlertmanagerConfigInconsistent
|
||||
expr: count_values("config_hash", alertmanager_config_hash) BY (service) / ON(service)
|
||||
@ -19,7 +18,6 @@ data:
|
||||
annotations:
|
||||
description: The configuration of the instances of the Alertmanager cluster
|
||||
`{{$labels.service}}` are out of sync.
|
||||
summary: Alertmanager configurations are inconsistent
|
||||
- alert: AlertmanagerDownOrMissing
|
||||
expr: label_replace(prometheus_operator_alertmanager_spec_replicas, "job", "alertmanager-$1",
|
||||
"alertmanager", "(.*)") / ON(job) GROUP_RIGHT() sum(up) BY (job) != 1
|
||||
@ -29,8 +27,7 @@ data:
|
||||
annotations:
|
||||
description: An unexpected number of Alertmanagers are scraped or Alertmanagers
|
||||
disappeared from discovery.
|
||||
summary: Alertmanager down or not discovered
|
||||
- alert: FailedReload
|
||||
- alert: AlertmanagerFailedReload
|
||||
expr: alertmanager_config_last_reload_successful == 0
|
||||
for: 10m
|
||||
labels:
|
||||
@ -38,8 +35,7 @@ data:
|
||||
annotations:
|
||||
description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
|
||||
}}/{{ $labels.pod}}.
|
||||
summary: Alertmanager configuration reload has failed
|
||||
etcd3.rules.yaml: |+
|
||||
etcd3.rules.yaml: |
|
||||
groups:
|
||||
- name: ./etcd3.rules
|
||||
rules:
|
||||
@ -68,8 +64,8 @@ data:
|
||||
changes within the last hour
|
||||
summary: a high number of leader changes within the etcd cluster are happening
|
||||
- alert: HighNumberOfFailedGRPCRequests
|
||||
expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
|
||||
/ sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.01
|
||||
expr: sum(rate(grpc_server_handled_total{grpc_code!="OK",job="etcd"}[5m])) BY (grpc_service, grpc_method)
|
||||
/ sum(rate(grpc_server_handled_total{job="etcd"}[5m])) BY (grpc_service, grpc_method) > 0.01
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
@ -78,8 +74,8 @@ data:
|
||||
on etcd instance {{ $labels.instance }}'
|
||||
summary: a high number of gRPC requests are failing
|
||||
- alert: HighNumberOfFailedGRPCRequests
|
||||
expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
|
||||
/ sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.05
|
||||
expr: sum(rate(grpc_server_handled_total{grpc_code!="OK",job="etcd"}[5m])) BY (grpc_service, grpc_method)
|
||||
/ sum(rate(grpc_server_handled_total{job="etcd"}[5m])) BY (grpc_service, grpc_method) > 0.05
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
@ -88,7 +84,7 @@ data:
|
||||
on etcd instance {{ $labels.instance }}'
|
||||
summary: a high number of gRPC requests are failing
|
||||
- alert: GRPCRequestsSlow
|
||||
expr: histogram_quantile(0.99, rate(etcd_grpc_unary_requests_duration_seconds_bucket[5m]))
|
||||
expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job="etcd",grpc_type="unary"}[5m])) by (grpc_service, grpc_method, le))
|
||||
> 0.15
|
||||
for: 10m
|
||||
labels:
|
||||
@ -128,7 +124,7 @@ data:
|
||||
}} are slow
|
||||
summary: slow HTTP requests
|
||||
- alert: EtcdMemberCommunicationSlow
|
||||
expr: histogram_quantile(0.99, rate(etcd_network_member_round_trip_time_seconds_bucket[5m]))
|
||||
expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))
|
||||
> 0.15
|
||||
for: 10m
|
||||
labels:
|
||||
@ -163,9 +159,9 @@ data:
|
||||
annotations:
|
||||
description: etcd instance {{ $labels.instance }} commit durations are high
|
||||
summary: high commit durations
|
||||
general.rules.yaml: |+
|
||||
general.rules.yaml: |
|
||||
groups:
|
||||
- name: ./general.rules
|
||||
- name: general.rules
|
||||
rules:
|
||||
- alert: TargetDown
|
||||
expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
|
||||
@ -173,66 +169,34 @@ data:
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $value }}% or more of {{ $labels.job }} targets are down.'
|
||||
description: '{{ $value }}% of {{ $labels.job }} targets are down.'
|
||||
summary: Targets are down
|
||||
- alert: TooManyOpenFileDescriptors
|
||||
expr: 100 * (process_open_fds / process_max_fds) > 95
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
|
||||
$labels.instance }}) is using {{ $value }}% of the available file/socket descriptors.'
|
||||
summary: too many open file descriptors
|
||||
- record: instance:fd_utilization
|
||||
- record: fd_utilization
|
||||
expr: process_open_fds / process_max_fds
|
||||
- alert: FdExhaustionClose
|
||||
expr: predict_linear(instance:fd_utilization[1h], 3600 * 4) > 1
|
||||
expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
|
||||
$labels.instance }}) instance will exhaust in file/socket descriptors soon'
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
|
||||
will exhaust in file/socket descriptors within the next 4 hours'
|
||||
summary: file descriptors soon exhausted
|
||||
- alert: FdExhaustionClose
|
||||
expr: predict_linear(instance:fd_utilization[10m], 3600) > 1
|
||||
expr: predict_linear(fd_utilization[10m], 3600) > 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
|
||||
$labels.instance }}) instance will exhaust in file/socket descriptors soon'
|
||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
|
||||
will exhaust in file/socket descriptors within the next hour'
|
||||
summary: file descriptors soon exhausted
|
||||
kube-apiserver.rules.yaml: |+
|
||||
kube-controller-manager.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kube-apiserver.rules
|
||||
rules:
|
||||
- alert: K8SApiserverDown
|
||||
expr: absent(up{job="kubernetes-apiservers"} == 1)
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: Prometheus failed to scrape API server(s), or all API servers have
|
||||
disappeared from service discovery.
|
||||
summary: API server unreachable
|
||||
- alert: K8SApiServerLatency
|
||||
expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"})
|
||||
WITHOUT (instance, resource)) / 1e+06 > 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: 99th percentile Latency for {{ $labels.verb }} requests to the
|
||||
kube-apiserver is higher than 1s.
|
||||
summary: Kubernetes apiserver latency is high
|
||||
kube-controller-manager.rules.yaml: |+
|
||||
groups:
|
||||
- name: ./kube-controller-manager.rules
|
||||
- name: kube-controller-manager.rules
|
||||
rules:
|
||||
- alert: K8SControllerManagerDown
|
||||
expr: absent(up{kubernetes_name="kube-controller-manager"} == 1)
|
||||
expr: absent(up{job="kube-controller-manager"} == 1)
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
@ -240,12 +204,57 @@ data:
|
||||
description: There is no running K8S controller manager. Deployments and replication
|
||||
controllers are not making progress.
|
||||
summary: Controller manager is down
|
||||
kube-scheduler.rules.yaml: |+
|
||||
kube-scheduler.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kube-scheduler.rules
|
||||
- name: kube-scheduler.rules
|
||||
rules:
|
||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- alert: K8SSchedulerDown
|
||||
expr: absent(up{kubernetes_name="kube-scheduler"} == 1)
|
||||
expr: absent(up{job="kube-scheduler"} == 1)
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
@ -253,9 +262,69 @@ data:
|
||||
description: There is no running K8S scheduler. New pods are not being assigned
|
||||
to nodes.
|
||||
summary: Scheduler is down
|
||||
kubelet.rules.yaml: |+
|
||||
kube-state-metrics.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kubelet.rules
|
||||
- name: kube-state-metrics.rules
|
||||
rules:
|
||||
- alert: DeploymentGenerationMismatch
|
||||
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Observed deployment generation does not match expected one for
|
||||
deployment {{$labels.namespaces}}/{{$labels.deployment}}
|
||||
summary: Deployment is outdated
|
||||
- alert: DeploymentReplicasNotUpdated
|
||||
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
|
||||
or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
|
||||
unless (kube_deployment_spec_paused == 1)
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
|
||||
summary: Deployment replicas are outdated
|
||||
- alert: DaemonSetRolloutStuck
|
||||
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled
|
||||
* 100 < 100
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Only {{$value}}% of desired pods scheduled and ready for daemon
|
||||
set {{$labels.namespaces}}/{{$labels.daemonset}}
|
||||
summary: DaemonSet is missing pods
|
||||
- alert: K8SDaemonSetsNotScheduled
|
||||
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
|
||||
> 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: A number of daemonsets are not scheduled.
|
||||
summary: Daemonsets are not scheduled correctly
|
||||
- alert: DaemonSetsMissScheduled
|
||||
expr: kube_daemonset_status_number_misscheduled > 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: A number of daemonsets are running where they are not supposed
|
||||
to run.
|
||||
summary: Daemonsets are not scheduled correctly
|
||||
- alert: PodFrequentlyRestarting
|
||||
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Pod {{$labels.namespaces}}/{{$labels.pod}} is was restarted {{$value}}
|
||||
times within the last hour
|
||||
summary: Pod is restarting frequently
|
||||
kubelet.rules.yaml: |
|
||||
groups:
|
||||
- name: kubelet.rules
|
||||
rules:
|
||||
- alert: K8SNodeNotReady
|
||||
expr: kube_node_status_condition{condition="Ready",status="true"} == 0
|
||||
@ -274,20 +343,17 @@ data:
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $value }} Kubernetes nodes (more than 10% are in the NotReady
|
||||
state).'
|
||||
summary: Many Kubernetes nodes are Not Ready
|
||||
description: '{{ $value }}% of Kubernetes nodes are not ready'
|
||||
- alert: K8SKubeletDown
|
||||
expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) > 0.03
|
||||
expr: count(up{job="kubelet"} == 0) / count(up{job="kubelet"}) * 100 > 3
|
||||
for: 1h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus failed to scrape {{ $value }}% of kubelets.
|
||||
summary: Many Kubelets cannot be scraped
|
||||
- alert: K8SKubeletDown
|
||||
expr: absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"})
|
||||
> 0.1
|
||||
expr: (absent(up{job="kubelet"} == 1) or count(up{job="kubelet"} == 0) / count(up{job="kubelet"}))
|
||||
* 100 > 1
|
||||
for: 1h
|
||||
labels:
|
||||
severity: critical
|
||||
@ -297,176 +363,228 @@ data:
|
||||
summary: Many Kubelets cannot be scraped
|
||||
- alert: K8SKubeletTooManyPods
|
||||
expr: kubelet_running_pod_count > 100
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Kubelet {{$labels.instance}} is running {{$value}} pods, close
|
||||
to the limit of 110
|
||||
summary: Kubelet is close to pod limit
|
||||
kubernetes.rules.yaml: |+
|
||||
kubernetes.rules.yaml: |
|
||||
groups:
|
||||
- name: ./kubernetes.rules
|
||||
- name: kubernetes.rules
|
||||
rules:
|
||||
- record: cluster_namespace_controller_pod_container:spec_memory_limit_bytes
|
||||
expr: sum(label_replace(container_spec_memory_limit_bytes{container_name!=""},
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name)
|
||||
- record: cluster_namespace_controller_pod_container:spec_cpu_shares
|
||||
expr: sum(label_replace(container_spec_cpu_shares{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:cpu_usage:rate
|
||||
expr: sum(label_replace(irate(container_cpu_usage_seconds_total{container_name!=""}[5m]),
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_usage:bytes
|
||||
expr: sum(label_replace(container_memory_usage_bytes{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_working_set:bytes
|
||||
expr: sum(label_replace(container_memory_working_set_bytes{container_name!=""},
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_rss:bytes
|
||||
expr: sum(label_replace(container_memory_rss{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_cache:bytes
|
||||
expr: sum(label_replace(container_memory_cache{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:disk_usage:bytes
|
||||
expr: sum(label_replace(container_disk_usage_bytes{container_name!=""}, "controller",
|
||||
"$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
|
||||
container_name)
|
||||
- record: cluster_namespace_controller_pod_container:memory_pagefaults:rate
|
||||
expr: sum(label_replace(irate(container_memory_failures_total{container_name!=""}[5m]),
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name, scope, type)
|
||||
- record: cluster_namespace_controller_pod_container:memory_oom:rate
|
||||
expr: sum(label_replace(irate(container_memory_failcnt{container_name!=""}[5m]),
|
||||
"controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
|
||||
controller, pod_name, container_name, scope, type)
|
||||
- record: cluster:memory_allocation:percent
|
||||
expr: 100 * sum(container_spec_memory_limit_bytes{pod_name!=""}) BY (cluster)
|
||||
/ sum(machine_memory_bytes) BY (cluster)
|
||||
- record: cluster:memory_used:percent
|
||||
expr: 100 * sum(container_memory_usage_bytes{pod_name!=""}) BY (cluster) / sum(machine_memory_bytes)
|
||||
BY (cluster)
|
||||
- record: cluster:cpu_allocation:percent
|
||||
expr: 100 * sum(container_spec_cpu_shares{pod_name!=""}) BY (cluster) / sum(container_spec_cpu_shares{id="/"}
|
||||
* ON(cluster, instance) machine_cpu_cores) BY (cluster)
|
||||
- record: cluster:node_cpu_use:percent
|
||||
expr: 100 * sum(rate(node_cpu{mode!="idle"}[5m])) BY (cluster) / sum(machine_cpu_cores)
|
||||
BY (cluster)
|
||||
- record: cluster_resource_verb:apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket) BY (le,
|
||||
cluster, job, resource, verb)) / 1e+06
|
||||
- record: pod_name:container_memory_usage_bytes:sum
|
||||
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
|
||||
(pod_name)
|
||||
- record: pod_name:container_spec_cpu_shares:sum
|
||||
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name)
|
||||
- record: pod_name:container_cpu_usage:sum
|
||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
|
||||
BY (pod_name)
|
||||
- record: pod_name:container_fs_usage_bytes:sum
|
||||
expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name)
|
||||
- record: namespace:container_memory_usage_bytes:sum
|
||||
expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace)
|
||||
- record: namespace:container_spec_cpu_shares:sum
|
||||
expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace)
|
||||
- record: namespace:container_cpu_usage:sum
|
||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
|
||||
BY (namespace)
|
||||
- record: cluster:memory_usage:ratio
|
||||
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
|
||||
(cluster) / sum(machine_memory_bytes) BY (cluster)
|
||||
- record: cluster:container_spec_cpu_shares:ratio
|
||||
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000
|
||||
/ sum(machine_cpu_cores)
|
||||
- record: cluster:container_cpu_usage:ratio
|
||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
|
||||
/ sum(machine_cpu_cores)
|
||||
- record: apiserver_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) /
|
||||
1e+06
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster_resource_verb:apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(apiserver_request_latencies_bucket) BY (le,
|
||||
cluster, job, resource, verb)) / 1e+06
|
||||
- record: apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) /
|
||||
1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster_resource_verb:apiserver_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(apiserver_request_latencies_bucket) BY (le,
|
||||
cluster, job, resource, verb)) / 1e+06
|
||||
- record: apiserver_latency_seconds:quantile
|
||||
expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) /
|
||||
1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
- alert: APIServerLatencyHigh
|
||||
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
|
||||
> 1
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: warning
|
||||
annotations:
|
||||
description: the API server has a 99th percentile latency of {{ $value }} seconds
|
||||
for {{$labels.verb}} {{$labels.resource}}
|
||||
- alert: APIServerLatencyHigh
|
||||
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
|
||||
> 4
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: critical
|
||||
annotations:
|
||||
description: the API server has a 99th percentile latency of {{ $value }} seconds
|
||||
for {{$labels.verb}} {{$labels.resource}}
|
||||
- alert: APIServerErrorsHigh
|
||||
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
|
||||
* 100 > 2
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: warning
|
||||
annotations:
|
||||
description: API server returns errors for {{ $value }}% of requests
|
||||
- alert: APIServerErrorsHigh
|
||||
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
|
||||
* 100 > 5
|
||||
for: 10m
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: critical
|
||||
annotations:
|
||||
description: API server returns errors for {{ $value }}% of requests
|
||||
- alert: K8SApiserverDown
|
||||
expr: absent(up{job="apiserver"} == 1)
|
||||
for: 20m
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: critical
|
||||
annotations:
|
||||
description: No API servers are reachable or all have disappeared from service
|
||||
discovery
|
||||
|
||||
- alert: K8sCertificateExpirationNotice
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
- record: cluster:scheduler_binding_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Kubernetes API Certificate is expiring soon (less than 7 days)
|
||||
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0
|
||||
|
||||
- alert: K8sCertificateExpirationNotice
|
||||
labels:
|
||||
quantile: "0.99"
|
||||
- record: cluster:scheduler_binding_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.9"
|
||||
- record: cluster:scheduler_binding_latency:quantile_seconds
|
||||
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
|
||||
BY (le, cluster)) / 1e+06
|
||||
labels:
|
||||
quantile: "0.5"
|
||||
node.rules.yaml: |+
|
||||
severity: critical
|
||||
annotations:
|
||||
description: Kubernetes API Certificate is expiring in less than 1 day
|
||||
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0
|
||||
node.rules.yaml: |
|
||||
groups:
|
||||
- name: ./node.rules
|
||||
- name: node.rules
|
||||
rules:
|
||||
- record: instance:node_cpu:rate:sum
|
||||
expr: sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m]))
|
||||
BY (instance)
|
||||
- record: instance:node_filesystem_usage:sum
|
||||
expr: sum((node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}))
|
||||
BY (instance)
|
||||
- record: instance:node_network_receive_bytes:rate:sum
|
||||
expr: sum(rate(node_network_receive_bytes[3m])) BY (instance)
|
||||
- record: instance:node_network_transmit_bytes:rate:sum
|
||||
expr: sum(rate(node_network_transmit_bytes[3m])) BY (instance)
|
||||
- record: instance:node_cpu:ratio
|
||||
expr: sum(rate(node_cpu{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance)
|
||||
GROUP_LEFT() count(sum(node_cpu) BY (instance, cpu)) BY (instance)
|
||||
- record: cluster:node_cpu:sum_rate5m
|
||||
expr: sum(rate(node_cpu{mode!="idle"}[5m]))
|
||||
- record: cluster:node_cpu:ratio
|
||||
expr: cluster:node_cpu:rate5m / count(sum(node_cpu) BY (instance, cpu))
|
||||
- alert: NodeExporterDown
|
||||
expr: absent(up{kubernetes_name="node-exporter"} == 1)
|
||||
expr: absent(up{job="node-exporter"} == 1)
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus could not scrape a node-exporter for more than 10m,
|
||||
or node-exporters have disappeared from discovery.
|
||||
summary: node-exporter cannot be scraped
|
||||
- alert: K8SNodeOutOfDisk
|
||||
expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
|
||||
or node-exporters have disappeared from discovery
|
||||
- alert: NodeDiskRunningFull
|
||||
expr: predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
|
||||
for: 30m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: device {{$labels.device}} on node {{$labels.instance}} is running
|
||||
full within the next 24 hours (mounted at {{$labels.mountpoint}})
|
||||
- alert: NodeDiskRunningFull
|
||||
expr: predict_linear(node_filesystem_free[30m], 3600 * 2) < 0
|
||||
for: 10m
|
||||
labels:
|
||||
service: k8s
|
||||
severity: critical
|
||||
annotations:
|
||||
description: '{{ $labels.node }} has run out of disk space.'
|
||||
summary: Node ran out of disk space.
|
||||
- alert: K8SNodeMemoryPressure
|
||||
expr: kube_node_status_condition{condition="MemoryPressure",status="true"} ==
|
||||
1
|
||||
labels:
|
||||
service: k8s
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $labels.node }} is under memory pressure.'
|
||||
summary: Node is under memory pressure.
|
||||
- alert: K8SNodeDiskPressure
|
||||
expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
|
||||
labels:
|
||||
service: k8s
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{ $labels.node }} is under disk pressure.'
|
||||
summary: Node is under disk pressure.
|
||||
prometheus.rules.yaml: |+
|
||||
description: device {{$labels.device}} on node {{$labels.instance}} is running
|
||||
full within the next 2 hours (mounted at {{$labels.mountpoint}})
|
||||
prometheus.rules.yaml: |
|
||||
groups:
|
||||
- name: ./prometheus.rules
|
||||
- name: prometheus.rules
|
||||
rules:
|
||||
- alert: FailedReload
|
||||
- alert: PrometheusConfigReloadFailed
|
||||
expr: prometheus_config_last_reload_successful == 0
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Reloading Prometheus' configuration has failed for {{ $labels.namespace
|
||||
}}/{{ $labels.pod}}.
|
||||
summary: Prometheus configuration reload has failed
|
||||
description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}
|
||||
- alert: PrometheusNotificationQueueRunningFull
|
||||
expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{
|
||||
$labels.pod}}
|
||||
- alert: PrometheusErrorSendingAlerts
|
||||
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
|
||||
> 0.01
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
|
||||
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
|
||||
- alert: PrometheusErrorSendingAlerts
|
||||
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
|
||||
> 0.03
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
|
||||
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
|
||||
- alert: PrometheusNotConnectedToAlertmanagers
|
||||
expr: prometheus_notifications_alertmanagers_discovered < 1
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
|
||||
to any Alertmanagers
|
||||
- alert: PrometheusTSDBReloadsFailing
|
||||
expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0
|
||||
for: 12h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
|
||||
reload failures over the last four hours.'
|
||||
summary: Prometheus has issues reloading data blocks from disk
|
||||
- alert: PrometheusTSDBCompactionsFailing
|
||||
expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0
|
||||
for: 12h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
|
||||
compaction failures over the last four hours.'
|
||||
summary: Prometheus has issues compacting sample blocks
|
||||
- alert: PrometheusTSDBWALCorruptions
|
||||
expr: tsdb_wal_corruptions_total > 0
|
||||
for: 4h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead
|
||||
log (WAL).'
|
||||
summary: Prometheus write-ahead log is corrupted
|
||||
|
5
addons/prometheus/service-account.yaml
Normal file
@ -0,0 +1,5 @@
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: prometheus
|
||||
namespace: monitoring
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=203b90169ead2380f74cc64ea1f02c109806c9bc"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.15"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
||||
@ -41,11 +41,12 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -72,7 +73,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--kubeconfig=/etc/kubernetes/kubeconfig \
|
||||
@ -128,7 +129,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -147,11 +148,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -22,7 +22,7 @@ systemd:
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -49,7 +49,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--kubeconfig=/etc/kubernetes/kubeconfig \
|
||||
@ -103,7 +103,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -121,7 +121,7 @@ storage:
|
||||
--volume config,kind=host,source=/etc/kubernetes \
|
||||
--mount volume=config,target=/etc/kubernetes \
|
||||
--insecure-options=image \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.8.3 \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.9.3 \
|
||||
--net=host \
|
||||
--dns=host \
|
||||
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
|
||||
|
@ -54,6 +54,7 @@ data "template_file" "controller_config" {
|
||||
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
|
||||
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
|
||||
|
@ -94,3 +94,9 @@ EOD
|
||||
type = "string"
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -59,6 +59,7 @@ data "template_file" "worker_config" {
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
|
||||
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
|
||||
@ -145,11 +146,11 @@ resource "aws_security_group_rule" "worker-flannel-self" {
|
||||
resource "aws_security_group_rule" "worker-node-exporter" {
|
||||
security_group_id = "${aws_security_group.worker.id}"
|
||||
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 9100
|
||||
to_port = 9100
|
||||
self = true
|
||||
type = "ingress"
|
||||
protocol = "tcp"
|
||||
from_port = 9100
|
||||
to_port = 9100
|
||||
self = true
|
||||
}
|
||||
|
||||
resource "aws_security_group_rule" "worker-kubelet" {
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=203b90169ead2380f74cc64ea1f02c109806c9bc"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${var.k8s_domain_name}"]
|
||||
etcd_servers = ["${var.controller_domains}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${var.k8s_domain_name}"]
|
||||
etcd_servers = ["${var.controller_domains}"]
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = "${var.network_mtu}"
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.15"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
|
||||
@ -50,10 +50,11 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -73,6 +74,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -80,7 +82,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=${domain_name} \
|
||||
@ -89,7 +91,8 @@ systemd:
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/master \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule
|
||||
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
@ -114,7 +117,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/hostname
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
@ -139,11 +142,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -30,7 +30,7 @@ systemd:
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -50,6 +50,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -57,7 +58,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=${domain_name} \
|
||||
@ -65,7 +66,8 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/node \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
@ -80,7 +82,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/hostname
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
|
@ -3,7 +3,7 @@ resource "matchbox_group" "container-linux-install" {
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
|
||||
name = "${format("container-linux-install-%s", element(concat(var.controller_names, var.worker_names), count.index))}"
|
||||
profile = "${var.cached_install == "true" ? matchbox_profile.cached-container-linux-install.name : matchbox_profile.container-linux-install.name}"
|
||||
profile = "${var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)}"
|
||||
|
||||
selector {
|
||||
mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
|
||||
|
@ -1,6 +1,8 @@
|
||||
// Container Linux Install profile (from release.core-os.net)
|
||||
resource "matchbox_profile" "container-linux-install" {
|
||||
name = "container-linux-install"
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
name = "${format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
|
||||
|
||||
kernel = "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
|
||||
|
||||
initrd = [
|
||||
@ -8,6 +10,7 @@ resource "matchbox_profile" "container-linux-install" {
|
||||
]
|
||||
|
||||
args = [
|
||||
"initrd=coreos_production_pxe_image.cpio.gz",
|
||||
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
|
||||
"coreos.first_boot=yes",
|
||||
"console=tty0",
|
||||
@ -15,10 +18,12 @@ resource "matchbox_profile" "container-linux-install" {
|
||||
"${var.kernel_args}",
|
||||
]
|
||||
|
||||
container_linux_config = "${data.template_file.container-linux-install-config.rendered}"
|
||||
container_linux_config = "${element(data.template_file.container-linux-install-configs.*.rendered, count.index)}"
|
||||
}
|
||||
|
||||
data "template_file" "container-linux-install-config" {
|
||||
data "template_file" "container-linux-install-configs" {
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
|
||||
template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
@ -36,7 +41,9 @@ data "template_file" "container-linux-install-config" {
|
||||
// Container Linux Install profile (from matchbox /assets cache)
|
||||
// Note: Admin must have downloaded container_linux_version into matchbox assets.
|
||||
resource "matchbox_profile" "cached-container-linux-install" {
|
||||
name = "cached-container-linux-install"
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
name = "${format("%s-cached-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
|
||||
|
||||
kernel = "/assets/coreos/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
|
||||
|
||||
initrd = [
|
||||
@ -44,6 +51,7 @@ resource "matchbox_profile" "cached-container-linux-install" {
|
||||
]
|
||||
|
||||
args = [
|
||||
"initrd=coreos_production_pxe_image.cpio.gz",
|
||||
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
|
||||
"coreos.first_boot=yes",
|
||||
"console=tty0",
|
||||
@ -51,10 +59,12 @@ resource "matchbox_profile" "cached-container-linux-install" {
|
||||
"${var.kernel_args}",
|
||||
]
|
||||
|
||||
container_linux_config = "${data.template_file.cached-container-linux-install-config.rendered}"
|
||||
container_linux_config = "${element(data.template_file.cached-container-linux-install-configs.*.rendered, count.index)}"
|
||||
}
|
||||
|
||||
data "template_file" "cached-container-linux-install-config" {
|
||||
data "template_file" "cached-container-linux-install-configs" {
|
||||
count = "${length(var.controller_names) + length(var.worker_names)}"
|
||||
|
||||
template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
@ -82,11 +92,12 @@ data "template_file" "controller-configs" {
|
||||
template = "${file("${path.module}/cl/controller.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
domain_name = "${element(var.controller_domains, count.index)}"
|
||||
etcd_name = "${element(var.controller_names, count.index)}"
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
domain_name = "${element(var.controller_domains, count.index)}"
|
||||
etcd_name = "${element(var.controller_names, count.index)}"
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
|
||||
# Terraform evaluates both sides regardless and element cannot be used on 0 length lists
|
||||
networkd_content = "${length(var.controller_networkds) == 0 ? "" : element(concat(var.controller_networkds, list("")), count.index)}"
|
||||
@ -106,9 +117,10 @@ data "template_file" "worker-configs" {
|
||||
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
|
||||
|
||||
vars {
|
||||
domain_name = "${element(var.worker_domains, count.index)}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
domain_name = "${element(var.worker_domains, count.index)}"
|
||||
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
|
||||
# Terraform evaluates both sides regardless and element cannot be used on 0 length lists
|
||||
networkd_content = "${length(var.worker_networkds) == 0 ? "" : element(concat(var.worker_networkds, list("")), count.index)}"
|
||||
|
@ -24,7 +24,7 @@ variable "ssh_authorized_key" {
|
||||
}
|
||||
|
||||
# Machines
|
||||
# Terraform's crude "type system" does properly support lists of maps so we do this.
|
||||
# Terraform's crude "type system" does not properly support lists of maps so we do this.
|
||||
|
||||
variable "controller_names" {
|
||||
type = "list"
|
||||
@ -92,6 +92,12 @@ EOD
|
||||
|
||||
# optional
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
||||
variable "cached_install" {
|
||||
type = "string"
|
||||
default = "false"
|
||||
|
@ -30,7 +30,7 @@ systemd:
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -50,6 +50,7 @@ systemd:
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
|
||||
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/cni
|
||||
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
|
||||
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
|
||||
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
ExecStart=/usr/lib/coreos/kubelet-wrapper \
|
||||
@ -57,7 +58,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns={{.k8s_dns_service_ip}} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain={{.cluster_domain_suffix}} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override={{.domain_name}} \
|
||||
@ -65,7 +66,8 @@ systemd:
|
||||
--lock-file=/var/run/lock/kubelet.lock \
|
||||
--network-plugin=cni \
|
||||
--node-labels=node-role.kubernetes.io/node \
|
||||
--pod-manifest-path=/etc/kubernetes/manifests
|
||||
--pod-manifest-path=/etc/kubernetes/manifests \
|
||||
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
|
||||
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
@ -96,7 +98,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/hostname
|
||||
filesystem: root
|
||||
mode: 0644
|
||||
|
@ -13,9 +13,10 @@ resource "matchbox_group" "workers" {
|
||||
etcd_endpoints = "${join(",", formatlist("%s:2379", var.controller_domains))}"
|
||||
|
||||
# TODO
|
||||
etcd_on_host = "true"
|
||||
k8s_etcd_service_ip = "10.3.0.15"
|
||||
k8s_dns_service_ip = "${var.kube_dns_service_ip}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
etcd_on_host = "true"
|
||||
k8s_etcd_service_ip = "10.3.0.15"
|
||||
k8s_dns_service_ip = "${var.kube_dns_service_ip}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
}
|
||||
}
|
||||
|
@ -8,6 +8,7 @@ resource "matchbox_profile" "bootkube-worker-pxe" {
|
||||
]
|
||||
|
||||
args = [
|
||||
"initrd=coreos_production_pxe_image.cpio.gz",
|
||||
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
|
||||
"coreos.first_boot=yes",
|
||||
"console=tty0",
|
||||
|
@ -64,3 +64,9 @@ variable "kernel_args" {
|
||||
"root=/dev/sda1",
|
||||
]
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=203b90169ead2380f74cc64ea1f02c109806c9bc"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = 1440
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = 1440
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.15"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
||||
@ -50,10 +50,11 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Requires=coreos-metadata.service
|
||||
After=coreos-metadata.service
|
||||
Wants=rpc-statd.service
|
||||
@ -83,7 +84,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
|
||||
@ -119,7 +120,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -138,11 +139,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -30,7 +30,7 @@ systemd:
|
||||
- name: kubelet.service
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Requires=coreos-metadata.service
|
||||
After=coreos-metadata.service
|
||||
Wants=rpc-statd.service
|
||||
@ -60,7 +60,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
|
||||
@ -94,7 +94,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -112,7 +112,7 @@ storage:
|
||||
--volume config,kind=host,source=/etc/kubernetes \
|
||||
--mount volume=config,target=/etc/kubernetes \
|
||||
--insecure-options=image \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.8.3 \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.9.3 \
|
||||
--net=host \
|
||||
--dns=host \
|
||||
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
|
||||
|
@ -69,8 +69,9 @@ data "template_file" "controller_config" {
|
||||
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
|
||||
|
||||
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -22,12 +22,12 @@ resource "digitalocean_firewall" "rules" {
|
||||
},
|
||||
{
|
||||
protocol = "udp"
|
||||
port_range = "all"
|
||||
port_range = "1-65535"
|
||||
source_tags = ["${digitalocean_tag.controllers.name}", "${digitalocean_tag.workers.name}"]
|
||||
},
|
||||
{
|
||||
protocol = "tcp"
|
||||
port_range = "all"
|
||||
port_range = "1-65535"
|
||||
source_tags = ["${digitalocean_tag.controllers.name}", "${digitalocean_tag.workers.name}"]
|
||||
},
|
||||
]
|
||||
@ -35,17 +35,18 @@ resource "digitalocean_firewall" "rules" {
|
||||
# allow all outbound traffic
|
||||
outbound_rule = [
|
||||
{
|
||||
protocol = "icmp"
|
||||
protocol = "tcp"
|
||||
port_range = "1-65535"
|
||||
destination_addresses = ["0.0.0.0/0", "::/0"]
|
||||
},
|
||||
{
|
||||
protocol = "udp"
|
||||
port_range = "all"
|
||||
port_range = "1-65535"
|
||||
destination_addresses = ["0.0.0.0/0", "::/0"]
|
||||
},
|
||||
{
|
||||
protocol = "tcp"
|
||||
port_range = "all"
|
||||
protocol = "icmp"
|
||||
port_range = "1-65535"
|
||||
destination_addresses = ["0.0.0.0/0", "::/0"]
|
||||
},
|
||||
]
|
||||
|
@ -5,7 +5,7 @@ terraform {
|
||||
}
|
||||
|
||||
provider "digitalocean" {
|
||||
version = "0.1.2"
|
||||
version = "~> 0.1.2"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
|
@ -27,8 +27,8 @@ variable "controller_count" {
|
||||
|
||||
variable "controller_type" {
|
||||
type = "string"
|
||||
default = "2gb"
|
||||
description = "Digital Ocean droplet size (e.g. 2gb (min), 4gb, 8gb)."
|
||||
default = "s-2vcpu-2gb"
|
||||
description = "Digital Ocean droplet size (e.g. s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb)."
|
||||
}
|
||||
|
||||
variable "worker_count" {
|
||||
@ -39,8 +39,8 @@ variable "worker_count" {
|
||||
|
||||
variable "worker_type" {
|
||||
type = "string"
|
||||
default = "512mb"
|
||||
description = "Digital Ocean droplet size (e.g. 512mb, 1gb, 2gb, 4gb)"
|
||||
default = "s-1vcpu-1gb"
|
||||
description = "Digital Ocean droplet size (e.g. s-1vcpu-1gb, s-1vcpu-2gb, s-2vcpu-2gb)"
|
||||
}
|
||||
|
||||
variable "ssh_fingerprints" {
|
||||
@ -76,3 +76,9 @@ EOD
|
||||
type = "string"
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -43,8 +43,9 @@ data "template_file" "worker_config" {
|
||||
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
|
||||
|
||||
vars = {
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -12,13 +12,13 @@ kubectl apply -f addons/cluo -R
|
||||
|
||||
## Usage
|
||||
|
||||
`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indiates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
|
||||
`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indicates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
|
||||
|
||||
```
|
||||
$ kubectl get nodes --show-labels
|
||||
...
|
||||
container-linux-update.v1.coreos.com/group=stable
|
||||
container-linux-update.v1.coreos.com/version=1465.6.0
|
||||
container-linux-update.v1.coreos.com/version=1576.5.0
|
||||
```
|
||||
|
||||
`update-operator` ensures one node reboots at a time and that pods are drained prior to reboot.
|
||||
|
@ -1,24 +0,0 @@
|
||||
# Kubernetes Dashboard
|
||||
|
||||
The Kubernetes [Dashboard](https://github.com/kubernetes/dashboard) provides a web UI to manage a Kubernetes cluster for those who prefer an alternative to `kubectl`.
|
||||
|
||||
## Create
|
||||
|
||||
Create the dashboard deployment and service.
|
||||
|
||||
```
|
||||
kubectl apply -f addons/dashboard -R
|
||||
```
|
||||
|
||||
## Access
|
||||
|
||||
Use `kubectl` to authenticate to the apiserver and create a local port forward to the remote port on the dashboard pod.
|
||||
|
||||
```sh
|
||||
kubectl get pods -n kube-system
|
||||
kubectl port-forward POD [LOCAL_PORT:]REMOTE_PORT
|
||||
kubectl port-forward kubernetes-dashboard-id 9090 -n kube-system
|
||||
```
|
||||
|
||||
!!! tip
|
||||
If you'd like to expose the Dashboard via Ingress and add authentication, use a suitable OAuth2 proxy sidecar and pick your favorite OAuth2 provider.
|
20
docs/addons/grafana.md
Normal file
@ -0,0 +1,20 @@
|
||||
## Grafana
|
||||
|
||||
Grafana can be used to build dashboards and visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
|
||||
|
||||
```
|
||||
kubectl apply -f addons/grafana -R
|
||||
```
|
||||
|
||||
Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
|
||||
|
||||
```
|
||||
kubectl port-forward grafana-POD-ID 8080 -n monitoring
|
||||
```
|
||||
|
||||
Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards.
|
||||
|
||||

|
||||

|
||||

|
||||
|
@ -1,6 +1,6 @@
|
||||
# Heapster
|
||||
|
||||
[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashbard graphs.
|
||||
[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashboard graphs.
|
||||
|
||||
## Create
|
||||
|
||||
|
@ -6,6 +6,5 @@ Every Typhoon cluster is verified to work well with several post-install addons.
|
||||
* Nginx [Ingress Controller](ingress.md)
|
||||
* [Heapster](heapster.md)
|
||||
* [Prometheus](prometheus.md)
|
||||
* [Grafana](prometheus.md#grafana)
|
||||
* Kubernetes [Dashboard](dashboard.md)
|
||||
* [Grafana](grafana.md)
|
||||
|
||||
|
@ -20,7 +20,7 @@ On Kubernetes clusters, Prometheus is run as a Deployment, configured with a Con
|
||||
kubectl apply -f addons/prometheus -R
|
||||
```
|
||||
|
||||
The ConfigMap configures Prometheus to target apiserver endpoints, node metrics, cAdvisor metrics, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.
|
||||
The ConfigMap configures Prometheus to discover apiservers, kubelets, cAdvisor, services, endpoints, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.
|
||||
|
||||
### Exporters
|
||||
|
||||
@ -32,7 +32,7 @@ Exporters expose metrics for 3rd-party systems that don't natively expose Promet
|
||||
|
||||
### Queries and Alerts
|
||||
|
||||
Prometheus provides a simplistic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.
|
||||
Prometheus provides a basic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.
|
||||
|
||||
```
|
||||
kubectl get pods -n monitoring
|
||||
@ -47,21 +47,4 @@ Visit [127.0.0.1:9090](http://127.0.0.1:9090) to query [expressions](http://127.
|
||||
<br/>
|
||||

|
||||
|
||||
## Grafana
|
||||
|
||||
Grafana can be used to build dashboards and rich visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
|
||||
|
||||
```
|
||||
kubectl apply -f addons/grafana -R
|
||||
```
|
||||
|
||||
Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
|
||||
|
||||
```
|
||||
kubectl port-forward grafana-POD-ID 8080 -n monitoring
|
||||
```
|
||||
|
||||
Visit [127.0.0.1:8080](http://127.0.0.1:8080), add the prometheus data-source (http://prometheus.monitoring.svc.cluster.local), and import your desired dashboard (e.g. 315).
|
||||
|
||||

|
||||
|
||||
Use [Grafana](/addons/grafana.md) to view or build dashboards that use Prometheus as the datasource.
|
||||
|
60
docs/aws.md
@ -1,6 +1,6 @@
|
||||
# AWS
|
||||
|
||||
In this tutorial, we'll create a Kubernetes v1.8.3 cluster on AWS.
|
||||
In this tutorial, we'll create a Kubernetes v1.9.3 cluster on AWS.
|
||||
|
||||
We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a VPC, gateway, subnets, auto-scaling groups of controllers and workers, network load balancers for controllers and workers, and security groups will be created.
|
||||
|
||||
@ -10,15 +10,15 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube
|
||||
|
||||
* AWS Account and IAM credentials
|
||||
* AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)
|
||||
* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
* Terraform v0.11.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
|
||||
## Terraform Setup
|
||||
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.10.1 on your system.
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.
|
||||
|
||||
```sh
|
||||
$ terraform version
|
||||
Terraform v0.10.7
|
||||
Terraform v0.11.1
|
||||
```
|
||||
|
||||
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system.
|
||||
@ -57,9 +57,32 @@ Configure the AWS provider to use your access key credentials in a `providers.tf
|
||||
|
||||
```tf
|
||||
provider "aws" {
|
||||
version = "~> 1.5.0"
|
||||
alias = "default"
|
||||
|
||||
region = "eu-central-1"
|
||||
shared_credentials_file = "/home/user/.config/aws/credentials"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "null" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "template" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "tls" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
Additional configuration options are described in the `aws` provider [docs](https://www.terraform.io/docs/providers/aws/).
|
||||
@ -73,8 +96,16 @@ Define a Kubernetes cluster using the module `aws/container-linux/kubernetes`.
|
||||
|
||||
```tf
|
||||
module "aws-tempest" {
|
||||
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes"
|
||||
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.9.3"
|
||||
|
||||
providers = {
|
||||
aws = "aws.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
cluster_name = "tempest"
|
||||
|
||||
# AWS
|
||||
@ -103,7 +134,7 @@ ssh-add -L
|
||||
```
|
||||
|
||||
!!! warning
|
||||
`terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
`terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
|
||||
## Apply
|
||||
|
||||
@ -119,7 +150,7 @@ Get or update Terraform modules.
|
||||
$ terraform get # downloads missing modules
|
||||
$ terraform get --update # updates all modules
|
||||
Get: git::https://github.com/poseidon/typhoon (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.10.0 (update)
|
||||
```
|
||||
|
||||
Plan the resources to be created.
|
||||
@ -148,12 +179,12 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
|
||||
[Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
|
||||
|
||||
```
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
ip-10-0-12-221 Ready 34m v1.8.3
|
||||
ip-10-0-19-112 Ready 34m v1.8.3
|
||||
ip-10-0-4-22 Ready 34m v1.8.3
|
||||
ip-10-0-12-221 Ready 34m v1.9.3
|
||||
ip-10-0-19-112 Ready 34m v1.9.3
|
||||
ip-10-0-4-22 Ready 34m v1.9.3
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -179,7 +210,7 @@ kube-system pod-checkpointer-4kxtl-ip-10-0-12-221 1/1 Running 0
|
||||
|
||||
## Going Further
|
||||
|
||||
Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
|
||||
Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).
|
||||
|
||||
!!! note
|
||||
On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
|
||||
@ -201,7 +232,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
|
||||
|
||||
Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a network load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `tempest.aws.example.com`.
|
||||
|
||||
You'll need a registered domain name or subdomain registered in a AWS Route53 DNS zone. You can set this up once and create many clusters with unqiue names.
|
||||
You'll need a registered domain name or subdomain registered in a AWS Route53 DNS zone. You can set this up once and create many clusters with unique names.
|
||||
|
||||
```tf
|
||||
resource "aws_route53_zone" "zone-for-clusters" {
|
||||
@ -227,7 +258,8 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
|
||||
| network_mtu | CNI interface MTU (calico only) | 1480 | 8981 |
|
||||
| host_cidr | CIDR range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |
|
||||
| pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
|
||||
| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |
|
||||
|
||||
Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
# Bare-Metal
|
||||
|
||||
In this tutorial, we'll network boot and provison a Kubernetes v1.8.3 cluster on bare-metal.
|
||||
In this tutorial, we'll network boot and provision a Kubernetes v1.9.3 cluster on bare-metal.
|
||||
|
||||
First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers.
|
||||
|
||||
@ -12,7 +12,7 @@ Controllers are provisioned as etcd peers and run `etcd-member` (etcd3) and `kub
|
||||
* PXE-enabled [network boot](https://coreos.com/matchbox/docs/latest/network-setup.html) environment
|
||||
* Matchbox v0.6+ deployment with API enabled
|
||||
* Matchbox credentials `client.crt`, `client.key`, `ca.crt`
|
||||
* Terraform v0.10.4+ and [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) installed locally
|
||||
* Terraform v0.11.x and [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) installed locally
|
||||
|
||||
## Machines
|
||||
|
||||
@ -31,7 +31,7 @@ Configure each machine to boot from the disk [^1] through IPMI or the BIOS menu.
|
||||
ipmitool -H node1 -U USER -P PASS chassis bootdev disk options=persistent
|
||||
```
|
||||
|
||||
During provisioning, you'll explicitly set the boot device to `pxe` for the next boot only. Machines will install (overwrite) the operting system to disk on PXE boot and reboot into the disk install.
|
||||
During provisioning, you'll explicitly set the boot device to `pxe` for the next boot only. Machines will install (overwrite) the operating system to disk on PXE boot and reboot into the disk install.
|
||||
|
||||
!!! tip ""
|
||||
Ask your hardware vendor to provide MACs and preconfigure IPMI, if possible. With it, you can rack new servers, `terraform apply` with new info, and power on machines that network boot and provision into clusters.
|
||||
@ -105,15 +105,15 @@ Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup
|
||||
* Place Matchbox behind a menu entry (timeout and default to Matchbox)
|
||||
|
||||
!!! note ""
|
||||
TFTP chainloding to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.
|
||||
TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.
|
||||
|
||||
## Terraform Setup
|
||||
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.9.2+ on your system.
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.
|
||||
|
||||
```sh
|
||||
$ terraform version
|
||||
Terraform v0.10.7
|
||||
Terraform v0.11.1
|
||||
```
|
||||
|
||||
Add the [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin binary for your system.
|
||||
@ -149,6 +149,26 @@ provider "matchbox" {
|
||||
client_key = "${file("~/.config/matchbox/client.key")}"
|
||||
ca = "${file("~/.config/matchbox/ca.crt")}"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "null" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "template" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "tls" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
## Cluster
|
||||
@ -157,12 +177,19 @@ Define a Kubernetes cluster using the module `bare-metal/container-linux/kuberne
|
||||
|
||||
```tf
|
||||
module "bare-metal-mercury" {
|
||||
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes"
|
||||
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.9.3"
|
||||
|
||||
providers = {
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
# install
|
||||
matchbox_http_endpoint = "http://matchbox.example.com"
|
||||
container_linux_channel = "stable"
|
||||
container_linux_version = "1520.6.0"
|
||||
container_linux_version = "1576.5.0"
|
||||
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
|
||||
|
||||
# cluster
|
||||
@ -203,7 +230,7 @@ ssh-add -L
|
||||
```
|
||||
|
||||
!!! warning
|
||||
`terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
`terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
|
||||
## Apply
|
||||
|
||||
@ -219,7 +246,7 @@ Get or update Terraform modules.
|
||||
$ terraform get # downloads missing modules
|
||||
$ terraform get --update # updates all modules
|
||||
Get: git::https://github.com/poseidon/typhoon (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.10.0 (update)
|
||||
```
|
||||
|
||||
Plan the resources to be created.
|
||||
@ -287,12 +314,12 @@ bootkube[5]: Tearing down temporary bootstrap control plane...
|
||||
[Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
|
||||
|
||||
```
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
node1.example.com Ready 11m v1.8.3
|
||||
node2.example.com Ready 11m v1.8.3
|
||||
node3.example.com Ready 11m v1.8.3
|
||||
node1.example.com Ready 11m v1.9.3
|
||||
node2.example.com Ready 11m v1.9.3
|
||||
node3.example.com Ready 11m v1.9.3
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -319,7 +346,7 @@ kube-system pod-checkpointer-wf65d-node1.example.com 1/1 Running 0
|
||||
|
||||
## Going Further
|
||||
|
||||
Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
|
||||
Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).
|
||||
|
||||
!!! note
|
||||
On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
|
||||
@ -332,7 +359,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
|
||||
|:-----|:------------|:--------|
|
||||
| matchbox_http_endpoint | Matchbox HTTP read-only endpoint | http://matchbox.example.com:8080 |
|
||||
| container_linux_channel | Container Linux channel | stable, beta, alpha |
|
||||
| container_linux_version | Container Linux version of the kernel/initrd to PXE and the image to install | 1465.6.0 |
|
||||
| container_linux_version | Container Linux version of the kernel/initrd to PXE and the image to install | 1576.5.0 |
|
||||
| cluster_name | Cluster name | mercury |
|
||||
| k8s_domain_name | FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint | "myk8s.example.com" |
|
||||
| ssh_authorized_key | SSH public key for ~/.ssh/authorized_keys | "ssh-rsa AAAAB3Nz..." |
|
||||
@ -354,6 +381,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
|
||||
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
|
||||
| network_mtu | CNI interface MTU (calico-only) | 1480 | - |
|
||||
| pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
|
||||
| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |
|
||||
| kernel_args | Additional kernel args to provide at PXE boot | [] | "kvm-intel.nested=1" |
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
# Digital Ocean
|
||||
|
||||
In this tutorial, we'll create a Kubernetes v1.8.3 cluster on Digital Ocean.
|
||||
In this tutorial, we'll create a Kubernetes v1.9.3 cluster on Digital Ocean.
|
||||
|
||||
We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, firewall rules, DNS records, tags, and droplets for Kubernetes controllers and workers will be created.
|
||||
|
||||
@ -10,15 +10,15 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube
|
||||
|
||||
* Digital Ocean Account and Token
|
||||
* Digital Ocean Domain (registered Domain Name or delegated subdomain)
|
||||
* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
* Terraform v0.11.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
|
||||
## Terraform Setup
|
||||
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.10.1+ on your system.
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.
|
||||
|
||||
```sh
|
||||
$ terraform version
|
||||
Terraform v0.10.7
|
||||
Terraform v0.11.1
|
||||
```
|
||||
|
||||
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system.
|
||||
@ -58,7 +58,29 @@ Configure the DigitalOcean provider to use your token in a `providers.tf` file.
|
||||
|
||||
```tf
|
||||
provider "digitalocean" {
|
||||
version = "0.1.3"
|
||||
token = "${chomp(file("~/.config/digital-ocean/token"))}"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "null" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "template" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "tls" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
@ -68,7 +90,15 @@ Define a Kubernetes cluster using the module `digital-ocean/container-linux/kube
|
||||
|
||||
```tf
|
||||
module "digital-ocean-nemo" {
|
||||
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes"
|
||||
source = "git::https://github.com/poseidon/typhoon//digital-ocean/container-linux/kubernetes?ref=v1.9.3"
|
||||
|
||||
providers = {
|
||||
digitalocean = "digitalocean.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
region = "nyc3"
|
||||
dns_zone = "digital-ocean.example.com"
|
||||
@ -76,9 +106,9 @@ module "digital-ocean-nemo" {
|
||||
cluster_name = "nemo"
|
||||
image = "coreos-stable"
|
||||
controller_count = 1
|
||||
controller_type = "2gb"
|
||||
controller_type = "s-2vcpu-2gb"
|
||||
worker_count = 2
|
||||
worker_type = "512mb"
|
||||
worker_type = "s-1vcpu-1gb"
|
||||
ssh_fingerprints = ["d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7"]
|
||||
|
||||
# output assets dir
|
||||
@ -98,7 +128,7 @@ ssh-add -L
|
||||
```
|
||||
|
||||
!!! warning
|
||||
`terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
`terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
|
||||
## Apply
|
||||
|
||||
@ -114,7 +144,7 @@ Get or update Terraform modules.
|
||||
$ terraform get # downloads missing modules
|
||||
$ terraform get --update # updates all modules
|
||||
Get: git::https://github.com/poseidon/typhoon (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.10.0 (update)
|
||||
```
|
||||
|
||||
Plan the resources to be created.
|
||||
@ -144,12 +174,12 @@ In 3-6 minutes, the Kubernetes cluster will be ready.
|
||||
[Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
|
||||
|
||||
```
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
10.132.110.130 Ready 10m v1.8.3
|
||||
10.132.115.81 Ready 10m v1.8.3
|
||||
10.132.124.107 Ready 10m v1.8.3
|
||||
10.132.110.130 Ready 10m v1.9.3
|
||||
10.132.115.81 Ready 10m v1.9.3
|
||||
10.132.124.107 Ready 10m v1.9.3
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -174,7 +204,7 @@ kube-system pod-checkpointer-pr1lq-10.132.115.81 1/1 Running 0
|
||||
|
||||
## Going Further
|
||||
|
||||
Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
|
||||
Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).
|
||||
|
||||
!!! note
|
||||
On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
|
||||
@ -195,7 +225,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
|
||||
|
||||
Clusters create DNS A records `${cluster_name}.${dns_zone}` to resolve to controller droplets (round robin). This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `nemo.do.example.com`.
|
||||
|
||||
You'll need a registered domain name or subdomain registered in Digital Ocean Domains (i.e. DNS zones). You can set this up once and create many clusters with unqiue names.
|
||||
You'll need a registered domain name or subdomain registered in Digital Ocean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.
|
||||
|
||||
```tf
|
||||
resource "digitalocean_domain" "zone-for-clusters" {
|
||||
@ -232,15 +262,18 @@ If you uploaded an SSH key to DigitalOcean (not required), find the fingerprint
|
||||
|:-----|:------------|:--------|:--------|
|
||||
| image | OS image for droplets | "coreos-stable" | coreos-stable, coreos-beta, coreos-alpha |
|
||||
| controller_count | Number of controllers (i.e. masters) | 1 | 1 |
|
||||
| controller_type | Digital Ocean droplet size | 2gb | 2gb (min), 4gb, 8gb |
|
||||
| controller_type | Digital Ocean droplet size | s-2vcpu-2gb | s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... |
|
||||
| worker_count | Number of workers | 1 | 3 |
|
||||
| worker_type | Digital Ocean droplet size | 512mb | 512mb, 1gb, 2gb, 4gb |
|
||||
| worker_type | Digital Ocean droplet size | s-1vcpu-1gb | s-1vcpu-1gb, s-1vcpu-2gb, s-2vcpu-2gb, ... |
|
||||
| networking | Choice of networking provider | "flannel" | "flannel" |
|
||||
| pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
|
||||
| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |
|
||||
|
||||
You can see all valid droplet sizes [on DigitalOcean's website](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) or by [using their `doctl` command-line tool](https://github.com/digitalocean/doctl) via `doctl compute size list`.
|
||||
|
||||
!!! warning
|
||||
Do not choose a `controller_type` smaller than `2gb`. The `1gb` droplet is not sufficient for running a controller and bootstrapping will fail.
|
||||
Do not choose a `controller_type` smaller than 2GB. Smaller droplets are not sufficient for running a controller and bootstrapping will fail.
|
||||
|
||||
!!! bug
|
||||
Digital Ocean firewalls do not yet support the IP tunneling (IP in IP) protocol used by Calico. You can try using "calico" for `networking`, but it will only work if the cloud firewall is removed (unsafe).
|
||||
|
@ -1,6 +1,6 @@
|
||||
# Google Cloud
|
||||
|
||||
In this tutorial, we'll create a Kubernetes v1.8.3 cluster on Google Compute Engine (not GKE).
|
||||
In this tutorial, we'll create a Kubernetes v1.9.3 cluster on Google Compute Engine (not GKE).
|
||||
|
||||
We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a network, firewall rules, managed instance groups of Kubernetes controllers and workers, network load balancers for controllers and workers, and health checks will be created.
|
||||
|
||||
@ -10,15 +10,15 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube
|
||||
|
||||
* Google Cloud Account and Service Account
|
||||
* Google Cloud DNS Zone (registered Domain Name or delegated subdomain)
|
||||
* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
* Terraform v0.11.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
|
||||
|
||||
## Terraform Setup
|
||||
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.9.2+ on your system.
|
||||
Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.
|
||||
|
||||
```sh
|
||||
$ terraform version
|
||||
Terraform v0.10.7
|
||||
Terraform v0.11.1
|
||||
```
|
||||
|
||||
Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system.
|
||||
@ -57,10 +57,33 @@ Configure the Google Cloud provider to use your service account key, project-id,
|
||||
|
||||
```tf
|
||||
provider "google" {
|
||||
version = "1.2"
|
||||
alias = "default"
|
||||
|
||||
credentials = "${file("~/.config/google-cloud/terraform.json")}"
|
||||
project = "project-id"
|
||||
region = "us-central1"
|
||||
}
|
||||
|
||||
provider "local" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "null" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "template" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "tls" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
Additional configuration options are described in the `google` provider [docs](https://www.terraform.io/docs/providers/google/index.html).
|
||||
@ -74,13 +97,21 @@ Define a Kubernetes cluster using the module `google-cloud/container-linux/kuber
|
||||
|
||||
```tf
|
||||
module "google-cloud-yavin" {
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.9.3"
|
||||
|
||||
providers = {
|
||||
google = "google.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
# Google Cloud
|
||||
region = "us-central1"
|
||||
dns_zone = "example.com"
|
||||
dns_zone_name = "example-zone"
|
||||
os_image = "coreos-stable-1520-6-0-v20171012"
|
||||
os_image = "coreos-stable-1576-5-0-v20180105"
|
||||
|
||||
cluster_name = "yavin"
|
||||
controller_count = 1
|
||||
@ -104,7 +135,7 @@ ssh-add -L
|
||||
```
|
||||
|
||||
!!! warning
|
||||
`terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
`terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
|
||||
|
||||
## Apply
|
||||
|
||||
@ -120,7 +151,7 @@ Get or update Terraform modules.
|
||||
$ terraform get # downloads missing modules
|
||||
$ terraform get --update # updates all modules
|
||||
Get: git::https://github.com/poseidon/typhoon (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
|
||||
Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.10.0 (update)
|
||||
```
|
||||
|
||||
Plan the resources to be created.
|
||||
@ -151,12 +182,12 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
|
||||
[Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.
|
||||
|
||||
```
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.8.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.9.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.9.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.9.3
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -181,7 +212,7 @@ kube-system pod-checkpointer-l6lrt 1/1 Running 0
|
||||
|
||||
## Going Further
|
||||
|
||||
Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
|
||||
Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).
|
||||
|
||||
!!! note
|
||||
On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
|
||||
@ -197,7 +228,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
|
||||
| dns_zone | Google Cloud DNS zone | "google-cloud.example.com" |
|
||||
| dns_zone_name | Google Cloud DNS zone name | "example-zone" |
|
||||
| ssh_authorized_key | SSH public key for ~/.ssh_authorized_keys | "ssh-rsa AAAAB3NZ..." |
|
||||
| os_image | OS image for compute instances | "coreos-stable-1465-6-0-v20170817" |
|
||||
| os_image | OS image for compute instances | "coreos-stable-1576-5-0-v20180105" |
|
||||
| asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/yavin" |
|
||||
|
||||
Check the list of valid [regions](https://cloud.google.com/compute/docs/regions-zones/regions-zones) and list Container Linux [images](https://cloud.google.com/compute/docs/images) with `gcloud compute images list | grep coreos`.
|
||||
@ -206,7 +237,7 @@ Check the list of valid [regions](https://cloud.google.com/compute/docs/regions-
|
||||
|
||||
Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a network load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `yavin.google-cloud.example.com`.
|
||||
|
||||
You'll need a registered domain name or subdomain registered in a Google Cloud DNS zone. You can set this up once and create many clusters with unqiue names.
|
||||
You'll need a registered domain name or subdomain registered in a Google Cloud DNS zone. You can set this up once and create many clusters with unique names.
|
||||
|
||||
```tf
|
||||
resource "google_dns_managed_zone" "zone-for-clusters" {
|
||||
@ -229,7 +260,8 @@ resource "google_dns_managed_zone" "zone-for-clusters" {
|
||||
| worker_preemptible | If enabled, Compute Engine will terminate controllers randomly within 24 hours | false | true |
|
||||
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
|
||||
| pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
|
||||
| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
|
||||
| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |
|
||||
|
||||
Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).
|
||||
|
||||
|
BIN
docs/img/grafana-capacity.png
Normal file
After Width: | Height: | Size: 234 KiB |
BIN
docs/img/grafana-control-plane.png
Normal file
After Width: | Height: | Size: 259 KiB |
Before Width: | Height: | Size: 212 KiB |
BIN
docs/img/grafana-node.png
Normal file
After Width: | Height: | Size: 240 KiB |
Before Width: | Height: | Size: 404 KiB After Width: | Height: | Size: 181 KiB |
BIN
docs/img/typhoon-logo.png
Normal file
After Width: | Height: | Size: 20 KiB |
BIN
docs/img/typhoon.png
Normal file
After Width: | Height: | Size: 22 KiB |
@ -1,4 +1,4 @@
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/dghubble/spin.png">
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics and other optional [addons](addons/overview.md)
|
||||
@ -44,12 +44,20 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo
|
||||
```tf
|
||||
module "google-cloud-yavin" {
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
|
||||
|
||||
providers = {
|
||||
google = "google.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
# Google Cloud
|
||||
region = "us-central1"
|
||||
dns_zone = "example.com"
|
||||
dns_zone_name = "example-zone"
|
||||
os_image = "coreos-stable-1465-6-0-v20170817"
|
||||
os_image = "coreos-stable-1576-5-0-v20180105"
|
||||
|
||||
cluster_name = "yavin"
|
||||
controller_count = 1
|
||||
@ -74,12 +82,12 @@ Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
|
||||
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
|
||||
|
||||
```
|
||||
$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
|
||||
$ kubectl get nodes
|
||||
NAME STATUS AGE VERSION
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.8.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.8.3
|
||||
yavin-controller-0.c.example-com.internal Ready 6m v1.9.3
|
||||
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.9.3
|
||||
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.9.3
|
||||
```
|
||||
|
||||
List the pods.
|
||||
@ -118,4 +126,4 @@ Typhoon is not a product, trial, or free-tier. It is not run by a company, does
|
||||
|
||||
Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.
|
||||
|
||||
*Disclosure: The author works for CoreOS and previously wrote Matchbox and original Tectonic for bare-metal and AWS. This project is not associated with CoreOS.*
|
||||
*Disclosure: The author works for Red Hat (prev CoreOS), but Typhoon is unassociated and maintained independently.*
|
||||
|
@ -2,9 +2,9 @@
|
||||
|
||||
While bare-metal Kubernetes clusters have no special hardware requirements (beyond the [min reqs](/bare-metal.md#requirements)), Typhoon does ensure certain router and server hardware integrates well with Kubernetes.
|
||||
|
||||
## Ubiquitiy
|
||||
## Ubiquiti
|
||||
|
||||
Ubiquity EdgeRouters work well with bare-metal Kubernetes clusters. Knowledge about how to setup an EdgeRouter and use the CLI is required.
|
||||
Ubiquiti EdgeRouters work well with bare-metal Kubernetes clusters. Knowledge about how to setup an EdgeRouter and use the CLI is required.
|
||||
|
||||
### PXE
|
||||
|
||||
@ -50,7 +50,7 @@ Add `dnsmasq` command line options to enable the TFTP file server.
|
||||
|
||||
```
|
||||
configure
|
||||
show servce dns forwarding
|
||||
show service dns forwarding
|
||||
set service dns forwarding options enable-tftp
|
||||
set service dns forwarding options tftp-root=/var/lib/tftpboot
|
||||
commit-confirm
|
||||
@ -143,7 +143,7 @@ commit-confirm
|
||||
|
||||
### BGP
|
||||
|
||||
Add the EdgeRouter as a global BGP peer for nodes in a Kubernetes cluster (requires Calico). Neighbors will exchange `podCIDR` routes and individual pods will become routeable on the LAN.
|
||||
Add the EdgeRouter as a global BGP peer for nodes in a Kubernetes cluster (requires Calico). Neighbors will exchange `podCIDR` routes and individual pods will become routable on the LAN.
|
||||
|
||||
Configure node(s) as BGP neighbors.
|
||||
|
||||
|
204
docs/topics/maintenance.md
Normal file
@ -0,0 +1,204 @@
|
||||
# Maintenance
|
||||
|
||||
## Best Practices
|
||||
|
||||
* Run multiple Kubernetes clusters. Run across platforms. Plan for regional and cloud outages.
|
||||
* Require applications be platform agnostic. Moving an application between a Kubernetes AWS cluster and a Kubernetes bare-metal cluster should be normal.
|
||||
* Strive to make single-cluster outages tolerable. Practice performing failovers.
|
||||
* Strive to make single-cluster outages a non-event. Load balance applications between multiple clusters, automate failover behaviors, and adjust alerting behaviors.
|
||||
|
||||
## Versioning
|
||||
|
||||
Typhoon provides tagged releases to allow clusters to be versioned using ordinary Terraform configs.
|
||||
|
||||
```
|
||||
module "google-cloud-yavin" {
|
||||
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.8.6"
|
||||
...
|
||||
}
|
||||
|
||||
module "bare-metal-mercury" {
|
||||
source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.9.3"
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
Master is updated regularly, so it is recommended to [pin](https://www.terraform.io/docs/modules/sources.html) modules to a [release tag](https://github.com/poseidon/typhoon/releases) or [commit](https://github.com/poseidon/typhoon/commits/master) hash. Pinning ensures `terraform get --update` only fetches the desired version.
|
||||
|
||||
## Upgrades
|
||||
|
||||
Typhoon recommends upgrading clusters using a blue-green replacement strategy and migrating workloads.
|
||||
|
||||
1. Launch new (candidate) clusters from tagged releases
|
||||
2. Apply workloads from existing cluster(s)
|
||||
3. Evaluate application health and performance
|
||||
4. Migrate application traffic to the new cluster
|
||||
5. Compare metrics and delete old cluster when ready
|
||||
|
||||
Blue-green replacement reduces risk for clusters running critical applications. Candidate clusters allow baseline properties of clusters to be assessed (e.g. pod-to-pod bandwidth). Applying application workloads allows health to be assessed before being subjected to traffic (e.g. detect any changes in Kubernetes behavior between versions). Migration to the new cluster can be controlled according to requirements. Migration may mean updating DNS records to resolve the new cluster's ingress or may involve a load balancer gradually shifting traffic to the new cluster "backend". Retain the old cluster for a time to compare metrics or for fallback if issues arise.
|
||||
|
||||
Blue-green replacement provides some subtler benefits as well:
|
||||
|
||||
* Encourages investment in tooling for traffic migration and failovers. When a cluster incident arises, shifting applications to a healthy cluster will be second nature.
|
||||
* Discourages reliance on in-place opqaue state. Retain confidence in your ability to create infrastructure from scratch.
|
||||
* Allows Typhoon to make architecture changes between releases and eases the burden on Typhoon maintainers. By contrast, distros promising in-place upgrades get stuck with their mistakes or require complex and error-prone migrations.
|
||||
|
||||
### Bare-Metal
|
||||
|
||||
Typhoon bare-metal clusters are provisioned by a PXE-enabled network boot environment and a [Matchbox](https://github.com/coreos/matchbox) service. To upgrade, re-provision machines into a new cluster.
|
||||
|
||||
Failover application workloads to another cluster (varies).
|
||||
|
||||
```
|
||||
kubectl config use-context other-context
|
||||
kubectl apply -f mercury -R
|
||||
# DNS or load balancer changes
|
||||
```
|
||||
|
||||
Power off bare-metal machines and set their next boot device to PXE.
|
||||
|
||||
```
|
||||
ipmitool -H node1.example.com -U USER -P PASS power off
|
||||
ipmitool -H node1.example.com -U USER -P PASS chassis bootdev pxe
|
||||
```
|
||||
|
||||
Delete or comment the Terraform config for the cluster.
|
||||
|
||||
```
|
||||
- module "bare-metal-mercury" {
|
||||
- source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes"
|
||||
- ...
|
||||
-}
|
||||
```
|
||||
|
||||
Apply to delete old provisioning configs from Matchbox.
|
||||
|
||||
```
|
||||
$ terraform apply
|
||||
Apply complete! Resources: 0 added, 0 changed, 55 destroyed.
|
||||
```
|
||||
|
||||
Re-provision a new cluster by following the bare-metal [tutorial](../bare-metal.md#cluster).
|
||||
|
||||
### Cloud
|
||||
|
||||
Create a new cluster following the tutorials. Failover application workloads to the new cluster (varies).
|
||||
|
||||
```
|
||||
kubectl config use-context other-context
|
||||
kubectl apply -f mercury -R
|
||||
# DNS or load balancer changes
|
||||
```
|
||||
|
||||
Once you're confident in the new cluster, delete the Terraform config for the old cluster.
|
||||
|
||||
```
|
||||
- module "google-cloud-yavin" {
|
||||
- source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
|
||||
- ...
|
||||
-}
|
||||
```
|
||||
|
||||
Apply to delete the cluster.
|
||||
|
||||
```
|
||||
$ terraform apply
|
||||
Apply complete! Resources: 0 added, 0 changed, 55 destroyed.
|
||||
```
|
||||
|
||||
### Alternatives
|
||||
|
||||
#### In-place Edits
|
||||
|
||||
Typhoon uses a self-hosted Kubernetes control plane which allows certain manifest upgrades to be performed in-place. Components like `apiserver`, `controller-manager`, `scheduler`, `flannel`/`calico`, `kube-dns`, and `kube-proxy` are run on Kubernetes itself and can be edited via `kubectl`. If you're interested, see the bootkube [upgrade docs](https://github.com/kubernetes-incubator/bootkube/blob/master/Documentation/upgrading.md).
|
||||
|
||||
In certain scenarios, in-place edits can be useful for quickly rolling out security patches (e.g. bumping `kube-dns`) or prioritizing speed over the safety of a proper cluster re-provision and transition.
|
||||
|
||||
!!! note
|
||||
Rarely, we may test certain security in-place edits and mention them as an option in release notes.
|
||||
|
||||
!!! warning
|
||||
Typhoon does not support or document in-place edits as an upgrade strategy. They involve inherent risks and we choose not to make recommendations or guarentees about the safety of different in-place upgrades. Its explicitly a non-goal.
|
||||
|
||||
#### Node Replacement
|
||||
|
||||
Typhoon supports multi-controller clusters, so it is possible to upgrade a cluster by deleting and replacing nodes one by one.
|
||||
|
||||
!!! warning
|
||||
Typhoon does not support or document node replacement as an upgrade strategy. It limits Typhoon's ability to make infrastructure and architectural changes between tagged releases.
|
||||
|
||||
## Terraform v0.11.x
|
||||
|
||||
Terraform v0.10.x to v0.11.x introduced breaking changes in the provider and module inheritance relationship that you MUST be aware of when upgrading to the v0.11.x `terraform` binary. Terraform now allows multiple named (i.e. aliased) copies of a provider to exist (e.g `aws.default`, `aws.somename`). Terraform now also requires providers be explicitly passed to modules in order to satisfy module version contraints (which Typhoon modules define). Full details can be found in [typhoon#77](https://github.com/poseidon/typhoon/issues/77) and [hashicorp#16824](https://github.com/hashicorp/terraform/issues/16824).
|
||||
|
||||
In particular, after upgrading to the v0.11.x `terraform` binary, you'll notice:
|
||||
|
||||
* `terraform plan` does not succeed and prompts for variables when it didn't before
|
||||
* `terraform plan` does not succeed and mentions "provider configuration block is required for all operations"
|
||||
* `terraform apply` fails when you comment or remove a module usage in order to delete a cluster
|
||||
|
||||
### New users
|
||||
|
||||
New users can start with Terraform v0.11.x and follow the Typhoon docs without issue.
|
||||
|
||||
### Existing
|
||||
|
||||
Users who used modules to create clusters with Terraform v0.10.x and still manage those clusters via Terraform must explicitly add each provider used in `provider.tf`:
|
||||
|
||||
```
|
||||
provider "local" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "null" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "template" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
|
||||
provider "tls" {
|
||||
version = "~> 1.0"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
Modify the `google`, `aws`, or `digitalocean` provider section to specify an explicit `alias` name.
|
||||
|
||||
```
|
||||
provider "digitalocean" {
|
||||
version = "0.1.2"
|
||||
token = "${chomp(file("~/.config/digital-ocean/token"))}"
|
||||
alias = "default"
|
||||
}
|
||||
```
|
||||
|
||||
!!! note
|
||||
In these examples, we've chosen to name each provider "default", though the point of the Terraform changes is that other possibilities are possible.
|
||||
|
||||
Edit each instance (i.e. usage) of a module and explicitly pass the providers.
|
||||
|
||||
```
|
||||
module "aws-cluster" {
|
||||
source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes"
|
||||
|
||||
providers = {
|
||||
aws = "aws.default"
|
||||
local = "local.default"
|
||||
null = "null.default"
|
||||
template = "template.default"
|
||||
tls = "tls.default"
|
||||
}
|
||||
|
||||
cluster_name = "somename"
|
||||
```
|
||||
|
||||
Re-run `terraform plan`. Plan will claim there are no changes to apply. Run `terraform apply` anyway as this will update Terraform state to be aware of the explicit provider versions.
|
||||
|
||||
### Verify
|
||||
|
||||
You should now be able to run `terraform plan` without errors. When you choose, you may comment or delete a module from Terraform configs and `terraform apply` should destroy the cluster correctly.
|
@ -19,7 +19,7 @@ Notes:
|
||||
|
||||
## Network Performance
|
||||
|
||||
Network performance varies based on the platform and CNI plugin. `iperf` was used to measture the bandwidth between different hosts and different pods. Host-to-host shows typical bandwidth between host machines. Pod-to-pod shows the bandwidth between two `iperf` containers.
|
||||
Network performance varies based on the platform and CNI plugin. `iperf` was used to measure the bandwidth between different hosts and different pods. Host-to-host shows typical bandwidth between host machines. Pod-to-pod shows the bandwidth between two `iperf` containers.
|
||||
|
||||
| Platform / Plugin | Theory | Host to Host | Pod to Pod |
|
||||
|----------------------------|-------:|-------------:|-------------:|
|
||||
@ -36,7 +36,7 @@ Network performance varies based on the platform and CNI plugin. `iperf` was use
|
||||
|
||||
Notes:
|
||||
|
||||
* Calico and Flannel have comparable performance. Platform and configuration differenes dominate.
|
||||
* Calico and Flannel have comparable performance. Platform and configuration differences dominate.
|
||||
* Neither CNI provider seems to be able to leverage bonded NICs (bare-metal)
|
||||
* AWS and Digital Ocean network bandwidth fluctuates more than on other platforms.
|
||||
* Only [certain AWS EC2 instance types](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#jumbo_frame_instances) allow jumbo frames. This is why the default MTU on AWS must be 1480.
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Typhoon
|
||||
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
|
||||
|
||||
Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.
|
||||
|
||||
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
|
||||
|
||||
## Features
|
||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||
|
||||
* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Kubernetes v1.9.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||
|
@ -1,13 +1,14 @@
|
||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||
module "bootkube" {
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
|
||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=203b90169ead2380f74cc64ea1f02c109806c9bc"
|
||||
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = "${module.controllers.etcd_fqdns}"
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = 1440
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_name = "${var.cluster_name}"
|
||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||
etcd_servers = "${module.controllers.etcd_fqdns}"
|
||||
asset_dir = "${var.asset_dir}"
|
||||
networking = "${var.networking}"
|
||||
network_mtu = 1440
|
||||
pod_cidr = "${var.pod_cidr}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
}
|
||||
|
@ -15,6 +15,7 @@ module "controllers" {
|
||||
# configuration
|
||||
networking = "${var.networking}"
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
|
||||
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
|
||||
@ -36,6 +37,7 @@ module "workers" {
|
||||
|
||||
# configuration
|
||||
service_cidr = "${var.service_cidr}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
|
||||
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
|
||||
|
@ -7,7 +7,7 @@ systemd:
|
||||
- name: 40-etcd-cluster.conf
|
||||
contents: |
|
||||
[Service]
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.0"
|
||||
Environment="ETCD_IMAGE_TAG=v3.2.15"
|
||||
Environment="ETCD_NAME=${etcd_name}"
|
||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
||||
@ -41,11 +41,12 @@ systemd:
|
||||
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
|
||||
[Install]
|
||||
RequiredBy=kubelet.service
|
||||
RequiredBy=etcd-member.service
|
||||
- name: kubelet.service
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -73,7 +74,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--kubeconfig=/etc/kubernetes/kubeconfig \
|
||||
@ -129,7 +130,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -148,11 +149,9 @@ storage:
|
||||
# Wrapper for bootkube start
|
||||
set -e
|
||||
# Move experimental manifests
|
||||
[ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
|
||||
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
|
||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
|
||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
|
||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
||||
exec /usr/bin/rkt run \
|
||||
--trust-keys-from-https \
|
||||
|
@ -48,7 +48,7 @@ resource "google_compute_instance" "controllers" {
|
||||
}
|
||||
|
||||
can_ip_forward = true
|
||||
tags = ["${var.cluster_name}-controller"]
|
||||
tags = ["${var.cluster_name}-controller"]
|
||||
}
|
||||
|
||||
# Controller Container Linux Config
|
||||
@ -66,6 +66,7 @@ data "template_file" "controller_config" {
|
||||
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
|
||||
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
kubeconfig_ca_cert = "${var.kubeconfig_ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${var.kubeconfig_kubelet_cert}"
|
||||
|
@ -69,6 +69,12 @@ EOD
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
||||
// kubeconfig
|
||||
|
||||
variable "kubeconfig_ca_cert" {
|
||||
|
@ -14,7 +14,7 @@ resource "google_compute_firewall" "allow-ssh" {
|
||||
}
|
||||
|
||||
source_ranges = ["0.0.0.0/0"]
|
||||
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
|
||||
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
|
||||
}
|
||||
|
||||
resource "google_compute_firewall" "allow-apiserver" {
|
||||
@ -27,10 +27,9 @@ resource "google_compute_firewall" "allow-apiserver" {
|
||||
}
|
||||
|
||||
source_ranges = ["0.0.0.0/0"]
|
||||
target_tags = ["${var.cluster_name}-controller"]
|
||||
target_tags = ["${var.cluster_name}-controller"]
|
||||
}
|
||||
|
||||
|
||||
resource "google_compute_firewall" "allow-ingress" {
|
||||
name = "${var.cluster_name}-allow-ingress"
|
||||
network = "${google_compute_network.network.name}"
|
||||
@ -41,7 +40,7 @@ resource "google_compute_firewall" "allow-ingress" {
|
||||
}
|
||||
|
||||
source_ranges = ["0.0.0.0/0"]
|
||||
target_tags = ["${var.cluster_name}-worker"]
|
||||
target_tags = ["${var.cluster_name}-worker"]
|
||||
}
|
||||
|
||||
resource "google_compute_firewall" "internal-etcd" {
|
||||
|
@ -80,3 +80,9 @@ EOD
|
||||
type = "string"
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
@ -22,7 +22,7 @@ systemd:
|
||||
enable: true
|
||||
contents: |
|
||||
[Unit]
|
||||
Description=Kubelet via Hyperkube ACI
|
||||
Description=Kubelet via Hyperkube
|
||||
Wants=rpc-statd.service
|
||||
[Service]
|
||||
EnvironmentFile=/etc/kubernetes/kubelet.env
|
||||
@ -50,7 +50,7 @@ systemd:
|
||||
--anonymous-auth=false \
|
||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||
--cluster_dns=${k8s_dns_service_ip} \
|
||||
--cluster_domain=cluster.local \
|
||||
--cluster_domain=${cluster_domain_suffix} \
|
||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||
--exit-on-lock-contention \
|
||||
--kubeconfig=/etc/kubernetes/kubeconfig \
|
||||
@ -104,7 +104,7 @@ storage:
|
||||
contents:
|
||||
inline: |
|
||||
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
|
||||
KUBELET_IMAGE_TAG=v1.8.3
|
||||
KUBELET_IMAGE_TAG=v1.9.3
|
||||
- path: /etc/sysctl.d/max-user-watches.conf
|
||||
filesystem: root
|
||||
contents:
|
||||
@ -122,7 +122,7 @@ storage:
|
||||
--volume config,kind=host,source=/etc/kubernetes \
|
||||
--mount volume=config,target=/etc/kubernetes \
|
||||
--insecure-options=image \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.8.3 \
|
||||
docker://gcr.io/google_containers/hyperkube:v1.9.3 \
|
||||
--net=host \
|
||||
--dns=host \
|
||||
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
|
||||
|
@ -59,6 +59,12 @@ EOD
|
||||
default = "10.3.0.0/16"
|
||||
}
|
||||
|
||||
variable "cluster_domain_suffix" {
|
||||
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
|
||||
type = "string"
|
||||
default = "cluster.local"
|
||||
}
|
||||
|
||||
# kubeconfig
|
||||
|
||||
variable "kubeconfig_ca_cert" {
|
||||
|
@ -24,6 +24,7 @@ data "template_file" "worker_config" {
|
||||
vars = {
|
||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
|
||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||
kubeconfig_ca_cert = "${var.kubeconfig_ca_cert}"
|
||||
kubeconfig_kubelet_cert = "${var.kubeconfig_kubelet_cert}"
|
||||
|