Add v1.9.6 heading to CHANGES.md

addons: Update heapster from v1.5.1 to v1.5.2
* https://github.com/kubernetes/heapster/releases/tag/v1.5.2
2025-08-03 00:51:35 +02:00 · 2018-03-22 22:01:29 -07:00 · 2018-03-21 20:32:01 -07:00 · 2018-03-21 20:29:52 -07:00 · 2018-03-19 23:17:17 -07:00 · 2018-03-19 00:25:44 -07:00
130 changed files with 10167 additions and 1337 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -4,7 +4,7 @@

 ### Environment

-* Platform: bare-metal, google-cloud, digital-ocean
+* Platform: aws, bare-metal, google-cloud, digital-ocean
 * OS: container-linux, fedora-cloud
 * Terraform: `terraform version`
 * Plugins: Provider plugin versions
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -4,6 +4,196 @@ Notable changes between versions.

 ## Latest

+## v1.9.6
+
+* Kubernetes [v1.9.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v196)
+* Update Calico from v3.0.3 to v3.0.4
+
+#### Addons
+
+* Update heapster from v1.5.1 to v1.5.2
+
+## v1.9.5
+
+* Kubernetes [v1.9.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v195)
+  * Fix `subPath` volume mounts regression ([kubernetes#61076](https://github.com/kubernetes/kubernetes/issues/61076))
+* Introduce [Container Linux Config snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) on cloud platforms ([#145](https://github.com/poseidon/typhoon/pull/145))
+  * Validate and additively merge custom Container Linux Configs during `terraform plan`
+  * Define files, systemd units, dropins, networkd configs, mounts, users, and more
+  * Require updating `terraform-provider-ct` plugin from v0.2.0 to v0.2.1
+* Add `node-role.kubernetes.io/controller="true"` node label to controllers ([#160](https://github.com/poseidon/typhoon/pull/160))
+
+#### AWS
+
+* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
+
+#### Digital Ocean
+
+* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
+
+#### Google Cloud
+
+* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
+* Relax `os_image` to optional. Default to "coreos-stable".
+
+#### Addons
+
+* Update nginx-ingress from 0.11.0 to 0.12.0
+* Update Prometheus from 2.2.0 to 2.2.1
+
+## v1.9.4
+
+* Kubernetes [v1.9.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v194)
+  * Secret, configMap, downward API, and projected volumes now read-only (breaking, [kubernetes#58720](https://github.com/kubernetes/kubernetes/pull/58720))
+  * Regressed `subPath` volume mounts (regression, [kubernetes#61076](https://github.com/kubernetes/kubernetes/issues/61076))
+  * Mitigated `subPath` [CVE-2017-1002101](https://github.com/kubernetes/kubernetes/issues/60813)
+* Introduce [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) for AWS and Google Cloud for joining heterogeneous workers to existing clusters.
+* Use new Network Load Balancers and cross zone load balancing on AWS
+* Allow flexvolume plugins to be used on any Typhoon cluster (not just bare-metal)
+* Upgrade etcd from v3.2.15 to v3.3.2
+* Update Calico from v3.0.2 to v3.0.3
+* Use kubernetes-incubator/bootkube v0.10.0
+* [Recommend](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action recommended)
+
+#### AWS
+
+* Promote AWS platform to stable
+* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) ([#150](https://github.com/poseidon/typhoon/pull/150))
+* Replace the apiserver elastic load balancer with a network load balancer ([#136](https://github.com/poseidon/typhoon/pull/136))
+* Replace the Ingress elastic load balancer with a network load balancer ([#141](https://github.com/poseidon/typhoon/pull/141))
+  * AWS [NLBs](https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/) can handle millions of RPS with high throughput and low latency.
+  * Require `terraform-provider-aws` 1.7.0 or higher
+* Enable NLB [cross-zone](https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/) load balancing ([#159](https://github.com/poseidon/typhoon/pull/159))
+  * Requests are automatically evenly distributed to targets regardless of AZ
+  * Require `terraform-provider-aws` 1.11.0 or higher
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
+* Fix controller and worker launch configs to ignore AMI changes ([#126](https://github.com/poseidon/typhoon/pull/126), [#158](https://github.com/poseidon/typhoon/pull/158))
+
+#### Digital Ocean
+
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
+* Fix to pass `ssh_fingerprints` as a list to droplets ([#143](https://github.com/poseidon/typhoon/pull/143))
+
+#### Google Cloud
+
+* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) ([#148](https://github.com/poseidon/typhoon/pull/148))
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
+* Add `kubeconfig` variable to `controllers` and `workers` submodules ([#147](https://github.com/poseidon/typhoon/pull/147))
+* Remove `kubeconfig_*` variables from `controllers` and `workers` submodules ([#147](https://github.com/poseidon/typhoon/pull/147))
+* Allow initial experimentation with accelerators (i.e. GPUs) on workers ([#161](https://github.com/poseidon/typhoon/pull/161)) (unofficial)
+  * Require `terraform-provider-google` v1.6.0
+
+#### Addons
+
+* Update Prometheus from 2.1.0 to 2.2.0 ([#153](https://github.com/poseidon/typhoon/pull/153))
+  * Scrape Prometheus itself to enable alerts about Prometheus itself
+  * Adjust KubeletDown rule to fire when 10% of kubelets are down
+* Update heapster from v1.5.0 to v1.5.1 ([#131](https://github.com/poseidon/typhoon/pull/131))
+  * Use separate service account
+* Update nginx-ingress from 0.10.2 to 0.11.0
+
+## v1.9.3
+
+* Kubernetes [v1.9.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v193)
+* Network improvements and fixes ([#104](https://github.com/poseidon/typhoon/pull/104))
+  * Switch from Calico v2.6.6 to v3.0.2
+  * Add Calico GlobalNetworkSet CRD
+  * Update flannel from v0.9.0 to v0.10.0
+  * Use separate service account for flannel
+* Update etcd from v3.2.14 to v3.2.15
+
+#### Digital Ocean
+
+* Use new Droplet [types](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) which offer more CPU/memory, at lower cost. ([#105](https://github.com/poseidon/typhoon/pull/105))
+  * A small Digital Ocean cluster costs less than $25 a month!
+
+#### Addons
+
+* Update Prometheus from v2.0.0 to v2.1.0 ([#113](https://github.com/poseidon/typhoon/pull/113))
+  * Improve alerting rules
+  * Relabel discovered kubelet, endpoint, service, and apiserver scrapes
+  * Use separate service accounts
+  * Update node-exporter and kube-state-metrics
+* Include Grafana dashboards for Kubernetes admins ([#113](https://github.com/poseidon/typhoon/pull/113))
+  * Add grafana-watcher to load bundled upstream dashboards
+* Update nginx-ingress from 0.9.0 to 0.10.2
+* Update CLUO from v0.5.0 to v0.6.0
+* Switch manifests to use `apps/v1` Deployments and Daemonsets ([#120](https://github.com/poseidon/typhoon/pull/120))
+* Remove Kubernetes Dashboard manifests ([#121](https://github.com/poseidon/typhoon/pull/121))
+
+## v1.9.2
+
+* Kubernetes [v1.9.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192)
+* Add Terraform v0.11.x support
+  * Add explicit "providers" section to modules for Terraform v0.11.x
+  * Retain support for Terraform v0.10.4+
+* Add [migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-v011x) from Terraform v0.10.x to v0.11.x (**action required!**)
+* Update etcd from 3.2.13 to 3.2.14
+* Update calico from 2.6.5 to 2.6.6
+* Update kube-dns from v1.14.7 to v1.14.8
+* Use separate service account for kube-dns
+* Use kubernetes-incubator/bootkube v0.10.0
+
+#### Bare-Metal
+
+* Use per-node Container Linux install profiles ([#97](https://github.com/poseidon/typhoon/pull/97))
+  * Allow Container Linux channel/version to be chosen per-cluster
+  * Fix issue where cluster deletion could require `terraform apply` multiple times
+
+#### Digital Ocean
+
+* Relax `digitalocean` provider version constraint
+* Fix bug with `terraform plan` always showing a firewall diff to be applied ([#3](https://github.com/poseidon/typhoon/issues/3))
+
+#### Addons
+
+* Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (**important**)
+  * Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes ([cluo#163](https://github.com/coreos/container-linux-update-operator/issues/163))
+* Update kube-state-metrics from v1.1.0 to v1.2.0
+* Fix RBAC cluster role for kube-state-metrics
+
+## v1.9.1
+
+* Kubernetes [v1.9.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v191)
+* Update kube-dns from 1.14.5 to v1.14.7
+* Update etcd from 3.2.0 to 3.2.13
+* Update Calico from v2.6.4 to v2.6.5
+* Enable portmap to fix hostPort with Calico
+* Use separate service account for controller-manager
+
+## v1.8.6
+
+* Kubernetes [v1.8.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v186)
+* Update Calico from v2.6.3 to v2.6.4
+
+## v1.8.5
+
+* Kubernetes [v1.8.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v185)
+* Recommend Container Linux [images](https://coreos.com/releases/) with Docker 17.09
+  * Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
+  of 1.12)
+  * Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
+* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69)) 
+* Add optional `cluster_domain_suffix` variable (#74)
+* Use kubernetes-incubator/bootkube v0.9.1
+
+#### Bare-Metal
+
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume providers ([#61](https://github.com/poseidon/typhoon/pull/61))
+
+#### Addons
+
+* Discourage deploying the Kubernetes Dashboard (security)
+
+## v1.8.4
+
+* Kubernetes v1.8.4
+* Calico related bug fixes
+* Update Calico from v2.6.1 to v2.6.3
+* Update flannel from v0.9.0 to v0.9.1
+* Service accounts for kube-proxy and pod-checkpointer
+* Use kubernetes-incubator/bootkube v0.9.0
+
 ## v1.8.3

 * Kubernetes v1.8.3
@@ -86,7 +276,7 @@ Notable changes between versions.
 ## v1.7.3

 * Kubernetes v1.7.3
-* Use kubernete-incubator/bootkube v0.6.1
+* Use kubernetes-incubator/bootkube v0.6.1

 #### Digital Ocean

@@ -96,7 +286,7 @@ Notable changes between versions.
 ## v1.7.1

 * Kubernetes v1.7.1
-* Use kubernete-incubator/bootkube v0.6.0
+* Use kubernetes-incubator/bootkube v0.6.0
 * Add Bare-Metal Terraform module (stable)
 * Add Digital Ocean Terraform module (beta)

@@ -109,12 +299,12 @@ Notable changes between versions.
 ## v1.6.7

 * Kubernetes v1.6.7
-* Use kubernete-incubator/bootkube v0.5.1
+* Use kubernetes-incubator/bootkube v0.5.1

 ## v1.6.6

 * Kubernetes v1.6.6
-* Use kubernete-incubator/bootkube v0.4.5
+* Use kubernetes-incubator/bootkube v0.4.5
 * Disable locksmithd on hosts, in favor of [CLUO](https://github.com/coreos/container-linux-update-operator).

 ## v1.6.4
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,4 +2,4 @@

 ## Developer Certificate of Origin

-By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DOC](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
+By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DCO](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Typhoon [![IRC](https://img.shields.io/badge/freenode-%23typhoon-0099ef.svg)]() <img align="right" src="https://storage.googleapis.com/dghubble/spin.png">
+# Typhoon [![IRC](https://img.shields.io/badge/freenode-%23typhoon-0099ef.svg)]() <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@@ -9,12 +9,13 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [preemption](https://typhoon.psdn.io/google-cloud/#preemption) (varies by platform)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Modules

@@ -22,7 +23,7 @@ Typhoon provides a Terraform Module for each supported operating system and plat

 | Platform      | Operating System | Terraform Module | Status |
 |---------------|------------------|------------------|--------|
-| AWS           | Container Linux  | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | beta |
+| AWS           | Container Linux  | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
 | Bare-Metal    | Container Linux  | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
 | Digital Ocean | Container Linux  | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
 | Google Cloud  | Container Linux  | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | beta |
@@ -43,13 +44,21 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo

 ```tf
 module "google-cloud-yavin" {
-  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
+  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.9.6"
+  
+  providers = {
+    google = "google.default"
+    local = "local.default"
+    null = "null.default"
+    template = "template.default"
+    tls = "tls.default"
+  }

  # Google Cloud
  region        = "us-central1"
  dns_zone      = "example.com"
  dns_zone_name = "example-zone"
-  os_image      = "coreos-stable-1465-6-0-v20170817"
+  os_image      = "coreos-stable"

  cluster_name       = "yavin"
  controller_count   = 1
@@ -75,12 +84,12 @@ Apply complete! Resources: 37 added, 0 changed, 0 destroyed.
 In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.

 ```sh
-$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-0.c.example-com.internal     Ready    6m     v1.8.3
-yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.8.3
-yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.8.3
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.9.6
+yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.9.6
+yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.9.6
 ```

 List the pods.
@@ -115,11 +124,11 @@ Typhoon is strict about minimalism, maturity, and scope. These are not in scope:

 Ask questions on the IRC #typhoon channel on [freenode.net](http://freenode.net/).

-## Background
+## Motivation

 Typhoon powers the author's cloud and colocation clusters. The project has evolved through operational experience and Kubernetes changes. Typhoon is shared under a free license to allow others to use the work freely and contribute to its upkeep.

-Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of free (or enterprise) Kubernetes distros is healthy.
+Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of Kubernetes distributions is healthy.

 ## Social Contract

@@ -127,4 +136,6 @@ Typhoon is not a product, trial, or free-tier. It is not run by a company, does

 Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.

-*Disclosure: The author works for CoreOS and previously wrote Matchbox and original Tectonic for bare-metal and AWS. This project is not associated with CoreOS.*
+## Donations
+
+Typhoon does not accept money donations. Instead, we encourage you to donate to one of [these organizations](https://github.com/poseidon/typhoon/wiki/Donations) to show your appreciation.
--- a/addons/cluo/0-namespace.yaml
+++ b/addons/cluo/0-namespace.yaml
--- a/addons/cluo/cluster-role-binding.yaml
+++ b/addons/cluo/cluster-role-binding.yaml
@@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: reboot-coordinator
 roleRef:
--- a/addons/cluo/cluster-role.yaml
+++ b/addons/cluo/cluster-role.yaml
@@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: reboot-coordinator
--- a/addons/cluo/update-agent.yaml
+++ b/addons/cluo/update-agent.yaml
@@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: container-linux-update-agent
@@ -8,6 +8,9 @@ spec:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      app: container-linux-update-agent
  template:
    metadata:
      labels:
@@ -15,7 +18,7 @@ spec:
    spec:
      containers:
      - name: update-agent
-        image: quay.io/coreos/container-linux-update-operator:v0.4.1
+        image: quay.io/coreos/container-linux-update-operator:v0.6.0
        command:
        - "/bin/update-agent"
        volumeMounts:
--- a/addons/cluo/update-operator.yaml
+++ b/addons/cluo/update-operator.yaml
@@ -1,10 +1,13 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: container-linux-update-operator
  namespace: reboot-coordinator
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      app: container-linux-update-operator
  template:
    metadata:
      labels:
@@ -12,7 +15,7 @@ spec:
    spec:
      containers:
      - name: update-operator
-        image: quay.io/coreos/container-linux-update-operator:v0.4.1
+        image: quay.io/coreos/container-linux-update-operator:v0.6.0
        command:
        - "/bin/update-operator"
        env:
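
Review note on the `apps/v1` changes to the CLUO manifests: `apps/v1` Deployments and DaemonSets need an explicit `spec.selector`, and it has to match the labels in `spec.template.metadata.labels`, which is why both hunks add a `selector.matchLabels` block. A minimal sketch of that check follows. It assumes the manifest has already been loaded into a plain dict, covers only `matchLabels` (not `matchExpressions`), and uses a hypothetical helper name `selector_matches_template`. The template label value in the sample is an assumption too, since the hunk elides it.

```python
def selector_matches_template(manifest: dict) -> bool:
    """Return True when spec.selector.matchLabels is non-empty and every
    selector label appears with the same value in the pod template labels."""
    spec = manifest.get("spec", {})
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    template_labels = (
        spec.get("template", {}).get("metadata", {}).get("labels", {})
    )
    if not match_labels:
        # A missing or empty selector fails this check.
        return False
    return all(template_labels.get(k) == v for k, v in match_labels.items())


# Hypothetical in-memory copy of the update-operator Deployment.
# The template label value is assumed; the diff does not show it.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "selector": {
            "matchLabels": {"app": "container-linux-update-operator"}
        },
        "template": {
            "metadata": {
                "labels": {"app": "container-linux-update-operator"}
            }
        },
    },
}

print(selector_matches_template(deployment))
```

The helper only compares the two label maps inside the manifest; it does not contact a cluster or validate any other field.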
--- a/addons/dashboard/deployment.yaml
+++ b/addons/dashboard/deployment.yaml
@@ -1,32 +0,0 @@
-apiVersion: extensions/v1beta1
-kind: Deployment
-metadata:
-  name: kubernetes-dashboard
-  namespace: kube-system
-spec:
-  replicas: 1
-  template:
-    metadata:
-      labels:
-        name: kubernetes-dashboard
-        phase: prod
-    spec:
-      containers:
-        - name: kubernetes-dashboard
-          image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
-          ports:
-            - name: http
-              containerPort: 9090
-          resources:
-            limits:
-              cpu: 100m
-              memory: 300Mi
-            requests:
-              cpu: 100m
-              memory: 100Mi
-          livenessProbe:
-            httpGet:
-              path: /
-              port: 9090
-            initialDelaySeconds: 30
-            timeoutSeconds: 30
--- a/addons/dashboard/service.yaml
+++ b/addons/dashboard/service.yaml
@@ -1,15 +0,0 @@
-apiVersion: v1
-kind: Service
-metadata:
-  name: kubernetes-dashboard
-  namespace: kube-system
-spec:
-  type: ClusterIP
-  selector:
-    name: kubernetes-dashboard
-    phase: prod
-  ports:
-    - name: http
-      protocol: TCP
-      port: 80
-      targetPort: 9090
--- a/addons/grafana/config.yaml
+++ b/addons/grafana/config.yaml
@@ -0,0 +1,7499 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-dashboards
+  namespace: monitoring
+data:
+  deployment-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 1,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "200px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "cores",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "CPU",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 9,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "GB",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "80%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Memory",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "Bps",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Network",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "100px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "metric": "kube_deployment_spec_replicas",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Desired Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Available Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Observed Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Metadata Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "350px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "current replicas",
+                  "refId": "A",
+                  "step": 30
+                },
+                {
+                  "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "available",
+                  "refId": "B",
+                  "step": 30
+                },
+                {
+                  "expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "unavailable",
+                  "refId": "C",
+                  "step": 30
+                },
+                {
+                  "expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "updated",
+                  "refId": "D",
+                  "step": 30
+                },
+                {
+                  "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "desired",
+                  "refId": "E",
+                  "step": 30
+                }
+              ],
+              "title": "Replicas",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "none",
+                  "label": "",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "show": false
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Namespace",
+            "multi": false,
+            "name": "deployment_namespace",
+            "options": [],
+            "query": "label_values(kube_deployment_metadata_generation, namespace)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": null,
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Deployment",
+            "multi": false,
+            "name": "deployment_name",
+            "options": [],
+            "query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "deployment",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Deployment",
+      "version": 1
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  etcd-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "label": "prometheus",
+          "description": "",
+          "type": "datasource",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus"
+        }
+      ],
+      "__requires": [
+        {
+          "type": "grafana",
+          "id": "grafana",
+          "name": "Grafana",
+          "version": "4.5.2"
+        },
+        {
+          "type": "panel",
+          "id": "graph",
+          "name": "Graph",
+          "version": ""
+        },
+        {
+          "type": "datasource",
+          "id": "prometheus",
+          "name": "Prometheus",
+          "version": "1.0.0"
+        },
+        {
+          "type": "panel",
+          "id": "singlestat",
+          "name": "Singlestat",
+          "version": ""
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "description": "etcd sample Grafana dashboard with Prometheus",
+      "editable": false,
+      "gnetId": null,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "id": null,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "cacheTimeout": null,
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 28,
+              "interval": null,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "nullText": null,
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "tableColumn": "",
+              "targets": [
+                {
+                  "expr": "sum(etcd_server_has_leader)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "metric": "etcd_server_has_leader",
+                  "refId": "A",
+                  "step": 20
+                }
+              ],
+              "thresholds": "",
+              "title": "Up",
+              "type": "singlestat",
+              "valueFontSize": "200%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 23,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 5,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(grpc_server_started_total{grpc_type=\"unary\"}[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "RPC Rate",
+                  "metric": "grpc_server_started_total",
+                  "refId": "A",
+                  "step": 4
+                },
+                {
+                  "expr": "sum(rate(grpc_server_handled_total{grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "RPC Failed Rate",
+                  "metric": "grpc_server_handled_total",
+                  "refId": "B",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "RPC Rate",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "ops",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 41,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Watch Streams",
+                  "metric": "grpc_server_handled_total",
+                  "refId": "A",
+                  "step": 4
+                },
+                {
+                  "expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Lease Streams",
+                  "metric": "grpc_server_handled_total",
+                  "refId": "B",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Active Streams",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "decimals": null,
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "grid": {},
+              "id": 1,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "etcd_debugging_mvcc_db_total_size_in_bytes",
+                  "format": "time_series",
+                  "hide": false,
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} DB Size",
+                  "metric": "",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "DB Size",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": false
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "grid": {},
+              "id": 3,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 1,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": false,
+              "steppedLine": true,
+              "targets": [
+                {
+                  "expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))",
+                  "format": "time_series",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} WAL fsync",
+                  "metric": "etcd_disk_wal_fsync_duration_seconds_bucket",
+                  "refId": "A",
+                  "step": 4
+                },
+                {
+                  "expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} DB fsync",
+                  "metric": "etcd_disk_backend_commit_duration_seconds_bucket",
+                  "refId": "B",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Disk Sync Duration",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "s",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": false
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 29,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "process_resident_memory_bytes",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Resident Memory",
+                  "metric": "process_resident_memory_bytes",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Memory",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "New row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 5,
+              "id": 22,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(etcd_network_client_grpc_received_bytes_total[5m])",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Client Traffic In",
+                  "metric": "etcd_network_client_grpc_received_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Client Traffic In",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 5,
+              "id": 21,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(etcd_network_client_grpc_sent_bytes_total[5m])",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Client Traffic Out",
+                  "metric": "etcd_network_client_grpc_sent_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Client Traffic Out",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 20,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Peer Traffic In",
+                  "metric": "etcd_network_peer_received_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Peer Traffic In",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "decimals": null,
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "grid": {},
+              "id": 16,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)",
+                  "format": "time_series",
+                  "hide": false,
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Peer Traffic Out",
+                  "metric": "etcd_network_peer_sent_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Peer Traffic Out",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "New row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 40,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(etcd_server_proposals_failed_total[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Failure Rate",
+                  "metric": "etcd_server_proposals_failed_total",
+                  "refId": "A",
+                  "step": 2
+                },
+                {
+                  "expr": "sum(etcd_server_proposals_pending)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Pending Total",
+                  "metric": "etcd_server_proposals_pending",
+                  "refId": "B",
+                  "step": 2
+                },
+                {
+                  "expr": "sum(rate(etcd_server_proposals_committed_total[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Commit Rate",
+                  "metric": "etcd_server_proposals_committed_total",
+                  "refId": "C",
+                  "step": 2
+                },
+                {
+                  "expr": "sum(rate(etcd_server_proposals_applied_total[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Apply Rate",
+                  "refId": "D",
+                  "step": 2
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Raft Proposals",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "decimals": 0,
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 19,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "changes(etcd_server_leader_changes_seen_total[1d])",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Total Leader Elections Per Day",
+                  "metric": "etcd_server_leader_changes_seen_total",
+                  "refId": "A",
+                  "step": 2
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Total Leader Elections Per Day",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "New row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-15m",
+        "to": "now"
+      },
+      "timepicker": {
+        "now": true,
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "etcd",
+      "version": 4
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  kubernetes-capacity-planning-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "gnetId": 22,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100",
+                  "hide": false,
+                  "intervalFactor": 10,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 50
+                }
+              ],
+              "title": "Idle CPU",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "percent",
+                  "label": "cpu idle",
+                  "logBase": 1,
+                  "min": 0,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 9,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(node_load1)",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 1m",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_load5)",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 5m",
+                  "refId": "B",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_load15)",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 15m",
+                  "refId": "C",
+                  "step": 20,
+                  "target": ""
+                }
+              ],
+              "title": "System Load",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 4,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory usage",
+                  "metric": "memo",
+                  "refId": "A",
+                  "step": 10,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_memory_Buffers)",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory buffers",
+                  "metric": "memo",
+                  "refId": "B",
+                  "step": 10,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_memory_Cached)",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory cached",
+                  "metric": "memo",
+                  "refId": "C",
+                  "step": 10,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_memory_MemFree)",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory free",
+                  "metric": "memo",
+                  "refId": "D",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Memory Usage",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "min": "0",
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
+                  "intervalFactor": 2,
+                  "metric": "",
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "246px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 6,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "read",
+                  "yaxis": 1
+                },
+                {
+                  "alias": "{instance=\"172.17.0.1:9100\"}",
+                  "yaxis": 2
+                },
+                {
+                  "alias": "io time",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_disk_bytes_read[5m]))",
+                  "hide": false,
+                  "intervalFactor": 4,
+                  "legendFormat": "read",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(rate(node_disk_bytes_written[5m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "written",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "sum(rate(node_disk_io_time_ms[5m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "io time",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "Disk I/O",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "ms",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percentunit",
+              "gauge": {
+                "maxValue": 1,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 12,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "0.75, 0.9",
+              "title": "Disk Space Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 8,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_network_receive_bytes{device!~\"lo\"}[5m]))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Received",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 10,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_network_transmit_bytes{device!~\"lo\"}[5m]))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "B",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Transmitted",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "276px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 11,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 11,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(kube_pod_info)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Current number of Pods",
+                  "refId": "A",
+                  "step": 10
+                },
+                {
+                  "expr": "sum(kube_node_status_capacity_pods)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Maximum capacity of pods",
+                  "refId": "B",
+                  "step": 10
+                }
+              ],
+              "title": "Cluster Pod Utilization",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Pod Utilization",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-1h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Capacity Planning",
+      "version": 4
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  kubernetes-cluster-health-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": "10s",
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "254px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 1,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Control Plane Components Down",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "Everything UP and healthy",
+                  "value": "null"
+                },
+                {
+                  "op": "=",
+                  "text": "",
+                  "value": ""
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Alerts Firing",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "3, 5",
+              "title": "Alerts Pending",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Crashlooping Pods",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Node Not Ready",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Node Disk Pressure",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Node Memory Pressure",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_spec_unschedulable)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Nodes Unschedulable",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Cluster Health",
+      "version": 9
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  kubernetes-cluster-status-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "129px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 6,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Control Plane UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "UP",
+                  "value": "null"
+                }
+              ],
+              "valueName": "total"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 6,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "3, 5",
+              "title": "Alerts Firing",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": true,
+          "title": "Cluster Health",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "168px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 1,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "API Servers UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Controller Managers UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Schedulers UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Crashlooping Control Plane Pods",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": true,
+          "title": "Control Plane Status",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "158px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "avg(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "CPU Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 9,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"}) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Filesystem Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 10,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Pod Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": true,
+          "title": "Capacity Planning",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Cluster Status",
+      "version": 3
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  kubernetes-control-plane-status-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 1,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "API Servers UP",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Controller Managers UP",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Schedulers UP",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "5, 10",
+              "title": "API Server Request Error Rate",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 7,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 30
+                }
+              ],
+              "title": "API Server Request Latency",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 5,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60
+                }
+              ],
+              "title": "End to End Scheduling Latency",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "dtdurations",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 6,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Error Rate",
+                  "refId": "A",
+                  "step": 60
+                },
+                {
+                  "expr": "sum by(instance) (rate(apiserver_request_count[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Request Rate",
+                  "refId": "B",
+                  "step": 60
+                }
+              ],
+              "title": "API Server Request Rates",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Control Plane Status",
+      "version": 3
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  kubernetes-resource-requests-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "300px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Allocatable CPU Cores",
+                  "refId": "A",
+                  "step": 20
+                },
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested CPU Cores",
+                  "refId": "B",
+                  "step": 20
+                }
+              ],
+              "title": "CPU Cores",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": "CPU Cores",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 240
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "CPU Cores",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "CPU Cores",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "300px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Allocatable Memory",
+                  "refId": "A",
+                  "step": 20
+                },
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested Memory",
+                  "refId": "B",
+                  "step": 20
+                }
+              ],
+              "title": "Memory",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "label": "Memory",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 240
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Memory",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-3h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Resource Requests",
+      "version": 2
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  nodes-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "description": "Dashboard to get an overview of one server",
+      "editable": false,
+      "gnetId": 22,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)",
+                  "hide": false,
+                  "intervalFactor": 10,
+                  "legendFormat": "{{cpu}}",
+                  "refId": "A",
+                  "step": 50
+                }
+              ],
+              "title": "CPU Usage",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "percent",
+                  "label": "cpu usage",
+                  "logBase": 1,
+                  "max": 100,
+                  "min": 0,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 9,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "node_load1{instance=\"$server\"}",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 1m",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "node_load5{instance=\"$server\"}",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 5m",
+                  "refId": "B",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "node_load15{instance=\"$server\"}",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 15m",
+                  "refId": "C",
+                  "step": 20,
+                  "target": ""
+                }
+              ],
+              "title": "System Load",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 4,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}",
+                  "hide": false,
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory used",
+                  "metric": "",
+                  "refId": "C",
+                  "step": 10
+                },
+                {
+                  "expr": "node_memory_Buffers{instance=\"$server\"}",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory buffers",
+                  "metric": "",
+                  "refId": "E",
+                  "step": 10
+                },
+                {
+                  "expr": "node_memory_Cached{instance=\"$server\"}",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory cached",
+                  "metric": "",
+                  "refId": "F",
+                  "step": 10
+                },
+                {
+                  "expr": "node_memory_MemFree{instance=\"$server\"}",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory free",
+                  "metric": "",
+                  "refId": "D",
+                  "step": 10
+                }
+              ],
+              "title": "Memory Usage",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "min": "0",
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 6,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "read",
+                  "yaxis": 1
+                },
+                {
+                  "alias": "{instance=\"172.17.0.1:9100\"}",
+                  "yaxis": 2
+                },
+                {
+                  "alias": "io time",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))",
+                  "hide": false,
+                  "intervalFactor": 4,
+                  "legendFormat": "read",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "written",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "io time",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "Disk I/O",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "ms",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "percentunit",
+              "gauge": {
+                "maxValue": 1,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"})",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "0.75, 0.9",
+              "title": "Disk Space Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 8,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(node_network_receive_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "{{device}}",
+                  "refId": "A",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Received",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 10,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "{{device}}",
+                  "refId": "B",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Transmitted",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": false,
+            "label": null,
+            "multi": false,
+            "name": "server",
+            "options": [],
+            "query": "label_values(node_boot_time, instance)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-1h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Nodes",
+      "version": 2
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  pods-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 1,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": true,
+                "avg": true,
+                "current": true,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": true,
+                "show": true,
+                "total": false,
+                "values": true
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})",
+                  "interval": "10s",
+                  "intervalFactor": 1,
+                  "legendFormat": "Current: {{ container_name }}",
+                  "metric": "container_memory_usage_bytes",
+                  "refId": "A",
+                  "step": 15
+                },
+                {
+                  "expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested: {{ container }}",
+                  "metric": "kube_pod_container_resource_requests_memory_bytes",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "kube_pod_container_resource_limits_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Limit: {{ container }}",
+                  "metric": "kube_pod_container_resource_limits_memory_bytes",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "Memory Usage",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 2,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": true,
+                "avg": true,
+                "current": true,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": true,
+                "show": true,
+                "total": false,
+                "values": true
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{ container_name }}",
+                  "refId": "A",
+                  "step": 30
+                },
+                {
+                  "expr": "kube_pod_container_resource_requests_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested: {{ container }}",
+                  "metric": "kube_pod_container_resource_requests_cpu_cores",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "kube_pod_container_resource_limits_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Limit: {{ container }}",
+                  "metric": "kube_pod_container_resource_limits_cpu_cores",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "CPU Usage",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": true,
+                "avg": true,
+                "current": true,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": true,
+                "show": true,
+                "total": false,
+                "values": true
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{ pod_name }}",
+                  "refId": "A",
+                  "step": 30
+                }
+              ],
+              "title": "Network I/O",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": true,
+            "label": "Namespace",
+            "multi": false,
+            "name": "namespace",
+            "options": [],
+            "query": "label_values(kube_pod_info, namespace)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Pod",
+            "multi": false,
+            "name": "pod",
+            "options": [],
+            "query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": true,
+            "label": "Container",
+            "multi": false,
+            "name": "container",
+            "options": [],
+            "query": "label_values(kube_pod_container_info{namespace=~\"$namespace\", pod=\"$pod\"}, container)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Pods",
+      "version": 1
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  statefulset-dashboard.json: |+
+    {
+      "dashboard":
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 1,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "200px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "cores",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "CPU",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 9,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "GB",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "80%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(container_memory_usage_bytes{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}) / 1024^3",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Memory",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "Bps",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Network",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "100px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "metric": "kube_statefulset_replicas",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Desired Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Available Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_statefulset_status_observed_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Observed Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_statefulset_metadata_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Metadata Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "350px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "${DS_PROMETHEUS}",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "available",
+                  "refId": "B",
+                  "step": 30
+                },
+                {
+                  "expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "desired",
+                  "refId": "E",
+                  "step": 30
+                }
+              ],
+              "title": "Replicas",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "none",
+                  "label": "",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "show": false
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Namespace",
+            "multi": false,
+            "name": "statefulset_namespace",
+            "options": [],
+            "query": "label_values(kube_statefulset_metadata_generation, namespace)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": null,
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "${DS_PROMETHEUS}",
+            "hide": 0,
+            "includeAll": false,
+            "label": "StatefulSet",
+            "multi": false,
+            "name": "statefulset_name",
+            "options": [],
+            "query": "label_values(kube_statefulset_metadata_generation{namespace=\"$statefulset_namespace\"}, statefulset)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "statefulset",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "StatefulSet",
+      "version": 1
+    }
+    ,
+      "inputs": [
+        {
+          "name": "DS_PROMETHEUS",
+          "pluginId": "prometheus",
+          "type": "datasource",
+          "value": "prometheus"
+        }
+      ],
+      "overwrite": true
+    }
+  prometheus-datasource.json: |+
+    {
+        "access": "proxy",
+        "basicAuth": false,
+        "name": "prometheus",
+        "type": "prometheus",
+        "url": "http://prometheus.monitoring.svc"
+    }
+---
--- a/addons/grafana/deployment.yaml
+++ b/addons/grafana/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: grafana
@ -21,7 +21,7 @@ spec:
    spec:
      containers:
        - name: grafana
-          image: grafana/grafana:4.6.1
+          image: grafana/grafana:4.6.3
          env:
            - name: GF_SERVER_HTTP_PORT
              value: "8080"
@ -41,6 +41,22 @@ spec:
            limits:
              memory: 200Mi
              cpu: 200m
+        - name: grafana-watcher
+          image: quay.io/coreos/grafana-watcher:v0.0.8
+          args:
+            - '--watch-dir=/etc/grafana/dashboards'
+            - '--grafana-url=http://localhost:8080'
+          resources:
+            requests:
+              memory: "16Mi"
+              cpu: "50m"
+            limits:
+              memory: "32Mi"
+              cpu: "100m"
+          volumeMounts:
+          - name: dashboards
+            mountPath: /etc/grafana/dashboards
      volumes:
-        - name: grafana-storage
-          emptyDir: {}
+        - name: dashboards
+          configMap:
+            name: grafana-dashboards
--- a/addons/heapster/cluster-role-binding.yaml
+++ b/addons/heapster/cluster-role-binding.yaml
@ -0,0 +1,12 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: heapster
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: system:heapster
+subjects:
+- kind: ServiceAccount
+  name: heapster
+  namespace: kube-system
--- a/addons/heapster/deployment.yaml
+++ b/addons/heapster/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: heapster
@ -14,12 +14,11 @@ spec:
      labels:
        name: heapster
        phase: prod
-      annotations:
-        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
+      serviceAccountName: heapster
      containers:
        - name: heapster
-          image: gcr.io/google_containers/heapster-amd64:v1.4.3
+          image: k8s.gcr.io/heapster-amd64:v1.5.2
          command:
            - /heapster
            - --source=kubernetes.summary_api:''
@ -31,16 +30,18 @@ spec:
            initialDelaySeconds: 180
            timeoutSeconds: 5
        - name: heapster-nanny
-          image: gcr.io/google_containers/addon-resizer:2.0
+          image: k8s.gcr.io/addon-resizer:1.7
          command:
            - /pod_nanny
            - --cpu=80m
            - --extra-cpu=0.5m
            - --memory=140Mi
            - --extra-memory=4Mi
+            - --threshold=5
            - --deployment=heapster
            - --container=heapster
            - --poll-period=300000
+            - --estimator=exponential
          env:
            - name: MY_POD_NAME
              valueFrom:
--- a/addons/heapster/role-binding.yaml
+++ b/addons/heapster/role-binding.yaml
@ -0,0 +1,13 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: heapster
+  namespace: kube-system
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: system:pod-nanny
+subjects:
+- kind: ServiceAccount
+  name: heapster
+  namespace: kube-system
--- a/addons/heapster/role.yaml
+++ b/addons/heapster/role.yaml
@ -0,0 +1,19 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: system:pod-nanny
+  namespace: kube-system
+rules:
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  verbs:
+  - get
+- apiGroups:
+  - "extensions"
+  resources:
+  - deployments
+  verbs:
+  - get
+  - update
--- a/addons/heapster/service-account.yaml
+++ b/addons/heapster/service-account.yaml
@ -0,0 +1,5 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: heapster
+  namespace: kube-system
--- a/addons/nginx-ingress/aws/0-namespace.yaml
+++ b/addons/nginx-ingress/aws/0-namespace.yaml
--- a/addons/nginx-ingress/aws/default-backend/deployment.yaml
+++ b/addons/nginx-ingress/aws/default-backend/deployment.yaml
@ -1,10 +1,14 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: default-backend
  namespace: ingress
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      name: default-backend
+      phase: prod
  template:
    metadata:
      labels:
--- a/addons/nginx-ingress/aws/deployment.yaml
+++ b/addons/nginx-ingress/aws/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: nginx-ingress-controller
@ -8,6 +8,10 @@ spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: nginx-ingress-controller
+      phase: prod
  template:
    metadata:
      labels:
@ -19,7 +23,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/aws/rbac/cluster-role-binding.yaml
+++ b/addons/nginx-ingress/aws/rbac/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
 roleRef:
--- a/addons/nginx-ingress/aws/rbac/cluster-role.yaml
+++ b/addons/nginx-ingress/aws/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: ingress
--- a/addons/nginx-ingress/aws/rbac/role-binding.yaml
+++ b/addons/nginx-ingress/aws/rbac/role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/aws/rbac/role.yaml
+++ b/addons/nginx-ingress/aws/rbac/role.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/digital-ocean/0-namespace.yaml
+++ b/addons/nginx-ingress/digital-ocean/0-namespace.yaml
--- a/addons/nginx-ingress/digital-ocean/daemonset.yaml
+++ b/addons/nginx-ingress/digital-ocean/daemonset.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: nginx-ingress-controller
@ -8,6 +8,10 @@ spec:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: nginx-ingress-controller
+      phase: prod
  template:
    metadata:
      labels:
@ -19,7 +23,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/digital-ocean/default-backend/deployment.yaml
+++ b/addons/nginx-ingress/digital-ocean/default-backend/deployment.yaml
@ -1,10 +1,14 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: default-backend
  namespace: ingress
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      name: default-backend
+      phase: prod
  template:
    metadata:
      labels:
--- a/addons/nginx-ingress/digital-ocean/rbac/cluster-role-binding.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
 roleRef:
--- a/addons/nginx-ingress/digital-ocean/rbac/cluster-role.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: ingress
--- a/addons/nginx-ingress/digital-ocean/rbac/role-binding.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/digital-ocean/rbac/role.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/role.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/google-cloud/0-namespace.yaml
+++ b/addons/nginx-ingress/google-cloud/0-namespace.yaml
--- a/addons/nginx-ingress/google-cloud/default-backend/deployment.yaml
+++ b/addons/nginx-ingress/google-cloud/default-backend/deployment.yaml
@ -1,10 +1,14 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: default-backend
  namespace: ingress
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      name: default-backend
+      phase: prod
  template:
    metadata:
      labels:
--- a/addons/nginx-ingress/google-cloud/deployment.yaml
+++ b/addons/nginx-ingress/google-cloud/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: nginx-ingress-controller
@ -8,6 +8,10 @@ spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: nginx-ingress-controller
+      phase: prod
  template:
    metadata:
      labels:
@ -19,7 +23,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/google-cloud/rbac/cluster-role-binding.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
 roleRef:
--- a/addons/nginx-ingress/google-cloud/rbac/cluster-role.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: ingress
--- a/addons/nginx-ingress/google-cloud/rbac/role-binding.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/google-cloud/rbac/role.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/role.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/prometheus/0-namespace.yaml
+++ b/addons/prometheus/0-namespace.yaml
--- a/addons/prometheus/config.yaml
+++ b/addons/prometheus/config.yaml
@ -39,7 +39,7 @@ data:
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # Using endpoints to discover kube-apiserver targets finds the pod IP
-        # (host IP since apiserver is uses host network) which is not used in
+        # (host IP since apiserver uses host network) which is not used in
        # the server certificate.
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
@ -51,6 +51,9 @@ data:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
+      - replacement: apiserver
+        action: replace
+        target_label: job

    # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
    # metrics from a node by scraping kubelet (127.0.0.1:10255/metrics).
@ -59,7 +62,7 @@ data:
    # Kubernetes apiserver.  This means it will work if Prometheus is running out of
    # cluster, or can't connect to nodes for some other reason (e.g. because of
    # firewalling).
-    - job_name: 'kubernetes-nodes'
+    - job_name: 'kubelet'
      kubernetes_sd_configs:
      - role: node
      
@ -149,7 +152,7 @@ data:
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
-        target_label: kubernetes_name
+        target_label: job

    # Example scrape config for probing services via the Blackbox Exporter.
    #
@ -181,7 +184,7 @@ data:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
-        target_label: kubernetes_name
+        target_label: job

    # Example scrape config for pods
    #
--- a/addons/prometheus/deployment.yaml
+++ b/addons/prometheus/deployment.yaml
@ -1,22 +1,24 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: prometheus
  namespace: monitoring
 spec:
  replicas: 1
-  strategy:
-    rollingUpdate:
-      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: prometheus
+      phase: prod
  template:
    metadata:
      labels:
        name: prometheus
        phase: prod
    spec:
+      serviceAccountName: prometheus
      containers:
      - name: prometheus
-        image: quay.io/prometheus/prometheus:v2.0.0
+        image: quay.io/prometheus/prometheus:v2.2.1
        args:
          - '--config.file=/etc/prometheus/prometheus.yaml'
        ports:
--- a/addons/prometheus/exporters/kube-state-metrics/cluster-role.yaml
+++ b/addons/prometheus/exporters/kube-state-metrics/cluster-role.yaml
@ -12,7 +12,9 @@ rules:
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
+  - persistentvolumes
  - namespaces
+  - endpoints
  verbs: ["list", "watch"]
 - apiGroups: ["extensions"]
  resources:
@ -29,3 +31,7 @@ rules:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
+- apiGroups: ["autoscaling"]
+  resources:
+  - horizontalpodautoscalers
+  verbs: ["list", "watch"]
--- a/addons/prometheus/exporters/kube-state-metrics/deployment.yaml
+++ b/addons/prometheus/exporters/kube-state-metrics/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: kube-state-metrics
@ -22,7 +22,7 @@ spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
-        image: quay.io/coreos/kube-state-metrics:v1.1.0
+        image: quay.io/coreos/kube-state-metrics:v1.2.0
        ports:
          - name: metrics
            containerPort: 8080
@ -33,7 +33,7 @@ spec:
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
-        image: gcr.io/google_containers/addon-resizer:1.0
+        image: gcr.io/google_containers/addon-resizer:1.7
        resources:
          limits:
            cpu: 100m
--- a/addons/prometheus/exporters/kube-state-metrics/service.yaml
+++ b/addons/prometheus/exporters/kube-state-metrics/service.yaml
@ -15,5 +15,5 @@ spec:
  ports:
    - name: metrics
      protocol: TCP
-      port: 80
+      port: 8080
      targetPort: 8080
--- a/addons/prometheus/exporters/node-exporter/daemonset.yaml
+++ b/addons/prometheus/exporters/node-exporter/daemonset.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: node-exporter
@ -18,11 +18,15 @@ spec:
        name: node-exporter
        phase: prod
    spec:
+      serviceAccountName: node-exporter
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 65534
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
-        image: quay.io/prometheus/node-exporter:v0.15.0
+        image: quay.io/prometheus/node-exporter:v0.15.2
        args:
          - "--path.procfs=/host/proc"
          - "--path.sysfs=/host/sys"
@ -45,9 +49,8 @@ spec:
            mountPath: /host/sys
            readOnly: true
      tolerations:
-        - key: node-role.kubernetes.io/master
+        - effect: NoSchedule
          operator: Exists
-          effect: NoSchedule
      volumes:
        - name: proc
          hostPath:
--- a/addons/prometheus/exporters/node-exporter/service-account.yaml
+++ b/addons/prometheus/exporters/node-exporter/service-account.yaml
@ -0,0 +1,5 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: node-exporter
+  namespace: monitoring
--- a/addons/prometheus/rbac/cluster-role-binding.yaml
+++ b/addons/prometheus/rbac/cluster-role-binding.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
  name: prometheus
@ -8,5 +8,5 @@ roleRef:
  name: prometheus
 subjects:
 - kind: ServiceAccount
-  name: default
+  name: prometheus
  namespace: monitoring
--- a/addons/prometheus/rbac/cluster-role.yaml
+++ b/addons/prometheus/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: prometheus
--- a/addons/prometheus/rules.yaml
+++ b/addons/prometheus/rules.yaml
@ -4,10 +4,9 @@ metadata:
  name: prometheus-rules
  namespace: monitoring
 data:
-  # Rules adapted from those provided by coreos/prometheus-operator and SoundCloud
-  alertmanager.rules.yaml: |+
+  alertmanager.rules.yaml: |
    groups:
-    - name: ./alertmanager.rules
+    - name: alertmanager.rules
      rules:
      - alert: AlertmanagerConfigInconsistent
        expr: count_values("config_hash", alertmanager_config_hash) BY (service) / ON(service)
@ -19,7 +18,6 @@ data:
        annotations:
          description: The configuration of the instances of the Alertmanager cluster
            `{{$labels.service}}` are out of sync.
-          summary: Alertmanager configurations are inconsistent
      - alert: AlertmanagerDownOrMissing
        expr: label_replace(prometheus_operator_alertmanager_spec_replicas, "job", "alertmanager-$1",
          "alertmanager", "(.*)") / ON(job) GROUP_RIGHT() sum(up) BY (job) != 1
@ -29,8 +27,7 @@ data:
        annotations:
          description: An unexpected number of Alertmanagers are scraped or Alertmanagers
            disappeared from discovery.
-          summary: Alertmanager down or not discovered
-      - alert: FailedReload
+      - alert: AlertmanagerFailedReload
        expr: alertmanager_config_last_reload_successful == 0
        for: 10m
        labels:
@ -38,8 +35,7 @@ data:
        annotations:
          description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
            }}/{{ $labels.pod}}.
-          summary: Alertmanager configuration reload has failed
-  etcd3.rules.yaml: |+
+  etcd3.rules.yaml: |
    groups:
    - name: ./etcd3.rules
      rules:
@ -68,8 +64,8 @@ data:
            changes within the last hour
          summary: a high number of leader changes within the etcd cluster are happening
      - alert: HighNumberOfFailedGRPCRequests
-        expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
-          / sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.01
+        expr: sum(rate(grpc_server_handled_total{grpc_code!="OK",job="etcd"}[5m])) BY (grpc_service, grpc_method)
+          / sum(rate(grpc_server_handled_total{job="etcd"}[5m])) BY (grpc_service, grpc_method) > 0.01
        for: 10m
        labels:
          severity: warning
@ -78,8 +74,8 @@ data:
            on etcd instance {{ $labels.instance }}'
          summary: a high number of gRPC requests are failing
      - alert: HighNumberOfFailedGRPCRequests
-        expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
-          / sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.05
+        expr: sum(rate(grpc_server_handled_total{grpc_code!="OK",job="etcd"}[5m])) BY (grpc_service, grpc_method)
+          / sum(rate(grpc_server_handled_total{job="etcd"}[5m])) BY (grpc_service, grpc_method) > 0.05
        for: 5m
        labels:
          severity: critical
@ -88,7 +84,7 @@ data:
            on etcd instance {{ $labels.instance }}'
          summary: a high number of gRPC requests are failing
      - alert: GRPCRequestsSlow
-        expr: histogram_quantile(0.99, rate(etcd_grpc_unary_requests_duration_seconds_bucket[5m]))
+        expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job="etcd",grpc_type="unary"}[5m])) by (grpc_service, grpc_method, le))
          > 0.15
        for: 10m
        labels:
@ -128,7 +124,7 @@ data:
            }} are slow
          summary: slow HTTP requests
      - alert: EtcdMemberCommunicationSlow
-        expr: histogram_quantile(0.99, rate(etcd_network_member_round_trip_time_seconds_bucket[5m]))
+        expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))
          > 0.15
        for: 10m
        labels:
@ -163,9 +159,9 @@ data:
        annotations:
          description: etcd instance {{ $labels.instance }} commit durations are high
          summary: high commit durations
-  general.rules.yaml: |+
+  general.rules.yaml: |
    groups:
-    - name: ./general.rules
+    - name: general.rules
      rules:
      - alert: TargetDown
        expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
@ -173,66 +169,34 @@ data:
        labels:
          severity: warning
        annotations:
-          description: '{{ $value }}% or more of {{ $labels.job }} targets are down.'
+          description: '{{ $value }}% of {{ $labels.job }} targets are down.'
          summary: Targets are down
-      - alert: TooManyOpenFileDescriptors
-        expr: 100 * (process_open_fds / process_max_fds) > 95
-        for: 10m
-        labels:
-          severity: critical
-        annotations:
-          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
-            $labels.instance }}) is using {{ $value }}% of the available file/socket descriptors.'
-          summary: too many open file descriptors
-      - record: instance:fd_utilization
+      - record: fd_utilization
        expr: process_open_fds / process_max_fds
      - alert: FdExhaustionClose
-        expr: predict_linear(instance:fd_utilization[1h], 3600 * 4) > 1
+        expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
-          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
-            $labels.instance }}) instance will exhaust in file/socket descriptors soon'
+          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
+            will exhaust its file/socket descriptors within the next 4 hours'
          summary: file descriptors soon exhausted
      - alert: FdExhaustionClose
-        expr: predict_linear(instance:fd_utilization[10m], 3600) > 1
+        expr: predict_linear(fd_utilization[10m], 3600) > 1
        for: 10m
        labels:
          severity: critical
        annotations:
-          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
-            $labels.instance }}) instance will exhaust in file/socket descriptors soon'
+          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
+            will exhaust its file/socket descriptors within the next hour'
          summary: file descriptors soon exhausted
-  kube-apiserver.rules.yaml: |+
+  kube-controller-manager.rules.yaml: |
    groups:
-    - name: ./kube-apiserver.rules
-      rules:
-      - alert: K8SApiserverDown
-        expr: absent(up{job="kubernetes-apiservers"} == 1)
-        for: 5m
-        labels:
-          severity: critical
-        annotations:
-          description: Prometheus failed to scrape API server(s), or all API servers have
-            disappeared from service discovery.
-          summary: API server unreachable
-      - alert: K8SApiServerLatency
-        expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"})
-          WITHOUT (instance, resource)) / 1e+06 > 1
-        for: 10m
-        labels:
-          severity: warning
-        annotations:
-          description: 99th percentile Latency for {{ $labels.verb }} requests to the
-            kube-apiserver is higher than 1s.
-          summary: Kubernetes apiserver latency is high
-  kube-controller-manager.rules.yaml: |+
-    groups:
-    - name: ./kube-controller-manager.rules
+    - name: kube-controller-manager.rules
      rules:
      - alert: K8SControllerManagerDown
-        expr: absent(up{kubernetes_name="kube-controller-manager"} == 1)
+        expr: absent(up{job="kube-controller-manager"} == 1)
        for: 5m
        labels:
          severity: critical
@ -240,12 +204,57 @@ data:
          description: There is no running K8S controller manager. Deployments and replication
            controllers are not making progress.
          summary: Controller manager is down
-  kube-scheduler.rules.yaml: |+
+  kube-scheduler.rules.yaml: |
    groups:
-    - name: ./kube-scheduler.rules
+    - name: kube-scheduler.rules
      rules:
+      - record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
+        expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.99"
+      - record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
+        expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.9"
+      - record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
+        expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.5"
+      - record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
+        expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.99"
+      - record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
+        expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.9"
+      - record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
+        expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.5"
+      - record: cluster:scheduler_binding_latency_seconds:quantile
+        expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.99"
+      - record: cluster:scheduler_binding_latency_seconds:quantile
+        expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.9"
+      - record: cluster:scheduler_binding_latency_seconds:quantile
+        expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.5"
      - alert: K8SSchedulerDown
-        expr: absent(up{kubernetes_name="kube-scheduler"} == 1)
+        expr: absent(up{job="kube-scheduler"} == 1)
        for: 5m
        labels:
          severity: critical
@ -253,9 +262,69 @@ data:
          description: There is no running K8S scheduler. New pods are not being assigned
            to nodes.
          summary: Scheduler is down
-  kubelet.rules.yaml: |+
+  kube-state-metrics.rules.yaml: |
    groups:
-    - name: ./kubelet.rules
+    - name: kube-state-metrics.rules
+      rules:
+      - alert: DeploymentGenerationMismatch
+        expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          description: Observed deployment generation does not match expected one for
+            deployment {{$labels.namespace}}/{{$labels.deployment}}
+          summary: Deployment is outdated
+      - alert: DeploymentReplicasNotUpdated
+        expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
+          or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
+          unless (kube_deployment_spec_paused == 1)
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          description: Replicas are not updated and available for deployment {{$labels.namespace}}/{{$labels.deployment}}
+          summary: Deployment replicas are outdated
+      - alert: DaemonSetRolloutStuck
+        expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled
+          * 100 < 100
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          description: Only {{$value}}% of desired pods scheduled and ready for daemon
+            set {{$labels.namespace}}/{{$labels.daemonset}}
+          summary: DaemonSet is missing pods
+      - alert: K8SDaemonSetsNotScheduled
+        expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
+          > 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: A number of daemonsets are not scheduled.
+          summary: Daemonsets are not scheduled correctly
+      - alert: DaemonSetsMissScheduled
+        expr: kube_daemonset_status_number_misscheduled > 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: A number of daemonsets are running where they are not supposed
+            to run.
+          summary: Daemonsets are not scheduled correctly
+      - alert: PodFrequentlyRestarting
+        expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Pod {{$labels.namespace}}/{{$labels.pod}} was restarted {{$value}}
+            times within the last hour
+          summary: Pod is restarting frequently
+  kubelet.rules.yaml: |
+    groups:
+    - name: kubelet.rules
      rules:
      - alert: K8SNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
@ -274,20 +343,17 @@ data:
        labels:
          severity: critical
        annotations:
-          description: '{{ $value }} Kubernetes nodes (more than 10% are in the NotReady
-            state).'
-          summary: Many Kubernetes nodes are Not Ready
+          description: '{{ $value }}% of Kubernetes nodes are not ready'
      - alert: K8SKubeletDown
-        expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) > 0.03
+        expr: count(up{job="kubelet"} == 0) / count(up{job="kubelet"}) * 100 > 3
        for: 1h
        labels:
          severity: warning
        annotations:
          description: Prometheus failed to scrape {{ $value }}% of kubelets.
-          summary: Many Kubelets cannot be scraped
      - alert: K8SKubeletDown
-        expr: absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"})
-          > 0.1
+        expr: (absent(up{job="kubelet"} == 1) or count(up{job="kubelet"} == 0) / count(up{job="kubelet"}))
+          * 100 > 10
        for: 1h
        labels:
          severity: critical
@ -297,176 +363,236 @@ data:
          summary: Many Kubelets cannot be scraped
      - alert: K8SKubeletTooManyPods
        expr: kubelet_running_pod_count > 100
+        for: 10m
        labels:
          severity: warning
        annotations:
          description: Kubelet {{$labels.instance}} is running {{$value}} pods, close
            to the limit of 110
          summary: Kubelet is close to pod limit
-  kubernetes.rules.yaml: |+
+  kubernetes.rules.yaml: |
    groups:
-    - name: ./kubernetes.rules
+    - name: kubernetes.rules
      rules:
-      - record: cluster_namespace_controller_pod_container:spec_memory_limit_bytes
-        expr: sum(label_replace(container_spec_memory_limit_bytes{container_name!=""},
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name)
-      - record: cluster_namespace_controller_pod_container:spec_cpu_shares
-        expr: sum(label_replace(container_spec_cpu_shares{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:cpu_usage:rate
-        expr: sum(label_replace(irate(container_cpu_usage_seconds_total{container_name!=""}[5m]),
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name)
-      - record: cluster_namespace_controller_pod_container:memory_usage:bytes
-        expr: sum(label_replace(container_memory_usage_bytes{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:memory_working_set:bytes
-        expr: sum(label_replace(container_memory_working_set_bytes{container_name!=""},
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name)
-      - record: cluster_namespace_controller_pod_container:memory_rss:bytes
-        expr: sum(label_replace(container_memory_rss{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:memory_cache:bytes
-        expr: sum(label_replace(container_memory_cache{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:disk_usage:bytes
-        expr: sum(label_replace(container_disk_usage_bytes{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:memory_pagefaults:rate
-        expr: sum(label_replace(irate(container_memory_failures_total{container_name!=""}[5m]),
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name, scope, type)
-      - record: cluster_namespace_controller_pod_container:memory_oom:rate
-        expr: sum(label_replace(irate(container_memory_failcnt{container_name!=""}[5m]),
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name, scope, type)
-      - record: cluster:memory_allocation:percent
-        expr: 100 * sum(container_spec_memory_limit_bytes{pod_name!=""}) BY (cluster)
-          / sum(machine_memory_bytes) BY (cluster)
-      - record: cluster:memory_used:percent
-        expr: 100 * sum(container_memory_usage_bytes{pod_name!=""}) BY (cluster) / sum(machine_memory_bytes)
-          BY (cluster)
-      - record: cluster:cpu_allocation:percent
-        expr: 100 * sum(container_spec_cpu_shares{pod_name!=""}) BY (cluster) / sum(container_spec_cpu_shares{id="/"}
-          * ON(cluster, instance) machine_cpu_cores) BY (cluster)
-      - record: cluster:node_cpu_use:percent
-        expr: 100 * sum(rate(node_cpu{mode!="idle"}[5m])) BY (cluster) / sum(machine_cpu_cores)
-          BY (cluster)
-      - record: cluster_resource_verb:apiserver_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket) BY (le,
-          cluster, job, resource, verb)) / 1e+06
+      - record: pod_name:container_memory_usage_bytes:sum
+        expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
+          (pod_name)
+      - record: pod_name:container_spec_cpu_shares:sum
+        expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name)
+      - record: pod_name:container_cpu_usage:sum
+        expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
+          BY (pod_name)
+      - record: pod_name:container_fs_usage_bytes:sum
+        expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name)
+      - record: namespace:container_memory_usage_bytes:sum
+        expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace)
+      - record: namespace:container_spec_cpu_shares:sum
+        expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace)
+      - record: namespace:container_cpu_usage:sum
+        expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
+          BY (namespace)
+      - record: cluster:memory_usage:ratio
+        expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
+          (cluster) / sum(machine_memory_bytes) BY (cluster)
+      - record: cluster:container_spec_cpu_shares:ratio
+        expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000
+          / sum(machine_cpu_cores)
+      - record: cluster:container_cpu_usage:ratio
+        expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
+          / sum(machine_cpu_cores)
+      - record: apiserver_latency_seconds:quantile
+        expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) /
+          1e+06
        labels:
          quantile: "0.99"
-      - record: cluster_resource_verb:apiserver_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(apiserver_request_latencies_bucket) BY (le,
-          cluster, job, resource, verb)) / 1e+06
+      - record: apiserver_latency_seconds:quantile
+        expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) /
+          1e+06
        labels:
          quantile: "0.9"
-      - record: cluster_resource_verb:apiserver_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(apiserver_request_latencies_bucket) BY (le,
-          cluster, job, resource, verb)) / 1e+06
+      - record: apiserver_latency_seconds:quantile
+        expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) /
+          1e+06
        labels:
          quantile: "0.5"
-      - record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+      - alert: APIServerLatencyHigh
+        expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
+          > 1
+        for: 10m
        labels:
-          quantile: "0.99"
-      - record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: warning
+        annotations:
+          description: the API server has a 99th percentile latency of {{ $value }} seconds
+            for {{$labels.verb}} {{$labels.resource}}
+      - alert: APIServerLatencyHigh
+        expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
+          > 4
+        for: 10m
        labels:
-          quantile: "0.9"
-      - record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: critical
+        annotations:
+          description: the API server has a 99th percentile latency of {{ $value }} seconds
+            for {{$labels.verb}} {{$labels.resource}}
+      - alert: APIServerErrorsHigh
+        expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
+          * 100 > 2
+        for: 10m
        labels:
-          quantile: "0.5"
-      - record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: warning
+        annotations:
+          description: API server returns errors for {{ $value }}% of requests
+      - alert: APIServerErrorsHigh
+        expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
+          * 100 > 5
+        for: 10m
        labels:
-          quantile: "0.99"
-      - record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: critical
+        annotations:
+          description: API server returns errors for {{ $value }}% of requests
+      - alert: K8SApiserverDown
+        expr: absent(up{job="apiserver"} == 1)
+        for: 20m
        labels:
-          quantile: "0.9"
-      - record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: critical
+        annotations:
+          description: No API servers are reachable or all have disappeared from service
+            discovery
+
+      - alert: K8sCertificateExpirationNotice
        labels:
-          quantile: "0.5"
-      - record: cluster:scheduler_binding_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: warning
+        annotations:
+          description: Kubernetes API Certificate is expiring soon (less than 7 days)
+        expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0
+
+      - alert: K8sCertificateExpirationNotice
        labels:
-          quantile: "0.99"
-      - record: cluster:scheduler_binding_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
-        labels:
-          quantile: "0.9"
-      - record: cluster:scheduler_binding_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
-        labels:
-          quantile: "0.5"
-  node.rules.yaml: |+
+          severity: critical
+        annotations:
+          description: Kubernetes API Certificate is expiring in less than 1 day
+        expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0
+  node.rules.yaml: |
    groups:
-    - name: ./node.rules
+    - name: node.rules
      rules:
+      - record: instance:node_cpu:rate:sum
+        expr: sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m]))
+          BY (instance)
+      - record: instance:node_filesystem_usage:sum
+        expr: sum((node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}))
+          BY (instance)
+      - record: instance:node_network_receive_bytes:rate:sum
+        expr: sum(rate(node_network_receive_bytes[3m])) BY (instance)
+      - record: instance:node_network_transmit_bytes:rate:sum
+        expr: sum(rate(node_network_transmit_bytes[3m])) BY (instance)
+      - record: instance:node_cpu:ratio
+        expr: sum(rate(node_cpu{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance)
+          GROUP_LEFT() count(sum(node_cpu) BY (instance, cpu)) BY (instance)
+      - record: cluster:node_cpu:sum_rate5m
+        expr: sum(rate(node_cpu{mode!="idle"}[5m]))
+      - record: cluster:node_cpu:ratio
+        expr: cluster:node_cpu:sum_rate5m / count(sum(node_cpu) BY (instance, cpu))
      - alert: NodeExporterDown
-        expr: absent(up{kubernetes_name="node-exporter"} == 1)
+        expr: absent(up{job="node-exporter"} == 1)
        for: 10m
        labels:
          severity: warning
        annotations:
          description: Prometheus could not scrape a node-exporter for more than 10m,
-            or node-exporters have disappeared from discovery.
-          summary: node-exporter cannot be scraped
-      - alert: K8SNodeOutOfDisk
-        expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
+            or node-exporters have disappeared from discovery
+      - alert: NodeDiskRunningFull
+        expr: predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          description: device {{$labels.device}} on node {{$labels.instance}} is running
+            full within the next 24 hours (mounted at {{$labels.mountpoint}})
+      - alert: NodeDiskRunningFull
+        expr: predict_linear(node_filesystem_free[30m], 3600 * 2) < 0
+        for: 10m
        labels:
-          service: k8s
          severity: critical
        annotations:
-          description: '{{ $labels.node }} has run out of disk space.'
-          summary: Node ran out of disk space.
-      - alert: K8SNodeMemoryPressure
-        expr: kube_node_status_condition{condition="MemoryPressure",status="true"} ==
-          1
-        labels:
-          service: k8s
-          severity: warning
-        annotations:
-          description: '{{ $labels.node }} is under memory pressure.'
-          summary: Node is under memory pressure.
-      - alert: K8SNodeDiskPressure
-        expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
-        labels:
-          service: k8s
-          severity: warning
-        annotations:
-          description: '{{ $labels.node }} is under disk pressure.'
-          summary: Node is under disk pressure.
-  prometheus.rules.yaml: |+
+          description: device {{$labels.device}} on node {{$labels.instance}} is running
+            full within the next 2 hours (mounted at {{$labels.mountpoint}})
+  prometheus.rules.yaml: |
    groups:
-    - name: ./prometheus.rules
+    - name: prometheus.rules
      rules:
-      - alert: FailedReload
+      - alert: PrometheusConfigReloadFailed
        expr: prometheus_config_last_reload_successful == 0
        for: 10m
        labels:
          severity: warning
        annotations:
-          description: Reloading Prometheus' configuration has failed for {{ $labels.namespace
-            }}/{{ $labels.pod}}.
-          summary: Prometheus configuration reload has failed
+          description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}
+      - alert: PrometheusNotificationQueueRunningFull
+        expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{
+            $labels.pod}}
+      - alert: PrometheusErrorSendingAlerts
+        expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
+          > 0.01
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
+            $labels.pod}} to Alertmanager {{$labels.alertmanager}}
+      - alert: PrometheusErrorSendingAlerts
+        expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
+          > 0.03
+        for: 10m
+        labels:
+          severity: critical
+        annotations:
+          description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
+            $labels.pod}} to Alertmanager {{$labels.alertmanager}}
+      - alert: PrometheusNotConnectedToAlertmanagers
+        expr: prometheus_notifications_alertmanagers_discovered < 1
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
+            to any Alertmanagers
+      - alert: PrometheusTSDBReloadsFailing
+        expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0
+        for: 12h
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
+            reload failures over the last two hours.'
+          summary: Prometheus has issues reloading data blocks from disk
+      - alert: PrometheusTSDBCompactionsFailing
+        expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0
+        for: 12h
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
+            compaction failures over the last two hours.'
+          summary: Prometheus has issues compacting sample blocks
+      - alert: PrometheusTSDBWALCorruptions
+        expr: tsdb_wal_corruptions_total > 0
+        for: 4h
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead
+            log (WAL).'
+          summary: Prometheus write-ahead log is corrupted
+      - alert: PrometheusNotIngestingSamples
+        expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) <= 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples."
+          summary: "Prometheus isn't ingesting samples"
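The two `NodeDiskRunningFull` alerts in `node.rules.yaml` above rely on `predict_linear`, which fits a straight line to the samples in the range and extrapolates it forward. The following is only a rough Python sketch of that idea, not Prometheus's implementation: the function name, the sample data, and the simplification of extrapolating from the last sample (rather than from the rule evaluation time) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def predict_linear(samples: Sequence[Tuple[float, float]], horizon_s: float) -> float:
    """Fit a least-squares line to (timestamp, value) samples and return the
    value it predicts horizon_s seconds after the last sample.

    Simplification (assumption): Prometheus extrapolates from the evaluation
    time of the rule; this sketch uses the last sample's timestamp instead.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    t_ref = samples[-1][0]
    xs = [t - t_ref for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all samples share the same timestamp")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s


# Hypothetical data: free space falls by 1 GB per hour, starting at 10 GB,
# sampled hourly over a 6h window (mirrors node_filesystem_free[6h]).
samples = [(h * 3600.0, 10e9 - h * 1e9) for h in range(7)]

# Mirrors predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
would_fire = predict_linear(samples, 3600 * 24) < 0
print("warning condition met:", would_fire)
```

With this made-up trend the disk would be exhausted well inside 24 hours, so the warning expression evaluates to true; the alert itself would still wait for the `for: 30m` duration before firing.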
--- a/addons/prometheus/service-account.yaml
+++ b/addons/prometheus/service-account.yaml
@ -0,0 +1,5 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: prometheus
+  namespace: monitoring
--- a/addons/prometheus/service.yaml
+++ b/addons/prometheus/service.yaml
@ -3,6 +3,8 @@ kind: Service
 metadata:
  name: prometheus
  namespace: monitoring
+  annotations:
+    prometheus.io/scrape: 'true'
 spec:
  type: ClusterIP
  selector:
--- a/aws/container-linux/kubernetes/README.md
+++ b/aws/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,12 +9,13 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

--- a/aws/container-linux/kubernetes/apiserver.tf
+++ b/aws/container-linux/kubernetes/apiserver.tf
@ -0,0 +1,69 @@
+# kube-apiserver Network Load Balancer DNS Record
+resource "aws_route53_record" "apiserver" {
+  zone_id = "${var.dns_zone_id}"
+
+  name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
+  type = "A"
+
+  # AWS recommends their special "alias" records for ELBs
+  alias {
+    name                   = "${aws_lb.apiserver.dns_name}"
+    zone_id                = "${aws_lb.apiserver.zone_id}"
+    evaluate_target_health = true
+  }
+}
+
+# Network Load Balancer for apiservers
+resource "aws_lb" "apiserver" {
+  name               = "${var.cluster_name}-apiserver"
+  load_balancer_type = "network"
+  internal           = false
+
+  subnets = ["${aws_subnet.public.*.id}"]
+
+  enable_cross_zone_load_balancing = true
+}
+
+# Forward HTTPS (TCP 443) traffic to controllers
+resource "aws_lb_listener" "apiserver-https" {
+  load_balancer_arn = "${aws_lb.apiserver.arn}"
+  protocol          = "TCP"
+  port              = "443"
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.controllers.arn}"
+  }
+}
+
+# Target group of controllers
+resource "aws_lb_target_group" "controllers" {
+  name        = "${var.cluster_name}-controllers"
+  vpc_id      = "${aws_vpc.network.id}"
+  target_type = "instance"
+
+  protocol = "TCP"
+  port     = 443
+
+  # TCP health check against the apiserver port
+  health_check {
+    protocol = "TCP"
+    port     = 443
+
+    # NLBs require the healthy and unhealthy thresholds to be equal
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # Interval between health checks must be 10 or 30 seconds
+    interval = 10
+  }
+}
+
+# Attach controller instances to apiserver NLB
+resource "aws_lb_target_group_attachment" "controllers" {
+  count = "${var.controller_count}"
+
+  target_group_arn = "${aws_lb_target_group.controllers.arn}"
+  target_id        = "${element(aws_instance.controllers.*.id, count.index)}"
+  port             = 443
+}
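The comments in the `health_check` block above state two NLB constraints: the healthy and unhealthy thresholds must match, and the interval must be 10 or 30. As a hedged illustration only, a pre-flight check for those two rules could look like the Python sketch below. The function name is hypothetical, and the constraints are taken from the comments in this diff rather than from current AWS documentation.

```python
from typing import List


def check_nlb_health_check(healthy_threshold: int,
                           unhealthy_threshold: int,
                           interval: int) -> List[str]:
    """Return a list of problems for an NLB health check configuration.

    Hypothetical helper: encodes only the two constraints noted in the
    comments of apiserver.tf in this diff.
    """
    problems = []
    if healthy_threshold != unhealthy_threshold:
        problems.append("healthy_threshold and unhealthy_threshold must be equal")
    if interval not in (10, 30):
        problems.append("interval must be 10 or 30")
    return problems


# Values used by aws_lb_target_group.controllers in this diff.
print(check_nlb_health_check(healthy_threshold=3, unhealthy_threshold=3, interval=10))
```

An empty list means the values satisfy both constraints as described in the comments.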
--- a/aws/container-linux/kubernetes/bootkube.tf
+++ b/aws/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=88b361207d42ec3121930a4add6b64ba7cf18360"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
-  etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = "${var.network_mtu}"
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
+  etcd_servers          = ["${aws_route53_record.etcds.*.fqdn}"]
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = "${var.network_mtu}"
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/aws/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/aws/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.3.2"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -41,11 +41,12 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      enable: true
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -65,6 +66,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -72,15 +74,17 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --kubeconfig=/etc/kubernetes/kubeconfig \
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
+          --node-labels=node-role.kubernetes.io/controller="true" \
          --pod-manifest-path=/etc/kubernetes/manifests \
-          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
+          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
@ -106,29 +110,14 @@ storage:
      mode: 0644
      contents:
        inline: |
-          apiVersion: v1
-          kind: Config
-          clusters:
-          - name: local
-            cluster:
-              server: ${kubeconfig_server}
-              certificate-authority-data: ${kubeconfig_ca_cert}
-          users:
-          - name: kubelet
-            user:
-              client-certificate-data: ${kubeconfig_kubelet_cert}
-              client-key-data: ${kubeconfig_kubelet_key}
-          contexts:
-          - context:
-              cluster: local
-              user: kubelet
+          ${kubeconfig}
    - path: /etc/kubernetes/kubelet.env
      filesystem: root
      mode: 0644
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -147,11 +136,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.11.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
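The template above now inlines a single `${kubeconfig}` value under `inline: |`, and `controllers.tf` passes it as `indent(10, module.bootkube.kubeconfig)`. Terraform's `indent` pads every line except the first, which is what keeps a multi-line value inside the YAML block scalar. The Python sketch below imitates that behaviour to show why the 10 matches the template's indentation; the sample kubeconfig text is a made-up stand-in, not the real rendered file.

```python
def indent(spaces: int, text: str) -> str:
    """Pad every line after the first, as Terraform's indent() is documented to do."""
    pad = " " * spaces
    return text.replace("\n", "\n" + pad)


# Hypothetical stand-in for module.bootkube.kubeconfig.
kubeconfig = "apiVersion: v1\nkind: Config\nclusters: []"

# The template line supplies the first 10 spaces itself; indent() supplies the rest.
template = "inline: |\n" + " " * 10 + "${kubeconfig}"
rendered = template.replace("${kubeconfig}", indent(10, kubeconfig))
print(rendered)
```

Without the padding, the second and later lines of the kubeconfig would start at column 0 and fall outside the block scalar, so the rendered Container Linux Config would no longer parse as intended.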
--- a/aws/container-linux/kubernetes/controllers.tf
+++ b/aws/container-linux/kubernetes/controllers.tf
@ -36,6 +36,10 @@ resource "aws_instance" "controllers" {
  associate_public_ip_address = true
  subnet_id                   = "${element(aws_subnet.public.*.id, count.index)}"
  vpc_security_group_ids      = ["${aws_security_group.controller.id}"]
+
+  lifecycle {
+    ignore_changes = ["ami"]
+  }
 }

 # Controller Container Linux Config
@ -52,12 +56,10 @@ data "template_file" "controller_config" {
    # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"

-    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
-    ssh_authorized_key      = "${var.ssh_authorized_key}"
-    kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
-    kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
-    kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
-    kubeconfig_server       = "${module.bootkube.server}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    kubeconfig            = "${indent(10, module.bootkube.kubeconfig)}"
  }
 }

@ -76,186 +78,5 @@ data "ct_config" "controller_ign" {
  count        = "${var.controller_count}"
  content      = "${element(data.template_file.controller_config.*.rendered, count.index)}"
  pretty_print = false
-}
-
-# Security Group (instance firewall)
-
-resource "aws_security_group" "controller" {
-  name        = "${var.cluster_name}-controller"
-  description = "${var.cluster_name} controller security group"
-
-  vpc_id = "${aws_vpc.network.id}"
-
-  tags = "${map("Name", "${var.cluster_name}-controller")}"
-}
-
-resource "aws_security_group_rule" "controller-icmp" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type        = "ingress"
-  protocol    = "icmp"
-  from_port   = 0
-  to_port     = 0
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "controller-ssh" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 22
-  to_port     = 22
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "controller-apiserver" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 443
-  to_port     = 443
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "controller-etcd" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 2379
-  to_port   = 2380
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-flannel" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "udp"
-  from_port                = 8472
-  to_port                  = 8472
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-flannel-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "udp"
-  from_port = 8472
-  to_port   = 8472
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-node-exporter" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 9100
-  to_port                  = 9100
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-kubelet-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10250
-  to_port   = 10250
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-kubelet-read" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 10255
-  to_port                  = 10255
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-kubelet-read-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10255
-  to_port   = 10255
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-bgp" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 179
-  to_port                  = 179
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-bgp-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 179
-  to_port   = 179
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-ipip" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = 4
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-ipip-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = 4
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-ipip-legacy" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = 94
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-ipip-legacy-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = 94
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-egress" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type             = "egress"
-  protocol         = "-1"
-  from_port        = 0
-  to_port          = 0
-  cidr_blocks      = ["0.0.0.0/0"]
-  ipv6_cidr_blocks = ["::/0"]
+  snippets     = ["${var.controller_clc_snippets}"]
 }
--- a/aws/container-linux/kubernetes/elb.tf
+++ b/aws/container-linux/kubernetes/elb.tf
@ -1,43 +0,0 @@
-# kube-apiserver Network Load Balancer DNS Record
-resource "aws_route53_record" "apiserver" {
-  zone_id = "${var.dns_zone_id}"
-
-  name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
-  type = "A"
-
-  # AWS recommends their special "alias" records for ELBs
-  alias {
-    name                   = "${aws_elb.apiserver.dns_name}"
-    zone_id                = "${aws_elb.apiserver.zone_id}"
-    evaluate_target_health = true
-  }
-}
-
-# Controller Network Load Balancer
-resource "aws_elb" "apiserver" {
-  name            = "${var.cluster_name}-apiserver"
-  subnets         = ["${aws_subnet.public.*.id}"]
-  security_groups = ["${aws_security_group.controller.id}"]
-
-  listener {
-    lb_port           = 443
-    lb_protocol       = "tcp"
-    instance_port     = 443
-    instance_protocol = "tcp"
-  }
-
-  instances = ["${aws_instance.controllers.*.id}"]
-
-  # Kubelet HTTP health check
-  health_check {
-    target              = "SSL:443"
-    healthy_threshold   = 2
-    unhealthy_threshold = 4
-    timeout             = 5
-    interval            = 6
-  }
-
-  idle_timeout                = 3600
-  connection_draining         = true
-  connection_draining_timeout = 300
-}
--- a/aws/container-linux/kubernetes/ingress.tf
+++ b/aws/container-linux/kubernetes/ingress.tf
@ -1,32 +0,0 @@
-# Ingress Network Load Balancer
-resource "aws_elb" "ingress" {
-  name            = "${var.cluster_name}-ingress"
-  subnets         = ["${aws_subnet.public.*.id}"]
-  security_groups = ["${aws_security_group.worker.id}"]
-
-  listener {
-    lb_port           = 80
-    lb_protocol       = "tcp"
-    instance_port     = 80
-    instance_protocol = "tcp"
-  }
-
-  listener {
-    lb_port           = 443
-    lb_protocol       = "tcp"
-    instance_port     = 443
-    instance_protocol = "tcp"
-  }
-
-  # Ingress Controller HTTP health check
-  health_check {
-    target              = "HTTP:10254/healthz"
-    healthy_threshold   = 2
-    unhealthy_threshold = 4
-    timeout             = 5
-    interval            = 6
-  }
-
-  connection_draining         = true
-  connection_draining_timeout = 300
-}
--- a/aws/container-linux/kubernetes/outputs.tf
+++ b/aws/container-linux/kubernetes/outputs.tf
@ -1,4 +1,25 @@
 output "ingress_dns_name" {
-  value       = "${aws_elb.ingress.dns_name}"
-  description = "DNS name of the ELB for distributing traffic to Ingress controllers"
+  value       = "${module.workers.ingress_dns_name}"
+  description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
+}
+
+# Outputs for worker pools
+
+output "vpc_id" {
+  value       = "${aws_vpc.network.id}"
+  description = "ID of the VPC for creating worker instances"
+}
+
+output "subnet_ids" {
+  value       = ["${aws_subnet.public.*.id}"]
+  description = "List of subnet IDs for creating worker instances"
+}
+
+output "worker_security_groups" {
+  value       = ["${aws_security_group.worker.id}"]
+  description = "List of worker security group IDs"
+}
+
+output "kubeconfig" {
+  value = "${module.bootkube.kubeconfig}"
 }
--- a/aws/container-linux/kubernetes/require.tf
+++ b/aws/container-linux/kubernetes/require.tf
@ -5,7 +5,7 @@ terraform {
 }

 provider "aws" {
-  version = "~> 1.0"
+  version = "~> 1.11"
 }

 provider "local" {
--- a/aws/container-linux/kubernetes/security.tf
+++ b/aws/container-linux/kubernetes/security.tf
@ -0,0 +1,385 @@
+# Security Groups (instance firewalls)
+
+# Controller security group
+
+resource "aws_security_group" "controller" {
+  name        = "${var.cluster_name}-controller"
+  description = "${var.cluster_name} controller security group"
+
+  vpc_id = "${aws_vpc.network.id}"
+
+  tags = "${map("Name", "${var.cluster_name}-controller")}"
+}
+
+resource "aws_security_group_rule" "controller-icmp" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type        = "ingress"
+  protocol    = "icmp"
+  from_port   = 0
+  to_port     = 0
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "controller-ssh" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 22
+  to_port     = 22
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "controller-apiserver" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 443
+  to_port     = 443
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "controller-etcd" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 2379
+  to_port   = 2380
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-flannel" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "udp"
+  from_port                = 8472
+  to_port                  = 8472
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-flannel-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "udp"
+  from_port = 8472
+  to_port   = 8472
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-node-exporter" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 9100
+  to_port                  = 9100
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-kubelet-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10250
+  to_port   = 10250
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-kubelet-read" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 10255
+  to_port                  = 10255
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-kubelet-read-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10255
+  to_port   = 10255
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-bgp" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 179
+  to_port                  = 179
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-bgp-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 179
+  to_port   = 179
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-ipip" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = 4
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-ipip-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = 4
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-ipip-legacy" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = 94
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-ipip-legacy-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = 94
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-egress" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type             = "egress"
+  protocol         = "-1"
+  from_port        = 0
+  to_port          = 0
+  cidr_blocks      = ["0.0.0.0/0"]
+  ipv6_cidr_blocks = ["::/0"]
+}
+
+# Worker security group
+
+resource "aws_security_group" "worker" {
+  name        = "${var.cluster_name}-worker"
+  description = "${var.cluster_name} worker security group"
+
+  vpc_id = "${aws_vpc.network.id}"
+
+  tags = "${map("Name", "${var.cluster_name}-worker")}"
+}
+
+resource "aws_security_group_rule" "worker-icmp" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "icmp"
+  from_port   = 0
+  to_port     = 0
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-ssh" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 22
+  to_port     = 22
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-http" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 80
+  to_port     = 80
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-https" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 443
+  to_port     = 443
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-flannel" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "udp"
+  from_port                = 8472
+  to_port                  = 8472
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-flannel-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "udp"
+  from_port = 8472
+  to_port   = 8472
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-node-exporter" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 9100
+  to_port   = 9100
+  self      = true
+}
+
+resource "aws_security_group_rule" "ingress-health" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 10254
+  to_port     = 10254
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-kubelet" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 10250
+  to_port                  = 10250
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-kubelet-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10250
+  to_port   = 10250
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-kubelet-read" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 10255
+  to_port                  = 10255
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-kubelet-read-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10255
+  to_port   = 10255
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-bgp" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 179
+  to_port                  = 179
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-bgp-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 179
+  to_port   = 179
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-ipip" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = 4
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-ipip-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = 4
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-ipip-legacy" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = 94
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-ipip-legacy-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = 94
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-egress" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type             = "egress"
+  protocol         = "-1"
+  from_port        = 0
+  to_port          = 0
+  cidr_blocks      = ["0.0.0.0/0"]
+  ipv6_cidr_blocks = ["::/0"]
+}
--- a/aws/container-linux/kubernetes/variables.tf
+++ b/aws/container-linux/kubernetes/variables.tf
@ -60,6 +60,18 @@ variable "worker_type" {
  description = "Worker EC2 instance type"
 }

+variable "controller_clc_snippets" {
+  type        = "list"
+  description = "Controller Container Linux Config snippets"
+  default     = []
+}
+
+variable "worker_clc_snippets" {
+  type        = "list"
+  description = "Worker Container Linux Config snippets"
+  default     = []
+}
+
 # bootkube assets

 variable "asset_dir" {
@ -94,3 +106,9 @@ EOD
  type    = "string"
  default = "10.3.0.0/16"
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
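A sketch of how a cluster definition might set the three variables added in `variables.tf`, assuming the Terraform 0.11 interpolation syntax used throughout this module. Only the variable names come from this change. The module name `example`, the `source` ref, the snippet file paths, and all values are placeholders, and the module may require inputs beyond those shown.

```hcl
module "example" {
  # Hypothetical source and ref; pin to the release you actually deploy.
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.9.6"

  # Inputs referenced elsewhere in this module (placeholder values)
  cluster_name       = "example"
  dns_zone           = "aws.example.com"
  dns_zone_id        = "Z0000000000000"
  ssh_authorized_key = "ssh-rsa AAAA..."
  asset_dir          = "/home/user/.secrets/clusters/example"

  # New in this change: Container Linux Config snippets, one list per node role.
  # Each element is the text of a snippet; the paths below are illustrative only.
  controller_clc_snippets = [
    "${file("./snippets/controller-extra.yaml")}",
  ]

  worker_clc_snippets = [
    "${file("./snippets/worker-extra.yaml")}",
  ]

  # New in this change: defaults to "cluster.local" when omitted.
  cluster_domain_suffix = "cluster.local"
}
```

Both snippet lists default to `[]`, so existing cluster definitions should keep working without setting them.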
--- a/aws/container-linux/kubernetes/workers.tf
+++ b/aws/container-linux/kubernetes/workers.tf
@ -1,274 +1,20 @@
-# Workers AutoScaling Group
-resource "aws_autoscaling_group" "workers" {
-  name           = "${var.cluster_name}-worker ${aws_launch_configuration.worker.name}"
-  load_balancers = ["${aws_elb.ingress.id}"]
+module "workers" {
+  source = "workers"
+  name   = "${var.cluster_name}"

-  # count
-  desired_capacity          = "${var.worker_count}"
-  min_size                  = "${var.worker_count}"
-  max_size                  = "${var.worker_count + 2}"
-  default_cooldown          = 30
-  health_check_grace_period = 30
-
-  # network
-  vpc_zone_identifier = ["${aws_subnet.public.*.id}"]
-
-  # template
-  launch_configuration = "${aws_launch_configuration.worker.name}"
-
-  lifecycle {
-    # override the default destroy and replace update behavior
-    create_before_destroy = true
-    ignore_changes        = ["image_id"]
-  }
-
-  tags = [{
-    key                 = "Name"
-    value               = "${var.cluster_name}-worker"
-    propagate_at_launch = true
-  }]
-}
-
-# Worker template
-resource "aws_launch_configuration" "worker" {
-  image_id      = "${data.aws_ami.coreos.image_id}"
-  instance_type = "${var.worker_type}"
-
-  user_data = "${data.ct_config.worker_ign.rendered}"
-
-  # storage
-  root_block_device {
-    volume_type = "standard"
-    volume_size = "${var.disk_size}"
-  }
-
-  # network
+  # AWS
+  vpc_id          = "${aws_vpc.network.id}"
+  subnet_ids      = ["${aws_subnet.public.*.id}"]
  security_groups = ["${aws_security_group.worker.id}"]
+  count           = "${var.worker_count}"
+  instance_type   = "${var.worker_type}"
+  os_channel      = "${var.os_channel}"
+  disk_size       = "${var.disk_size}"

-  lifecycle {
-    // Override the default destroy and replace update behavior
-    create_before_destroy = true
-  }
-}
-
-# Worker Container Linux Config
-data "template_file" "worker_config" {
-  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
-
-  vars = {
-    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
-    k8s_etcd_service_ip     = "${cidrhost(var.service_cidr, 15)}"
-    ssh_authorized_key      = "${var.ssh_authorized_key}"
-    kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
-    kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
-    kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
-    kubeconfig_server       = "${module.bootkube.server}"
-  }
-}
-
-data "ct_config" "worker_ign" {
-  content      = "${data.template_file.worker_config.rendered}"
-  pretty_print = false
-}
-
-# Security Group (instance firewall)
-
-resource "aws_security_group" "worker" {
-  name        = "${var.cluster_name}-worker"
-  description = "${var.cluster_name} worker security group"
-
-  vpc_id = "${aws_vpc.network.id}"
-
-  tags = "${map("Name", "${var.cluster_name}-worker")}"
-}
-
-resource "aws_security_group_rule" "worker-icmp" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "icmp"
-  from_port   = 0
-  to_port     = 0
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-ssh" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 22
-  to_port     = 22
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-http" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 80
-  to_port     = 80
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-https" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 443
-  to_port     = 443
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-flannel" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "udp"
-  from_port                = 8472
-  to_port                  = 8472
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-flannel-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "udp"
-  from_port = 8472
-  to_port   = 8472
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-node-exporter" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 9100
-  to_port     = 9100
-  self = true
-}
-
-resource "aws_security_group_rule" "worker-kubelet" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 10250
-  to_port                  = 10250
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-kubelet-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10250
-  to_port   = 10250
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-kubelet-read" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 10255
-  to_port                  = 10255
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-kubelet-read-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10255
-  to_port   = 10255
-  self      = true
-}
-
-resource "aws_security_group_rule" "ingress-health-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10254
-  to_port   = 10254
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-bgp" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 179
-  to_port                  = 179
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-bgp-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 179
-  to_port   = 179
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-ipip" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = 4
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-ipip-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = 4
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-ipip-legacy" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = 94
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-ipip-legacy-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = 94
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-egress" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type             = "egress"
-  protocol         = "-1"
-  from_port        = 0
-  to_port          = 0
-  cidr_blocks      = ["0.0.0.0/0"]
-  ipv6_cidr_blocks = ["::/0"]
+  # configuration
+  kubeconfig            = "${module.bootkube.kubeconfig}"
+  ssh_authorized_key    = "${var.ssh_authorized_key}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
+  clc_snippets          = "${var.worker_clc_snippets}"
 }
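The `workers` module call above and the new outputs in `outputs.tf` suggest that an additional worker pool could be attached from outside the cluster module. The following is a minimal sketch under that assumption, again in Terraform 0.11 syntax. The module names `example` and `example-pool`, the `source` ref, and all literal values are hypothetical; the input names are the ones passed to `module "workers"` in this diff.

```hcl
module "example-pool" {
  # Hypothetical source and ref; the path mirrors the workers/ directory added in this change.
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.9.6"

  # Values exported by the cluster module (assumed here to be named "example")
  vpc_id          = "${module.example.vpc_id}"
  subnet_ids      = "${module.example.subnet_ids}"
  security_groups = "${module.example.worker_security_groups}"
  kubeconfig      = "${module.example.kubeconfig}"

  # Pool-specific settings (placeholder values)
  name               = "example-pool"
  count              = 2
  instance_type      = "t2.medium"
  ssh_authorized_key = "ssh-rsa AAAA..."
}
```

Because the module derives resource names such as `${var.name}-ingress` from `name`, a pool probably needs a name distinct from the cluster's. Whether `os_channel`, `disk_size`, `service_cidr`, `cluster_domain_suffix`, and `clc_snippets` have defaults inside the `workers` module is not shown in this diff, so they may also need to be set explicitly.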
--- a/aws/container-linux/kubernetes/workers/ami.tf
+++ b/aws/container-linux/kubernetes/workers/ami.tf
@ -0,0 +1,19 @@
+data "aws_ami" "coreos" {
+  most_recent = true
+  owners      = ["595879546273"]
+
+  filter {
+    name   = "architecture"
+    values = ["x86_64"]
+  }
+
+  filter {
+    name   = "virtualization-type"
+    values = ["hvm"]
+  }
+
+  filter {
+    name   = "name"
+    values = ["CoreOS-${var.os_channel}-*"]
+  }
+}
--- a/aws/container-linux/kubernetes/workers/cl/worker.yaml.tmpl
+++ b/aws/container-linux/kubernetes/workers/cl/worker.yaml.tmpl
@ -22,7 +22,7 @@ systemd:
      enable: true
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -42,6 +42,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -49,14 +50,15 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --kubeconfig=/etc/kubernetes/kubeconfig \
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -81,29 +83,14 @@ storage:
      mode: 0644
      contents:
        inline: |
-          apiVersion: v1
-          kind: Config
-          clusters:
-          - name: local
-            cluster:
-              server: ${kubeconfig_server}
-              certificate-authority-data: ${kubeconfig_ca_cert}
-          users:
-          - name: kubelet
-            user:
-              client-certificate-data: ${kubeconfig_kubelet_cert}
-              client-key-data: ${kubeconfig_kubelet_key}
-          contexts:
-          - context:
-              cluster: local
-              user: kubelet
+          ${kubeconfig}
    - path: /etc/kubernetes/kubelet.env
      filesystem: root
      mode: 0644
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -121,7 +108,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.8.3 \
+            docker://gcr.io/google_containers/hyperkube:v1.9.6 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--- a/aws/container-linux/kubernetes/workers/ingress.tf
+++ b/aws/container-linux/kubernetes/workers/ingress.tf
@ -0,0 +1,82 @@
+# Network Load Balancer for Ingress
+resource "aws_lb" "ingress" {
+  name               = "${var.name}-ingress"
+  load_balancer_type = "network"
+  internal           = false
+
+  subnets = ["${var.subnet_ids}"]
+
+  enable_cross_zone_load_balancing = true
+}
+
+# Forward HTTP traffic to workers
+resource "aws_lb_listener" "ingress-http" {
+  load_balancer_arn = "${aws_lb.ingress.arn}"
+  protocol          = "TCP"
+  port              = 80
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.workers-http.arn}"
+  }
+}
+
+# Forward HTTPS traffic to workers
+resource "aws_lb_listener" "ingress-https" {
+  load_balancer_arn = "${aws_lb.ingress.arn}"
+  protocol          = "TCP"
+  port              = 443
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.workers-https.arn}"
+  }
+}
+
+# Network Load Balancer target groups of instances
+
+resource "aws_lb_target_group" "workers-http" {
+  name        = "${var.name}-workers-http"
+  vpc_id      = "${var.vpc_id}"
+  target_type = "instance"
+
+  protocol = "TCP"
+  port     = 80
+
+  # Ingress Controller HTTP health check
+  health_check {
+    protocol = "HTTP"
+    port     = 10254
+    path     = "/healthz"
+
+    # NLBs require the healthy and unhealthy thresholds to be equal
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # Interval between health checks must be 10 or 30 seconds
+    interval = 10
+  }
+}
+
+resource "aws_lb_target_group" "workers-https" {
+  name        = "${var.name}-workers-https"
+  vpc_id      = "${var.vpc_id}"
+  target_type = "instance"
+
+  protocol = "TCP"
+  port     = 443
+
+  # Ingress Controller HTTP health check
+  health_check {
+    protocol = "HTTP"
+    port     = 10254
+    path     = "/healthz"
+
+    # NLBs require the healthy and unhealthy thresholds to be equal
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # Interval between health checks must be 10 or 30 seconds
+    interval = 10
+  }
+}
--- a/aws/container-linux/kubernetes/workers/outputs.tf
+++ b/aws/container-linux/kubernetes/workers/outputs.tf
@ -0,0 +1,4 @@
+output "ingress_dns_name" {
+  value       = "${aws_lb.ingress.dns_name}"
+  description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
+}
--- a/aws/container-linux/kubernetes/workers/variables.tf
+++ b/aws/container-linux/kubernetes/workers/variables.tf
@ -0,0 +1,79 @@
+variable "name" {
+  type        = "string"
+  description = "Unique name for the instance group"
+}
+
+variable "vpc_id" {
+  type        = "string"
+  description = "ID of the VPC for creating instances"
+}
+
+variable "subnet_ids" {
+  type        = "list"
+  description = "List of subnet IDs for creating instances"
+}
+
+variable "security_groups" {
+  type        = "list"
+  description = "List of security group IDs"
+}
+
+# instances
+
+variable "count" {
+  type        = "string"
+  default     = "1"
+  description = "Number of instances"
+}
+
+variable "instance_type" {
+  type        = "string"
+  default     = "t2.small"
+  description = "EC2 instance type"
+}
+
+variable "os_channel" {
+  type        = "string"
+  default     = "stable"
+  description = "Container Linux AMI channel (stable, beta, alpha)"
+}
+
+variable "disk_size" {
+  type        = "string"
+  default     = "40"
+  description = "Size of the disk in GB"
+}
+
+# configuration
+
+variable "kubeconfig" {
+  type        = "string"
+  description = "Generated Kubelet kubeconfig"
+}
+
+variable "ssh_authorized_key" {
+  type        = "string"
+  description = "SSH public key for user 'core'"
+}
+
+variable "service_cidr" {
+  description = <<EOD
+CIDR IPv4 range to assign Kubernetes services.
+The 1st IP is reserved for kube-apiserver and the 10th IP is reserved for kube-dns.
+EOD
+
+  type    = "string"
+  default = "10.3.0.0/16"
+}
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix are answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
+
+variable "clc_snippets" {
+  type        = "list"
+  description = "Container Linux Config snippets"
+  default     = []
+}
--- a/aws/container-linux/kubernetes/workers/workers.tf
+++ b/aws/container-linux/kubernetes/workers/workers.tf
@ -0,0 +1,75 @@
+# Workers AutoScaling Group
+resource "aws_autoscaling_group" "workers" {
+  name = "${var.name}-worker ${aws_launch_configuration.worker.name}"
+
+  # count
+  desired_capacity          = "${var.count}"
+  min_size                  = "${var.count}"
+  max_size                  = "${var.count + 2}"
+  default_cooldown          = 30
+  health_check_grace_period = 30
+
+  # network
+  vpc_zone_identifier = ["${var.subnet_ids}"]
+
+  # template
+  launch_configuration = "${aws_launch_configuration.worker.name}"
+
+  # target groups to which instances should be added
+  target_group_arns = [
+    "${aws_lb_target_group.workers-http.id}",
+    "${aws_lb_target_group.workers-https.id}",
+  ]
+
+  lifecycle {
+    # override the default destroy and replace update behavior
+    create_before_destroy = true
+  }
+
+  tags = [{
+    key                 = "Name"
+    value               = "${var.name}-worker"
+    propagate_at_launch = true
+  }]
+}
+
+# Worker template
+resource "aws_launch_configuration" "worker" {
+  image_id      = "${data.aws_ami.coreos.image_id}"
+  instance_type = "${var.instance_type}"
+
+  user_data = "${data.ct_config.worker_ign.rendered}"
+
+  # storage
+  root_block_device {
+    volume_type = "standard"
+    volume_size = "${var.disk_size}"
+  }
+
+  # network
+  security_groups = ["${var.security_groups}"]
+
+  lifecycle {
+    # override the default destroy and replace update behavior
+    create_before_destroy = true
+    ignore_changes        = ["image_id"]
+  }
+}
+
+# Worker Container Linux Config
+data "template_file" "worker_config" {
+  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
+
+  vars = {
+    kubeconfig            = "${indent(10, var.kubeconfig)}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+  }
+}
+
+data "ct_config" "worker_ign" {
+  content      = "${data.template_file.worker_config.rendered}"
+  pretty_print = false
+  snippets     = ["${var.clc_snippets}"]
+}
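
The new `workers` module above is only defined in this diff, not called. As a rough sketch of how a parent configuration might consume it, the block below wires up the inputs declared in `workers/variables.tf` and re-exports the `ingress_dns_name` output. The module path, the `aws_vpc.network`, `aws_subnet.public`, and `aws_security_group.worker` resources, `var.cluster_name`, and the `module.bootkube.kubeconfig` output are all assumptions for illustration and do not appear in this section.

```hcl
# Hypothetical caller of the workers module (Terraform 0.11 interpolation
# syntax, matching the diff). Names on the right-hand side are assumptions,
# not taken from this commit.
module "workers" {
  source = "./workers" # assumed relative path to the module

  name            = "${var.cluster_name}"
  vpc_id          = "${aws_vpc.network.id}"
  subnet_ids      = ["${aws_subnet.public.*.id}"]
  security_groups = ["${aws_security_group.worker.id}"]

  # Optional inputs; the module defaults are count 1 and t2.small
  count         = 2
  instance_type = "t2.small"

  kubeconfig         = "${module.bootkube.kubeconfig}"
  ssh_authorized_key = "${var.ssh_authorized_key}"

  # Module defaults written out explicitly. The kubelet's cluster DNS IP is
  # derived as cidrhost(service_cidr, 10), which is 10.3.0.10 for this range.
  service_cidr          = "10.3.0.0/16"
  cluster_domain_suffix = "cluster.local"
}

# Re-export the NLB DNS name declared in workers/outputs.tf
output "ingress_dns_name" {
  value = "${module.workers.ingress_dns_name}"
}
```

With these inputs the autoscaling group would be sized at a desired and minimum capacity of 2 and a maximum of 4, since `workers.tf` sets `max_size` to `var.count + 2`.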
--- a/bare-metal/container-linux/kubernetes/README.md
+++ b/bare-metal/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,12 +9,12 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

--- a/bare-metal/container-linux/kubernetes/bootkube.tf
+++ b/bare-metal/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=88b361207d42ec3121930a4add6b64ba7cf18360"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${var.k8s_domain_name}"]
-  etcd_servers = ["${var.controller_domains}"]
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = "${var.network_mtu}"
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${var.k8s_domain_name}"]
+  etcd_servers          = ["${var.controller_domains}"]
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = "${var.network_mtu}"
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/bare-metal/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.3.2"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
@ -50,10 +50,11 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -73,6 +74,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -80,7 +82,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=${domain_name} \
@ -88,8 +90,10 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
+          --node-labels=node-role.kubernetes.io/controller="true" \
          --pod-manifest-path=/etc/kubernetes/manifests \
-          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
+          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
@ -114,7 +118,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/hostname
      filesystem: root
      mode: 0644
@ -139,11 +143,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.11.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/bare-metal/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -30,7 +30,7 @@ systemd:
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -50,6 +50,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -57,7 +58,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=${domain_name} \
@ -65,7 +66,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -80,7 +82,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/hostname
      filesystem: root
      mode: 0644
--- a/bare-metal/container-linux/kubernetes/groups.tf
+++ b/bare-metal/container-linux/kubernetes/groups.tf
@ -3,7 +3,7 @@ resource "matchbox_group" "container-linux-install" {
  count = "${length(var.controller_names) + length(var.worker_names)}"

  name    = "${format("container-linux-install-%s", element(concat(var.controller_names, var.worker_names), count.index))}"
-  profile = "${var.cached_install == "true" ? matchbox_profile.cached-container-linux-install.name : matchbox_profile.container-linux-install.name}"
+  profile = "${var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)}"

  selector {
    mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
--- a/bare-metal/container-linux/kubernetes/profiles.tf
+++ b/bare-metal/container-linux/kubernetes/profiles.tf
@ -1,6 +1,8 @@
 // Container Linux Install profile (from release.core-os.net)
 resource "matchbox_profile" "container-linux-install" {
-  name   = "container-linux-install"
+  count = "${length(var.controller_names) + length(var.worker_names)}"
+  name  = "${format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
+
  kernel = "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe.vmlinuz"

  initrd = [
@ -8,6 +10,7 @@ resource "matchbox_profile" "container-linux-install" {
  ]

  args = [
+    "initrd=coreos_production_pxe_image.cpio.gz",
    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
    "coreos.first_boot=yes",
    "console=tty0",
@ -15,10 +18,12 @@ resource "matchbox_profile" "container-linux-install" {
    "${var.kernel_args}",
  ]

-  container_linux_config = "${data.template_file.container-linux-install-config.rendered}"
+  container_linux_config = "${element(data.template_file.container-linux-install-configs.*.rendered, count.index)}"
 }

-data "template_file" "container-linux-install-config" {
+data "template_file" "container-linux-install-configs" {
+  count = "${length(var.controller_names) + length(var.worker_names)}"
+
  template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"

  vars {
@ -36,7 +41,9 @@ data "template_file" "container-linux-install-config" {
 // Container Linux Install profile (from matchbox /assets cache)
 // Note: Admin must have downloaded container_linux_version into matchbox assets.
 resource "matchbox_profile" "cached-container-linux-install" {
-  name   = "cached-container-linux-install"
+  count = "${length(var.controller_names) + length(var.worker_names)}"
+  name  = "${format("%s-cached-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
+
  kernel = "/assets/coreos/${var.container_linux_version}/coreos_production_pxe.vmlinuz"

  initrd = [
@ -44,6 +51,7 @@ resource "matchbox_profile" "cached-container-linux-install" {
  ]

  args = [
+    "initrd=coreos_production_pxe_image.cpio.gz",
    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
    "coreos.first_boot=yes",
    "console=tty0",
@ -51,10 +59,12 @@ resource "matchbox_profile" "cached-container-linux-install" {
    "${var.kernel_args}",
  ]

-  container_linux_config = "${data.template_file.cached-container-linux-install-config.rendered}"
+  container_linux_config = "${element(data.template_file.cached-container-linux-install-configs.*.rendered, count.index)}"
 }

-data "template_file" "cached-container-linux-install-config" {
+data "template_file" "cached-container-linux-install-configs" {
+  count = "${length(var.controller_names) + length(var.worker_names)}"
+
  template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"

  vars {
@ -82,11 +92,12 @@ data "template_file" "controller-configs" {
  template = "${file("${path.module}/cl/controller.yaml.tmpl")}"

  vars {
-    domain_name          = "${element(var.controller_domains, count.index)}"
-    etcd_name            = "${element(var.controller_names, count.index)}"
-    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
-    k8s_dns_service_ip   = "${module.bootkube.kube_dns_service_ip}"
-    ssh_authorized_key   = "${var.ssh_authorized_key}"
+    domain_name           = "${element(var.controller_domains, count.index)}"
+    etcd_name             = "${element(var.controller_names, count.index)}"
+    etcd_initial_cluster  = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
+    k8s_dns_service_ip    = "${module.bootkube.kube_dns_service_ip}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"

    # Terraform evaluates both sides regardless and element cannot be used on 0 length lists
    networkd_content = "${length(var.controller_networkds) == 0 ? "" : element(concat(var.controller_networkds, list("")), count.index)}"
@ -106,9 +117,10 @@ data "template_file" "worker-configs" {
  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"

  vars {
-    domain_name        = "${element(var.worker_domains, count.index)}"
-    k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
-    ssh_authorized_key = "${var.ssh_authorized_key}"
+    domain_name           = "${element(var.worker_domains, count.index)}"
+    k8s_dns_service_ip    = "${module.bootkube.kube_dns_service_ip}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"

    # Terraform evaluates both sides regardless and element cannot be used on 0 length lists
    networkd_content = "${length(var.worker_networkds) == 0 ? "" : element(concat(var.worker_networkds, list("")), count.index)}"
--- a/bare-metal/container-linux/kubernetes/variables.tf
+++ b/bare-metal/container-linux/kubernetes/variables.tf
@ -24,7 +24,7 @@ variable "ssh_authorized_key" {
 }

 # Machines
-# Terraform's crude "type system" does properly support lists of maps so we do this.
+# Terraform's crude "type system" does not properly support lists of maps so we do this.

 variable "controller_names" {
  type = "list"
@ -92,6 +92,12 @@ EOD

 # optional

+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix are answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
+
 variable "cached_install" {
  type        = "string"
  default     = "false"
--- a/bare-metal/container-linux/pxe-worker/cl/bootkube-worker.yaml.tmpl
+++ b/bare-metal/container-linux/pxe-worker/cl/bootkube-worker.yaml.tmpl
@ -30,7 +30,7 @@ systemd:
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -50,6 +50,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -57,7 +58,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns={{.k8s_dns_service_ip}} \
-          --cluster_domain=cluster.local \
+          --cluster_domain={{.cluster_domain_suffix}} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override={{.domain_name}} \
@ -65,7 +66,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -96,7 +98,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/hostname
      filesystem: root
      mode: 0644
--- a/bare-metal/container-linux/pxe-worker/groups.tf
+++ b/bare-metal/container-linux/pxe-worker/groups.tf
@ -12,10 +12,8 @@ resource "matchbox_group" "workers" {
    domain_name    = "${element(var.worker_domains, count.index)}"
    etcd_endpoints = "${join(",", formatlist("%s:2379", var.controller_domains))}"

-    # TODO
-    etcd_on_host        = "true"
-    k8s_etcd_service_ip = "10.3.0.15"
-    k8s_dns_service_ip  = "${var.kube_dns_service_ip}"
-    ssh_authorized_key  = "${var.ssh_authorized_key}"
+    k8s_dns_service_ip    = "${var.kube_dns_service_ip}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"
  }
 }
--- a/bare-metal/container-linux/pxe-worker/profiles.tf
+++ b/bare-metal/container-linux/pxe-worker/profiles.tf
@ -8,6 +8,7 @@ resource "matchbox_profile" "bootkube-worker-pxe" {
  ]

  args = [
+    "initrd=coreos_production_pxe_image.cpio.gz",
    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
    "coreos.first_boot=yes",
    "console=tty0",
--- a/bare-metal/container-linux/pxe-worker/variables.tf
+++ b/bare-metal/container-linux/pxe-worker/variables.tf
@ -64,3 +64,9 @@ variable "kernel_args" {
    "root=/dev/sda1",
  ]
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix are answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
--- a/digital-ocean/container-linux/kubernetes/README.md
+++ b/digital-ocean/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,12 +9,12 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.6 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

--- a/digital-ocean/container-linux/kubernetes/bootkube.tf
+++ b/digital-ocean/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=88b361207d42ec3121930a4add6b64ba7cf18360"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
-  etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = 1440
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
+  etcd_servers          = "${digitalocean_record.etcds.*.fqdn}"
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = 1440
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/digital-ocean/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/digital-ocean/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.3.2"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -50,10 +50,11 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Requires=coreos-metadata.service
        After=coreos-metadata.service
        Wants=rpc-statd.service
@ -76,6 +77,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -83,7 +85,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
@ -91,8 +93,10 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
+          --node-labels=node-role.kubernetes.io/controller="true" \
          --pod-manifest-path=/etc/kubernetes/manifests \
-          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
+          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
@ -119,7 +123,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -138,11 +142,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.11.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
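A minimal sketch (not part of this patch) of one likely reason the `[ -d ... ]` test in the bootkube wrapper was replaced; the motivation is an assumption, since the diff does not state it. When more than one `manifests-*` directory exists, the glob expands to several words, `[ -d a b ]` fails with a usage error, and the move is skipped. The `ls`-based check succeeds whenever the glob matches at least one file. The scratch directory and the names `manifests-networking` and `manifests-extra` are hypothetical stand-ins for `/opt/bootkube/assets`.

```sh
#!/bin/sh
# Hypothetical scratch directory standing in for /opt/bootkube/assets.
assets="$(mktemp -d)"
mkdir -p "$assets/manifests" "$assets/manifests-networking" "$assets/manifests-extra"
touch "$assets/manifests-networking/calico.yaml" "$assets/manifests-extra/extra.yaml"

# Old check: the glob expands to two paths, so the test receives too many
# arguments, exits non-zero, and the move would never run.
if [ -d "$assets"/manifests-* ] 2>/dev/null; then old=ran; else old=skipped; fi

# New check: true whenever the glob matches at least one file.
if [ -n "$(ls "$assets"/manifests-*/* 2>/dev/null)" ]; then
  mv "$assets"/manifests-*/* "$assets/manifests" && rm -rf "$assets"/manifests-*
  new=ran
else
  new=skipped
fi

echo "old check: $old, new check: $new"
ls "$assets/manifests"
```

With a single matching directory the old test behaves as intended, so the difference only shows up when several `manifests-*` directories are rendered.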
--- a/digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -30,7 +30,7 @@ systemd:
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Requires=coreos-metadata.service
        After=coreos-metadata.service
        Wants=rpc-statd.service
@ -53,6 +53,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -60,7 +61,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
@ -68,7 +69,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -94,7 +96,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.6
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -112,7 +114,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.8.3 \
+            docker://gcr.io/google_containers/hyperkube:v1.9.6 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--- a/digital-ocean/container-linux/kubernetes/controllers.tf
+++ b/digital-ocean/container-linux/kubernetes/controllers.tf
@ -45,7 +45,7 @@ resource "digitalocean_droplet" "controllers" {
  private_networking = true

  user_data = "${element(data.ct_config.controller_ign.*.rendered, count.index)}"
-  ssh_keys  = "${var.ssh_fingerprints}"
+  ssh_keys  = ["${var.ssh_fingerprints}"]

  tags = [
    "${digitalocean_tag.controllers.id}",
@ -69,8 +69,9 @@ data "template_file" "controller_config" {
    etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"

    # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
-    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
-    k8s_dns_service_ip   = "${cidrhost(var.service_cidr, 10)}"
+    etcd_initial_cluster  = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
 }

@ -89,4 +90,6 @@ data "ct_config" "controller_ign" {
  count        = "${var.controller_count}"
  content      = "${element(data.template_file.controller_config.*.rendered, count.index)}"
  pretty_print = false
+
+  snippets = ["${var.controller_clc_snippets}"]
 }
--- a/digital-ocean/container-linux/kubernetes/network.tf
+++ b/digital-ocean/container-linux/kubernetes/network.tf
@ -22,12 +22,12 @@ resource "digitalocean_firewall" "rules" {
    },
    {
      protocol    = "udp"
-      port_range  = "all"
+      port_range  = "1-65535"
      source_tags = ["${digitalocean_tag.controllers.name}", "${digitalocean_tag.workers.name}"]
    },
    {
      protocol    = "tcp"
-      port_range  = "all"
+      port_range  = "1-65535"
      source_tags = ["${digitalocean_tag.controllers.name}", "${digitalocean_tag.workers.name}"]
    },
  ]
@ -35,17 +35,18 @@ resource "digitalocean_firewall" "rules" {
  # allow all outbound traffic
  outbound_rule = [
    {
-      protocol              = "icmp"
+      protocol              = "tcp"
+      port_range            = "1-65535"
      destination_addresses = ["0.0.0.0/0", "::/0"]
    },
    {
      protocol              = "udp"
-      port_range            = "all"
+      port_range            = "1-65535"
      destination_addresses = ["0.0.0.0/0", "::/0"]
    },
    {
-      protocol              = "tcp"
-      port_range            = "all"
+      protocol              = "icmp"
+      port_range            = "1-65535"
      destination_addresses = ["0.0.0.0/0", "::/0"]
    },
  ]
--- a/digital-ocean/container-linux/kubernetes/require.tf
+++ b/digital-ocean/container-linux/kubernetes/require.tf
@ -5,7 +5,7 @@ terraform {
 }

 provider "digitalocean" {
-  version = "0.1.2"
+  version = "~> 0.1.2"
 }

 provider "local" {
--- a/digital-ocean/container-linux/kubernetes/variables.tf
+++ b/digital-ocean/container-linux/kubernetes/variables.tf
@ -27,8 +27,8 @@ variable "controller_count" {

 variable "controller_type" {
  type        = "string"
-  default     = "2gb"
-  description = "Digital Ocean droplet size (e.g. 2gb (min), 4gb, 8gb)."
+  default     = "s-2vcpu-2gb"
+  description = "Digital Ocean droplet size (e.g. s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb)."
 }

 variable "worker_count" {
@ -39,8 +39,8 @@ variable "worker_count" {

 variable "worker_type" {
  type        = "string"
-  default     = "512mb"
-  description = "Digital Ocean droplet size (e.g. 512mb, 1gb, 2gb, 4gb)"
+  default     = "s-1vcpu-1gb"
+  description = "Digital Ocean droplet size (e.g. s-1vcpu-1gb, s-1vcpu-2gb, s-2vcpu-2gb)"
 }

 variable "ssh_fingerprints" {
@ -48,6 +48,18 @@ variable "ssh_fingerprints" {
  description = "SSH public key fingerprints. (e.g. see `ssh-add -l -E md5`)"
 }

+variable "controller_clc_snippets" {
+  type        = "list"
+  description = "Controller Container Linux Config snippets"
+  default     = []
+}
+
+variable "worker_clc_snippets" {
+  type        = "list"
+  description = "Worker Container Linux Config snippets"
+  default     = []
+}
+
 # bootkube assets

 variable "asset_dir" {
@ -76,3 +88,9 @@ EOD
  type    = "string"
  default = "10.3.0.0/16"
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
--- a/digital-ocean/container-linux/kubernetes/workers.tf
+++ b/digital-ocean/container-linux/kubernetes/workers.tf
@ -26,7 +26,7 @@ resource "digitalocean_droplet" "workers" {
  private_networking = true

  user_data = "${data.ct_config.worker_ign.rendered}"
-  ssh_keys  = "${var.ssh_fingerprints}"
+  ssh_keys  = ["${var.ssh_fingerprints}"]

  tags = [
    "${digitalocean_tag.workers.id}",
@ -43,12 +43,13 @@ data "template_file" "worker_config" {
  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"

  vars = {
-    k8s_dns_service_ip  = "${cidrhost(var.service_cidr, 10)}"
-    k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
 }

 data "ct_config" "worker_ign" {
  content      = "${data.template_file.worker_config.rendered}"
  pretty_print = false
+  snippets     = ["${var.worker_clc_snippets}"]
 }
--- a/docs/addons/cluo.md
+++ b/docs/addons/cluo.md
@ -12,13 +12,13 @@ kubectl apply -f addons/cluo -R

 ## Usage

-`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indiates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
+`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indicates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.

 ```
 $ kubectl get nodes --show-labels
 ...
 container-linux-update.v1.coreos.com/group=stable
-container-linux-update.v1.coreos.com/version=1465.6.0
+container-linux-update.v1.coreos.com/version=1632.3.0
 ```

 `update-operator` ensures one node reboots at a time and that pods are drained prior to reboot.
--- a/docs/addons/dashboard.md
+++ b/docs/addons/dashboard.md
@ -1,24 +0,0 @@
-# Kubernetes Dashboard
-
-The Kubernetes [Dashboard](https://github.com/kubernetes/dashboard) provides a web UI to manage a Kubernetes cluster for those who prefer an alternative to `kubectl`.
-
-## Create
-
-Create the dashboard deployment and service.
-
-```
-kubectl apply -f addons/dashboard -R
-```
-
-## Access
-
-Use `kubectl` to authenticate to the apiserver and create a local port forward to the remote port on the dashboard pod.
-
-```sh
-kubectl get pods -n kube-system
-kubectl port-forward POD [LOCAL_PORT:]REMOTE_PORT
-kubectl port-forward kubernetes-dashboard-id 9090 -n kube-system
-```
-
-!!! tip
-    If you'd like to expose the Dashboard via Ingress and add authentication, use a suitable OAuth2 proxy sidecar and pick your favorite OAuth2 provider.
--- a/docs/addons/grafana.md
+++ b/docs/addons/grafana.md
@ -0,0 +1,20 @@
+# Grafana
+
+Grafana can be used to build dashboards and visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
+
+```
+kubectl apply -f addons/grafana -R
+```
+
+Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
+
+```
+kubectl port-forward grafana-POD-ID 8080 -n monitoring
+```
+
+Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards.
+
+![Grafana Capacity Planning](/img/grafana-capacity.png)
+![Grafana Control Plane](/img/grafana-control-plane.png)
+![Grafana Node View](/img/grafana-node.png)
+
--- a/docs/addons/heapster.md
+++ b/docs/addons/heapster.md
@ -1,6 +1,6 @@
 # Heapster

-[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashbard graphs.
+[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashboard graphs.

 ## Create

--- a/docs/addons/overview.md
+++ b/docs/addons/overview.md
@ -6,6 +6,5 @@ Every Typhoon cluster is verified to work well with several post-install addons.
 * Nginx [Ingress Controller](ingress.md)
 * [Heapster](heapster.md)
 * [Prometheus](prometheus.md)
-* [Grafana](prometheus.md#grafana)
-* Kubernetes [Dashboard](dashboard.md)
+* [Grafana](grafana.md)

--- a/docs/addons/prometheus.md
+++ b/docs/addons/prometheus.md
@ -20,7 +20,7 @@ On Kubernetes clusters, Prometheus is run as a Deployment, configured with a Con
 kubectl apply -f addons/prometheus -R
 ```

-The ConfigMap configures Prometheus to target apiserver endpoints, node metrics, cAdvisor metrics, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.
+The ConfigMap configures Prometheus to discover apiservers, kubelets, cAdvisor, services, endpoints, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.

 ### Exporters

@ -32,7 +32,7 @@ Exporters expose metrics for 3rd-party systems that don't natively expose Promet

 ### Queries and Alerts

-Prometheus provides a simplistic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.
+Prometheus provides a basic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.

 ```
 kubectl get pods -n monitoring
@ -47,21 +47,4 @@ Visit [127.0.0.1:9090](http://127.0.0.1:9090) to query [expressions](http://127.
 <br/>
 ![Prometheus Alerts](/img/prometheus-alerts.png)

-## Grafana
-
-Grafana can be used to build dashboards and rich visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
-
-```
-kubectl apply -f addons/grafana -R
-```
-
-Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
-
-```
-kubectl port-forward grafana-POD-ID 8080 -n monitoring
-```
-
-Visit [127.0.0.1:8080](http://127.0.0.1:8080), add the prometheus data-source (http://prometheus.monitoring.svc.cluster.local), and import your desired dashboard (e.g. 315).
-
-![Grafana Dashboard](/img/grafana-dashboard.png)
-
+Use [Grafana](/addons/grafana.md) to view or build dashboards that use Prometheus as the datasource.
--- a/docs/advanced/customization.md
+++ b/docs/advanced/customization.md
@ -1,6 +1,130 @@
 # Customization

-To customize clusters in ways that aren't supported by input variables, fork the repo and make changes to the Terraform module. Stay tuned for improvements to this strategy since it is beneficial to stay close to this upstream.
+Typhoon provides minimal Kubernetes clusters with defaults we recommend for production. Terraform variables provide easy-to-use, supported customizations for clusters. Advanced options are available for customizing the architecture or hosts.
+
+## Variables
+
+Typhoon modules accept Terraform input variables for customizing clusters in supported ways (e.g. `worker_count`). Variables are carefully considered to provide essentials, while limiting complexity and test matrix burden. See each platform's tutorial for options.
+
+## Addons
+
+Clusters are kept to a minimal Kubernetes control plane by offering components like Nginx Ingress Controller, Prometheus, Grafana, and Heapster as optional post-install [addons](https://github.com/poseidon/typhoon/tree/master/addons). Customize addons by modifying a copy of our addon manifests.
+
+## Hosts
+
+### Container Linux
+
+!!! danger
+    Container Linux Configs provide powerful host customization abilities. You are responsible for the additional configs defined for hosts.
+
+Container Linux Configs (CLCs) declare how a Container Linux instance's disk should be provisioned on first boot from disk. CLCs define disk partitions, filesystems, files, systemd units, dropins, networkd configs, mount units, raid arrays, and users. Typhoon creates controller and worker instances with base Container Linux Configs to create a minimal, secure Kubernetes cluster on each platform.
+
+Typhoon AWS, Google Cloud, and Digital Ocean give users the ability to provide CLC *snippets* - valid Container Linux Configs that are validated and additively merged into the Typhoon base config during `terraform plan`. This allows advanced host customizations and experimentation.
+
+#### Examples
+
+Container Linux [docs](https://coreos.com/os/docs/latest/clc-examples.html) show many simple config examples. For example, ensure a file `/opt/hello` is created with permissions 0644.
+
+```
+# custom-files
+storage:
+  files:
+    - path: /opt/hello
+      filesystem: root
+      contents:
+        inline: |
+          Hello World
+      mode: 0644
+```
+
+Ensure a systemd unit `hello.service` is created and a dropin `50-etcd-cluster.conf` is added for `etcd-member.service`.
+
+```
+# custom-units
+systemd:
+  units:
+    - name: hello.service
+      enable: true
+      contents: |
+        [Unit]
+        Description=Hello World
+        [Service]
+        Type=oneshot
+        ExecStart=/usr/bin/echo Hello World!
+        [Install]
+        WantedBy=multi-user.target
+    - name: etcd-member.service
+      enable: true
+      dropins:
+        - name: 50-etcd-cluster.conf
+          contents: |
+            Environment="ETCD_LOG_PACKAGE_LEVELS=etcdserver=WARNING,security=DEBUG"
+```
+
+#### Specification
+
+View the Container Linux Config [format](https://coreos.com/os/docs/1576.4.0/configuration.html) to read about each field.
+
+#### Usage
+
+Write Container Linux Config *snippets* as files in the repository where you keep Terraform configs for clusters (perhaps in a `clc` or `snippets` subdirectory). You may organize snippets in multiple files as desired, provided they are each valid.
+
+Define an [AWS](https://typhoon.psdn.io/aws/#cluster), [Google Cloud](https://typhoon.psdn.io/google-cloud/#cluster), or [Digital Ocean](https://typhoon.psdn.io/digital-ocean/#cluster) cluster and fill in the optional `controller_clc_snippets` or `worker_clc_snippets` fields.
+
+```
+module "digital-ocean-nemo" {
+  ...
+
+  controller_count        = 1
+  worker_count            = 2
+  controller_clc_snippets = [
+    "${file("./custom-files")}",
+    "${file("./custom-units")}",
+  ]
+  worker_clc_snippets = [
+    "${file("./custom-files")}",
+    "${file("./custom-units")}",
+  ]
+  ...
+}
+```
+
+Plan the resources to be created.
+
+```
+$ terraform plan
+Plan: 54 to add, 0 to change, 0 to destroy.
+```
+
+Most syntax errors in CLCs can be caught during planning. For example, mangle the indentation in one of the CLC files:
+
+```
+$ terraform plan
+...
+error parsing Container Linux Config: error: yaml: line 3: did not find expected '-' indicator
+```
+
+Undo the mangle. Apply the changes to create the cluster per the tutorial.
+
+```
+$ terraform apply
+```
+
+Container Linux Configs (and the CoreOS Ignition system) create immutable infrastructure. Disk provisioning is performed only on first boot from disk. That means if you change a snippet used by an instance, Terraform will (correctly) try to destroy and recreate that instance. Be careful!
+
+!!! danger
+    Destroying and recreating controller instances is destructive! etcd runs on controller instances and stores data there. Do not modify controller snippets. See [blue/green](https://typhoon.psdn.io/topics/maintenance/#upgrades) clusters.
+
+## Architecture
+
+To customize clusters in ways that aren't supported by input variables, fork Typhoon and maintain a repository with customizations. Reference the repository by changing the username.
+
+```
+module "digital-ocean-nemo" {
+  source = "git::https://github.com/USERNAME/typhoon//digital-ocean/container-linux/kubernetes?ref=myspecialcase"
+  ...
+}
+```

 To customize lower-level Kubernetes control plane bootstrapping, see the [poseidon/bootkube-terraform](https://github.com/poseidon/bootkube-terraform) Terraform module.
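
A minimal sketch (not part of this patch) of the snippet layout described under Usage above. It writes the `custom-files` example into a `snippets` subdirectory and runs a cheap tab check before `terraform plan` gets to validate the YAML. The scratch directory, the `SNIPPETS_DIR` variable, and the tab check are assumptions for illustration; in practice the directory would live next to your cluster's Terraform configs.

```sh
#!/bin/sh
set -e
# Hypothetical location: a scratch directory stands in for the repository
# where the cluster's Terraform configs are kept.
snippets_dir="${SNIPPETS_DIR:-$(mktemp -d)/snippets}"
mkdir -p "$snippets_dir"

# Same content as the custom-files example in the docs above.
cat > "$snippets_dir/custom-files" <<'EOF'
storage:
  files:
    - path: /opt/hello
      filesystem: root
      contents:
        inline: |
          Hello World
      mode: 0644
EOF

# Tabs are not valid YAML indentation; flag them before planning.
if grep -q "$(printf '\t')" "$snippets_dir/custom-files"; then
  echo "tab character found in $snippets_dir/custom-files" >&2
  exit 1
fi

echo "wrote $snippets_dir/custom-files"
```

The file can then be referenced from `controller_clc_snippets` or `worker_clc_snippets` with `file()`, as in the module example above.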

--- a/docs/advanced/overview.md
+++ b/docs/advanced/overview.md
@ -0,0 +1,6 @@
+# Advanced
+
+Typhoon clusters offer several advanced features for skilled users.
+
+* [Customization](customization.md)
+* [Worker Pools](worker-pools.md)
--- a/docs/advanced/worker-pools.md
+++ b/docs/advanced/worker-pools.md
@ -0,0 +1,149 @@
+# Worker Pools
+
+Typhoon AWS and Google Cloud allow additional groups of workers to be defined and joined to a cluster. For example, add worker pools of instances with different types, disk sizes, Container Linux channels, or preemptibility modes.
+
+Internal Terraform Modules:
+
+* `aws/container-linux/kubernetes/workers`
+* `google-cloud/container-linux/kubernetes/workers`
+
+## AWS
+
+Create a cluster following the AWS [tutorial](../aws.md#cluster). Define a worker pool using the AWS internal `workers` module.
+
+```tf
+module "tempest-worker-pool" {
+  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.9.6"
+  
+  providers = {
+    aws = "aws.default"
+  }
+
+  # AWS
+  vpc_id          = "${module.aws-tempest.vpc_id}"
+  subnet_ids      = "${module.aws-tempest.subnet_ids}"
+  security_groups = "${module.aws-tempest.worker_security_groups}"
+  
+  # configuration
+  name               = "tempest-worker-pool"
+  kubeconfig         = "${module.aws-tempest.kubeconfig}"
+  ssh_authorized_key = "${var.ssh_authorized_key}"
+
+  count         = 2
+  instance_type = "m5.large"
+  os_channel    = "beta"
+}
+```
+
+Apply the change.
+
+```
+terraform apply
+```
+
+Verify an auto-scaling group of workers joins the cluster within a few minutes.
+
+### Variables
+
+The AWS internal `workers` module supports a number of [variables](https://github.com/poseidon/typhoon/blob/master/aws/container-linux/kubernetes/workers/variables.tf).
+
+#### Required
+
+| Name | Description | Example |
+|:-----|:------------|:--------|
+| vpc_id | Must be set to `vpc_id` output by cluster | "${module.cluster.vpc_id}" |
+| subnet_ids | Must be set to `subnet_ids` output by cluster | "${module.cluster.subnet_ids}" |
+| security_groups | Must be set to `worker_security_groups` output by cluster | "${module.cluster.worker_security_groups}" |
+| name | Unique name (distinct from cluster name) | "tempest-m5s" |
+| kubeconfig | Must be set to `kubeconfig` output by cluster | "${module.cluster.kubeconfig}" |
+| ssh_authorized_key | SSH public key for ~/.ssh/authorized_keys | "ssh-rsa AAAAB3NZ..." |
+
+#### Optional
+
+| Name | Description | Default | Example |
+|:-----|:------------|:--------|:--------|
+| count | Number of instances | 1 | 3 |
+| instance_type | EC2 instance type | "t2.small" | "t2.medium" |
+| os_channel | Container Linux AMI channel | stable | "beta", "alpha" |
+| disk_size | Size of the disk in GB | 40 | 100 |
+| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | Must match `cluster_domain_suffix` of cluster | "cluster.local" | "k8s.example.com" |
+
+Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).
+
+## Google Cloud
+
+Create a cluster following the Google Cloud [tutorial](../google-cloud.md#cluster). Define a worker pool using the Google Cloud internal `workers` module.
+
+```tf
+module "yavin-worker-pool" {
+  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.9.6"
+
+  providers = {
+    google = "google.default"
+  }
+
+  # Google Cloud
+  region       = "us-central1"
+  network      = "${module.google-cloud-yavin.network_name}"
+  cluster_name = "yavin"
+
+  # configuration
+  name               = "yavin-16x"
+  kubeconfig         = "${module.google-cloud-yavin.kubeconfig}"
+  ssh_authorized_key = "${var.ssh_authorized_key}"
+  
+  count        = 2
+  machine_type = "n1-standard-16"
+  os_image     = "coreos-beta"
+  preemptible  = true
+}
+```
+
+Apply the change.
+
+```
+terraform apply
+```
+
+Verify a managed instance group of workers joins the cluster within a few minutes.
+
+```
+$ kubectl get nodes
+NAME                                             STATUS   AGE    VERSION
+yavin-controller-0.c.example-com.internal        Ready    6m     v1.9.6
+yavin-worker-jrbf.c.example-com.internal         Ready    5m     v1.9.6
+yavin-worker-mzdm.c.example-com.internal         Ready    5m     v1.9.6
+yavin-16x-worker-jrbf.c.example-com.internal     Ready    3m     v1.9.6
+yavin-16x-worker-mzdm.c.example-com.internal     Ready    3m     v1.9.6
+```
+
+### Variables
+
+The Google Cloud internal `workers` module supports a number of [variables](https://github.com/poseidon/typhoon/blob/master/google-cloud/container-linux/kubernetes/workers/variables.tf).
+
+#### Required
+
+| Name | Description | Example |
+|:-----|:------------|:--------|
+| region | Must be set to `region` of cluster | "us-central1" |
+| network | Must be set to `network_name` output by cluster | "${module.cluster.network_name}" |
+| name | Unique name (distinct from cluster name) | "yavin-16x" |
+| cluster_name | Must be set to `cluster_name` of cluster | "yavin" |
+| kubeconfig | Must be set to `kubeconfig` output by cluster | "${module.cluster.kubeconfig}" |
+| ssh_authorized_key | SSH public key for ~/.ssh/authorized_keys | "ssh-rsa AAAAB3NZ..." |
+
+#### Optional
+
+| Name | Description | Default | Example |
+|:-----|:------------|:--------|:--------|
+| count | Number of instances | 1 | 3 |
+| machine_type | Compute instance machine type | "n1-standard-1" | See below |
+| os_image | OS image for compute instances | "coreos-stable" | "coreos-alpha", "coreos-beta" |
+| disk_size | Size of the disk in GB | 40 | 100 |
+| preemptible | If true, Compute Engine will terminate instances randomly within 24 hours | false | true |
+| service_cidr | Must match `service_cidr` of cluster | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | Must match `cluster_domain_suffix` of cluster | "cluster.local" | "k8s.example.com" |
+
+Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).
+
--- a/docs/aws.md
+++ b/docs/aws.md
@ -1,6 +1,6 @@
 # AWS

-In this tutorial, we'll create a Kubernetes v1.8.3 cluster on AWS.
+In this tutorial, we'll create a Kubernetes v1.9.6 cluster on AWS.

 We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a VPC, gateway, subnets, auto-scaling groups of controllers and workers, network load balancers for controllers and workers, and security groups will be created.

@ -10,23 +10,23 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube

 * AWS Account and IAM credentials
 * AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)
-* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
+* Terraform v0.11.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally

 ## Terraform Setup

-Install [Terraform](https://www.terraform.io/downloads.html) v0.10.1 on your system.
+Install [Terraform](https://www.terraform.io/downloads.html) v0.11.x on your system.

 ```sh
 $ terraform version
-Terraform v0.10.7
+Terraform v0.11.1
 ```

 Add the [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) plugin binary for your system.

 ```sh
-wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.2.0/terraform-provider-ct-v0.2.0-linux-amd64.tar.gz
-tar xzf terraform-provider-ct-v0.2.0-linux-amd64.tar.gz
-sudo mv terraform-provider-ct-v0.2.0-linux-amd64/terraform-provider-ct /usr/local/bin/
+wget https://github.com/coreos/terraform-provider-ct/releases/download/v0.2.1/terraform-provider-ct-v0.2.1-linux-amd64.tar.gz
+tar xzf terraform-provider-ct-v0.2.1-linux-amd64.tar.gz
+sudo mv terraform-provider-ct-v0.2.1-linux-amd64/terraform-provider-ct /usr/local/bin/
 ```

 Add the plugin to your `~/.terraformrc`.
@ -57,9 +57,32 @@ Configure the AWS provider to use your access key credentials in a `providers.tf

 ```tf
 provider "aws" {
+  version = "~> 1.5.0"
+  alias   = "default"
+
  region                  = "eu-central-1"
  shared_credentials_file = "/home/user/.config/aws/credentials"
 }
+
+provider "local" {
+  version = "~> 1.0"
+  alias   = "default"
+}
+
+provider "null" {
+  version = "~> 1.0"
+  alias   = "default"
+}
+
+provider "template" {
+  version = "~> 1.0"
+  alias   = "default"
+}
+
+provider "tls" {
+  version = "~> 1.0"
+  alias   = "default"
+}
 ```

 Additional configuration options are described in the `aws` provider [docs](https://www.terraform.io/docs/providers/aws/).
@ -73,8 +96,16 @@ Define a Kubernetes cluster using the module `aws/container-linux/kubernetes`.

 ```tf
 module "aws-tempest" {
-  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes"
+  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.9.6"

+  providers = {
+    aws = "aws.default"
+    local = "local.default"
+    null = "null.default"
+    template = "template.default"
+    tls = "tls.default"
+  }
+  
  cluster_name = "tempest"

  # AWS
@ -103,7 +134,7 @@ ssh-add -L
 ```

 !!! warning
-    `terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
+    `terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.

 ## Apply

@ -119,7 +150,7 @@ Get or update Terraform modules.
 $ terraform get            # downloads missing modules
 $ terraform get --update   # updates all modules
 Get: git::https://github.com/poseidon/typhoon (update)
-Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
+Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.11.0 (update)
 ```

 Plan the resources to be created.
@ -148,12 +179,12 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
 [Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.

 ```
-$ KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
 $ kubectl get nodes
 NAME             STATUS    AGE       VERSION        
-ip-10-0-12-221   Ready     34m       v1.8.3
-ip-10-0-19-112   Ready     34m       v1.8.3
-ip-10-0-4-22     Ready     34m       v1.8.3
+ip-10-0-12-221   Ready     34m       v1.9.6
+ip-10-0-19-112   Ready     34m       v1.9.6
+ip-10-0-4-22     Ready     34m       v1.9.6
 ```
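
A minimal sketch (not part of this patch) of why the `export` added in the block above matters: a plain assignment stays local to the current shell, so a child process such as `kubectl` never sees it. A child `sh -c` stands in for `kubectl` here, which is an assumption made so the example runs without a cluster.

```sh
#!/bin/sh
# Start from a clean slate so a previously exported value does not leak in.
unset KUBECONFIG

# Plain assignment: visible in this shell only.
KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
unexported="$(sh -c 'echo "${KUBECONFIG:-unset}"')"

# Exported: inherited by child processes.
export KUBECONFIG
exported="$(sh -c 'echo "${KUBECONFIG:-unset}"')"

echo "without export, child sees: $unexported"
echo "with export, child sees:    $exported"
```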

 List the pods.
@ -179,10 +210,10 @@ kube-system   pod-checkpointer-4kxtl-ip-10-0-12-221     1/1    Running   0

 ## Going Further

-Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
+Learn about [maintenance](topics/maintenance.md) and [addons](addons/overview.md).

 !!! note
-    On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
+    On Container Linux clusters, install the `CLUO` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.

 ## Variables

@ -194,14 +225,13 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
 | dns_zone | AWS Route53 DNS zone | "aws.example.com" |
 | dns_zone_id | AWS Route53 DNS zone id | "Z3PAABBCFAKEC0" |
 | ssh_authorized_key | SSH public key for ~/.ssh_authorized_keys | "ssh-rsa AAAAB3NZ..." |
-| os_channel | Container Linux AMI channel | stable, beta, alpha |
 | asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/tempest" |

 #### DNS Zone

 Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a network load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `tempest.aws.example.com`.

-You'll need a registered domain name or subdomain registered in a AWS Route53 DNS zone. You can set this up once and create many clusters with unqiue names.
+You'll need a registered domain name or subdomain registered in an AWS Route53 DNS zone. You can set this up once and create many clusters with unique names.

 ```tf
 resource "aws_route53_zone" "zone-for-clusters" {
@ -222,12 +252,16 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
 | controller_type | Controller EC2 instance type | "t2.small" | "t2.medium" |
 | worker_count | Number of workers | 1 | 3 |
 | worker_type | Worker EC2 instance type | "t2.small" | "t2.medium" |
+| os_channel | Container Linux AMI channel | stable | stable, beta, alpha |
 | disk_size | Size of the EBS volume in GB | "40" | "100" |
 | networking | Choice of networking provider | "calico" | "calico" or "flannel" |
 | network_mtu | CNI interface MTU (calico only) | 1480 | 8981 |
 | host_cidr | CIDR range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |
 | pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
-| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns | "cluster.local" | "k8s.example.com" |
+| controller_clc_snippets | Controller Container Linux Config snippets | [] | |
+| worker_clc_snippets | Worker Container Linux Config snippets | [] | |

 Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).
