Update Kubernetes from v1.10.0 to v1.10.1

* Use kubernetes-incubator/bootkube v0.12.0
Refactor GCP to remove controller internal module
2025-08-02 16:41:34 +02:00 · 2018-04-12 20:57:31 -07:00 · 2018-04-12 19:41:51 -07:00 · 2018-04-11 22:23:51 -07:00 · 2018-04-11 22:19:58 -07:00 · 2018-04-09 23:23:18 -05:00
134 changed files with 9584 additions and 1663 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@ -4,7 +4,7 @@

 ### Environment

-* Platform: bare-metal, google-cloud, digital-ocean
+* Platform: aws, bare-metal, google-cloud, digital-ocean
 * OS: container-linux, fedora-cloud
 * Terraform: `terraform version`
 * Plugins: Provider plugin versions
--- a/CHANGES.md
+++ b/CHANGES.md
@ -4,26 +4,188 @@ Notable changes between versions.

 ## Latest

+* Kubernetes [v1.10.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1101)
+* Enable etcd v3.3 metrics endpoint ([#175](https://github.com/poseidon/typhoon/pull/175))
+* Use `k8s.gcr.io` instead of `gcr.io/google_containers` ([#180](https://github.com/poseidon/typhoon/pull/180))
+  * Kubernetes [recommends](https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ) using the alias to pull from the nearest regional mirror and to abstract the backing container registry
+* Update kube-dns from v1.14.8 to v1.14.9
+* Update etcd from v3.3.2 to v3.3.3
+* Use kubernetes-incubator/bootkube v0.12.0
+
+#### Bare-Metal
+
+* Fix need for multiple `terraform apply` runs to create a cluster with Terraform v0.11.4 ([#181](https://github.com/poseidon/typhoon/pull/181))
+  * To SSH during a disk install for debugging, SSH as user "core" with port 2222
+  * Remove the old trick of using a user "debug" during disk install
+
+#### Google Cloud
+
+* Refactor out the `controller` internal module
+
+#### Addons
+
+* Add Prometheus discovery for etcd peers on controller nodes ([#175](https://github.com/poseidon/typhoon/pull/175))
+  * Scrape etcd v3.3 `--listen-metrics-urls` for metrics
+  * Enable etcd alerts and populate the etcd Grafana dashboard
+* Update kube-state-metrics from v1.2.0 to v1.3.0
+
+## v1.10.0
+
+* Kubernetes [v1.10.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1100)
+* Remove unused, unmaintained `pxe-worker` internal module
+
+#### AWS
+
+* Add `disk_type` optional variable for setting the EBS volume type ([#176](https://github.com/poseidon/typhoon/pull/176))
+  * Change default type from `standard` to `gp2`. Prometheus etcd alerts are tuned for fast disks.
+
+#### Digital Ocean
+
+* Ensure etcd secrets are only distributed to controller hosts, not workers.
+* Remove `networking` optional variable. Only flannel works on Digital Ocean.
+
+#### Google Cloud
+
+* Add `disk_size` optional variable for setting instance disk size in GB
+* Add `controller_type` optional variable for setting machine type for controllers
+* Add `worker_type` optional variable for setting machine type for workers
+* Remove `machine_type` optional variable. Use `controller_type` and `worker_type`.
+
+#### Addons
+
+* Update Grafana from v4.6.3 to v5.0.4 ([#153](https://github.com/poseidon/typhoon/pull/153), [#174](https://github.com/poseidon/typhoon/pull/174))
+  * Restrict dashboard organization role to Viewer
+
+## v1.9.6
+
+* Kubernetes [v1.9.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v196)
+* Update Calico from v3.0.3 to v3.0.4
+
+#### Addons
+
+* Update heapster from v1.5.1 to v1.5.2
+
+## v1.9.5
+
+* Kubernetes [v1.9.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v195)
+  * Fix `subPath` volume mounts regression ([kubernetes#61076](https://github.com/kubernetes/kubernetes/issues/61076))
+* Introduce [Container Linux Config snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) on cloud platforms ([#145](https://github.com/poseidon/typhoon/pull/145))
+  * Validate and additively merge custom Container Linux Configs during `terraform plan`
+  * Define files, systemd units, dropins, networkd configs, mounts, users, and more
+  * Require updating `terraform-provider-ct` plugin from v0.2.0 to v0.2.1
+* Add `node-role.kubernetes.io/controller="true"` node label to controllers ([#160](https://github.com/poseidon/typhoon/pull/160))
+
+#### AWS
+
+* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
+
+#### Digital Ocean
+
+* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
+
+#### Google Cloud
+
+* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action required!)
+* Relax `os_image` to optional. Default to "coreos-stable".
+
+#### Addons
+
+* Update nginx-ingress from 0.11.0 to 0.12.0
+* Update Prometheus from 2.2.0 to 2.2.1
+
+## v1.9.4
+
+* Kubernetes [v1.9.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v194)
+  * Secret, configMap, downward API, and projected volumes now read-only (breaking, [kubernetes#58720](https://github.com/kubernetes/kubernetes/pull/58720))
+  * Regressed `subPath` volume mounts (regression, [kubernetes#61076](https://github.com/kubernetes/kubernetes/issues/61076))
+  * Mitigated `subPath` [CVE-2017-1002101](https://github.com/kubernetes/kubernetes/issues/60813)
+* Introduce [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) for AWS and Google Cloud for joining heterogeneous workers to existing clusters.
+* Use new Network Load Balancers and cross zone load balancing on AWS
+* Allow flexvolume plugins to be used on any Typhoon cluster (not just bare-metal)
+* Upgrade etcd from v3.2.15 to v3.3.2
+* Update Calico from v3.0.2 to v3.0.3
+* Use kubernetes-incubator/bootkube v0.11.0
+* [Recommend](https://typhoon.psdn.io/topics/maintenance/#terraform-provider-ct-v021) updating `terraform-provider-ct` plugin from v0.2.0 to [v0.2.1](https://github.com/coreos/terraform-provider-ct/releases/tag/v0.2.1) (action recommended)
+
+#### AWS
+
+* Promote AWS platform to stable
+* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) ([#150](https://github.com/poseidon/typhoon/pull/150))
+* Replace the apiserver elastic load balancer with a network load balancer ([#136](https://github.com/poseidon/typhoon/pull/136))
+* Replace the Ingress elastic load balancer with a network load balancer ([#141](https://github.com/poseidon/typhoon/pull/141))
+  * AWS [NLBs](https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/) can handle millions of RPS with high throughput and low latency.
+  * Require `terraform-provider-aws` 1.7.0 or higher
+* Enable NLB [cross-zone](https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/) load balancing ([#159](https://github.com/poseidon/typhoon/pull/159))
+  * Requests are automatically evenly distributed to targets regardless of AZ
+  * Require `terraform-provider-aws` 1.11.0 or higher
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
+* Fix controller and worker launch configs to ignore AMI changes ([#126](https://github.com/poseidon/typhoon/pull/126), [#158](https://github.com/poseidon/typhoon/pull/158))
+
+#### Digital Ocean
+
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
+* Fix to pass `ssh_fingerprints` as a list to droplets ([#143](https://github.com/poseidon/typhoon/pull/143))
+
+#### Google Cloud
+
+* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) ([#148](https://github.com/poseidon/typhoon/pull/148))
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins ([#142](https://github.com/poseidon/typhoon/pull/142))
+* Add `kubeconfig` variable to `controllers` and `workers` submodules ([#147](https://github.com/poseidon/typhoon/pull/147))
+* Remove `kubeconfig_*` variables from `controllers` and `workers` submodules ([#147](https://github.com/poseidon/typhoon/pull/147))
+* Allow initial experimentation with accelerators (i.e. GPUs) on workers ([#161](https://github.com/poseidon/typhoon/pull/161)) (unofficial)
+  * Require `terraform-provider-google` v1.6.0
+
+#### Addons
+
+* Update Prometheus from 2.1.0 to 2.2.0 ([#153](https://github.com/poseidon/typhoon/pull/153))
+  * Scrape Prometheus itself to enable alerts about Prometheus itself
+  * Adjust KubeletDown rule to fire when 10% of kubelets are down
+* Update heapster from v1.5.0 to v1.5.1 ([#131](https://github.com/poseidon/typhoon/pull/131))
+  * Use separate service account
+* Update nginx-ingress from 0.10.2 to 0.11.0
+
+## v1.9.3
+
+* Kubernetes [v1.9.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v193)
+* Network improvements and fixes ([#104](https://github.com/poseidon/typhoon/pull/104))
+  * Switch from Calico v2.6.6 to v3.0.2
+  * Add Calico GlobalNetworkSet CRD
+  * Update flannel from v0.9.0 to v0.10.0
+  * Use separate service account for flannel
+* Update etcd from v3.2.14 to v3.2.15
+
+#### Digital Ocean
+
+* Use new Droplet [types](https://developers.digitalocean.com/documentation/changelog/api-v2/new-size-slugs-for-droplet-plan-changes/) which offer more CPU/memory, at lower cost. ([#105](https://github.com/poseidon/typhoon/pull/105))
+  * A small Digital Ocean cluster costs less than $25 a month!
+
+#### Addons
+
+* Update Prometheus from v2.0.0 to v2.1.0 ([#113](https://github.com/poseidon/typhoon/pull/113))
+  * Improve alerting rules
+  * Relabel discovered kubelet, endpoint, service, and apiserver scrapes
+  * Use separate service accounts
+  * Update node-exporter and kube-state-metrics
+* Include Grafana dashboards for Kubernetes admins ([#113](https://github.com/poseidon/typhoon/pull/113))
+  * Add grafana-watcher to load bundled upstream dashboards
+* Update nginx-ingress from 0.9.0 to 0.10.2
+* Update CLUO from v0.5.0 to v0.6.0
+* Switch manifests to use `apps/v1` Deployments and Daemonsets ([#120](https://github.com/poseidon/typhoon/pull/120))
+* Remove Kubernetes Dashboard manifests ([#121](https://github.com/poseidon/typhoon/pull/121))
+
 ## v1.9.2

 * Kubernetes [v1.9.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192)
 * Add Terraform v0.11.x support
  * Add explicit "providers" section to modules for Terraform v0.11.x
  * Retain support for Terraform v0.10.4+
-* Add [migration guide](https://github.com/poseidon/typhoon/blob/master/docs/topics/maintenance.md) from Terraform v0.10.x to v0.11.x (**action required!**)
+* Add [migration guide](https://typhoon.psdn.io/topics/maintenance/#terraform-v011x) from Terraform v0.10.x to v0.11.x (**action required!**)
 * Update etcd from 3.2.13 to 3.2.14
 * Update calico from 2.6.5 to 2.6.6
 * Update kube-dns from v1.14.7 to v1.14.8
 * Use separate service account for kube-dns
 * Use kubernetes-incubator/bootkube v0.10.0

-#### Addons
-
-* Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (**important**)
-  * Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes ([cluo#163](https://github.com/coreos/container-linux-update-operator/issues/163))
-* Update kube-state-metrics from v1.1.0 to v1.2.0
-* Fix RBAC cluster role for kube-state-metrics
-
 #### Bare-Metal

 * Use per-node Container Linux install profiles ([#97](https://github.com/poseidon/typhoon/pull/97))
@ -35,6 +197,13 @@ Notable changes between versions.
 * Relax `digitalocean` provider version constraint
 * Fix bug with `terraform plan` always showing a firewall diff to be applied ([#3](https://github.com/poseidon/typhoon/issues/3))

+#### Addons
+
+* Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (**important**)
+  * Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes ([cluo#163](https://github.com/coreos/container-linux-update-operator/issues/163))
+* Update kube-state-metrics from v1.1.0 to v1.2.0
+* Fix RBAC cluster role for kube-state-metrics
+
 ## v1.9.1

 * Kubernetes [v1.9.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v191)
--- a/README.md
+++ b/README.md
@ -11,10 +11,11 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.10.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [preemption](https://typhoon.psdn.io/google-cloud/#preemption) (varies by platform)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Modules

@ -22,7 +23,7 @@ Typhoon provides a Terraform Module for each supported operating system and plat

 | Platform      | Operating System | Terraform Module | Status |
 |---------------|------------------|------------------|--------|
-| AWS           | Container Linux  | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | beta |
+| AWS           | Container Linux  | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
 | Bare-Metal    | Container Linux  | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
 | Digital Ocean | Container Linux  | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
 | Google Cloud  | Container Linux  | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | beta |
@ -43,29 +44,28 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platfo

 ```tf
 module "google-cloud-yavin" {
-  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
+  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.10.1"
  
  providers = {
-    google = "google.default"
-    local = "local.default"
-    null = "null.default"
+    google   = "google.default"
+    local    = "local.default"
+    null     = "null.default"
    template = "template.default"
-    tls = "tls.default"
+    tls      = "tls.default"
  }

  # Google Cloud
+  cluster_name  = "yavin"
  region        = "us-central1"
  dns_zone      = "example.com"
  dns_zone_name = "example-zone"
-  os_image      = "coreos-stable-1576-5-0-v20180105"

-  cluster_name       = "yavin"
-  controller_count   = 1
-  worker_count       = 2
+  # configuration
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
-
-  # output assets dir
-  asset_dir = "/home/user/.secrets/clusters/yavin"
+  asset_dir          = "/home/user/.secrets/clusters/yavin"
+  
+  # optional
+  worker_count = 2
 }
 ```

@ -86,9 +86,9 @@ In 4-8 minutes (varies by platform), the cluster will be ready. This Google Clou
 $ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-0.c.example-com.internal     Ready    6m     v1.9.2
-yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.9.2
-yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.9.2
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.10.1
+yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.10.1
+yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.10.1
 ```

 List the pods.
@ -123,11 +123,11 @@ Typhoon is strict about minimalism, maturity, and scope. These are not in scope:

 Ask questions on the IRC #typhoon channel on [freenode.net](http://freenode.net/).

-## Background
+## Motivation

 Typhoon powers the author's cloud and colocation clusters. The project has evolved through operational experience and Kubernetes changes. Typhoon is shared under a free license to allow others to use the work freely and contribute to its upkeep.

-Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of free (or enterprise) Kubernetes distros is healthy.
+Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of Kubernetes distributions is healthy.

 ## Social Contract

@ -135,4 +135,6 @@ Typhoon is not a product, trial, or free-tier. It is not run by a company, does

 Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.

-*Disclosure: The author works for CoreOS and previously wrote Matchbox and original Tectonic for bare-metal and AWS. This project is not associated with CoreOS.*
+## Donations
+
+Typhoon does not accept money donations. Instead, we encourage you to donate to one of [these organizations](https://github.com/poseidon/typhoon/wiki/Donations) to show your appreciation.
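
The pinned `source` in the README example packs three things into one string: a repository URL, a subdirectory after the second `//`, and a Git ref after `?ref=`. As a rough illustration only, the pieces can be pulled apart as below. This is not how Terraform itself parses sources; `split_module_source` and `ModuleSource` are hypothetical names, and the helper handles only the `git::<url>//<subdir>?ref=<ref>` shape shown in the README.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs


class ModuleSource(NamedTuple):
    repo: str
    subdir: str
    ref: Optional[str]


def split_module_source(source: str) -> ModuleSource:
    """Split a `git::` module source into repo URL, subdirectory, and ref.

    Hypothetical helper for illustration; not part of Terraform or Typhoon.
    """
    # Drop the forced "git::" getter prefix if present.
    if source.startswith("git::"):
        source = source[len("git::"):]

    # Separate the query string (where ?ref=... lives) from the path.
    path, _, query = source.partition("?")
    ref_values = parse_qs(query).get("ref")
    ref = ref_values[0] if ref_values else None

    # Skip the "//" that belongs to the URL scheme, then split on the next
    # "//", which marks the start of the subdirectory inside the repository.
    scheme_end = path.find("://")
    search_from = scheme_end + 3 if scheme_end != -1 else 0
    split_at = path.find("//", search_from)
    if split_at == -1:
        return ModuleSource(repo=path, subdir="", ref=ref)
    return ModuleSource(repo=path[:split_at], subdir=path[split_at + 2:], ref=ref)


parts = split_module_source(
    "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.10.1"
)
print(parts.repo)    # https://github.com/poseidon/typhoon
print(parts.subdir)  # google-cloud/container-linux/kubernetes
print(parts.ref)     # v1.10.1
```

Pinning `?ref=` to a release tag keeps a cluster definition on a known version instead of following the default branch.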
--- a/addons/cluo/0-namespace.yaml
+++ b/addons/cluo/0-namespace.yaml
--- a/addons/cluo/cluster-role-binding.yaml
+++ b/addons/cluo/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: reboot-coordinator
 roleRef:
--- a/addons/cluo/cluster-role.yaml
+++ b/addons/cluo/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: reboot-coordinator
--- a/addons/cluo/update-agent.yaml
+++ b/addons/cluo/update-agent.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: container-linux-update-agent
@ -8,6 +8,9 @@ spec:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      app: container-linux-update-agent
  template:
    metadata:
      labels:
@ -15,7 +18,7 @@ spec:
    spec:
      containers:
      - name: update-agent
-        image: quay.io/coreos/container-linux-update-operator:v0.5.0
+        image: quay.io/coreos/container-linux-update-operator:v0.6.0
        command:
        - "/bin/update-agent"
        volumeMounts:
--- a/addons/cluo/update-operator.yaml
+++ b/addons/cluo/update-operator.yaml
@ -1,10 +1,13 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: container-linux-update-operator
  namespace: reboot-coordinator
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      app: container-linux-update-operator
  template:
    metadata:
      labels:
@ -12,7 +15,7 @@ spec:
    spec:
      containers:
      - name: update-operator
-        image: quay.io/coreos/container-linux-update-operator:v0.5.0
+        image: quay.io/coreos/container-linux-update-operator:v0.6.0
        command:
        - "/bin/update-operator"
        env:
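
Moving the CLUO manifests from `extensions/v1beta1` to `apps/v1` is why the `selector` blocks appear: `apps/v1` Deployments and DaemonSets require an explicit `spec.selector`, and it must match the pod template's labels. A minimal sketch of that check follows. It assumes the template carries the same `app` label as the selector (the template labels fall outside the hunks shown), it only looks at `matchLabels`, and `selector_matches_template` is a hypothetical helper.

```python
def selector_matches_template(manifest: dict) -> bool:
    """Return True if spec.selector.matchLabels is non-empty and every
    label in it also appears, with the same value, on the pod template."""
    spec = manifest.get("spec", {})
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    template_labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
    if not match_labels:
        return False
    return all(template_labels.get(key) == value for key, value in match_labels.items())


# Trimmed-down stand-in for addons/cluo/update-operator.yaml after this change.
# The template label is an assumption; the diff hunks do not show it.
update_operator = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "container-linux-update-operator"}},
        "template": {
            "metadata": {"labels": {"app": "container-linux-update-operator"}},
        },
    },
}

# The same manifest without a selector, as it looked under extensions/v1beta1.
legacy = {
    "apiVersion": "extensions/v1beta1",
    "kind": "Deployment",
    "spec": {"replicas": 1, "template": update_operator["spec"]["template"]},
}

print(selector_matches_template(update_operator))  # True
print(selector_matches_template(legacy))           # False
```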
--- a/addons/dashboard/deployment.yaml
+++ b/addons/dashboard/deployment.yaml
@ -1,32 +0,0 @@
-apiVersion: extensions/v1beta1
-kind: Deployment
-metadata:
-  name: kubernetes-dashboard
-  namespace: kube-system
-spec:
-  replicas: 1
-  template:
-    metadata:
-      labels:
-        name: kubernetes-dashboard
-        phase: prod
-    spec:
-      containers:
-        - name: kubernetes-dashboard
-          image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
-          ports:
-            - name: http
-              containerPort: 9090
-          resources:
-            limits:
-              cpu: 100m
-              memory: 300Mi
-            requests:
-              cpu: 100m
-              memory: 100Mi
-          livenessProbe:
-            httpGet:
-              path: /
-              port: 9090
-            initialDelaySeconds: 30
-            timeoutSeconds: 30
--- a/addons/dashboard/service.yaml
+++ b/addons/dashboard/service.yaml
@ -1,15 +0,0 @@
-apiVersion: v1
-kind: Service
-metadata:
-  name: kubernetes-dashboard
-  namespace: kube-system
-spec:
-  type: ClusterIP
-  selector:
-    name: kubernetes-dashboard
-    phase: prod
-  ports:
-    - name: http
-      protocol: TCP
-      port: 80
-      targetPort: 9090
--- a/addons/grafana/dashboard-providers.yaml
+++ b/addons/grafana/dashboard-providers.yaml
@ -0,0 +1,15 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-dashboard-providers
+  namespace: monitoring
+data:
+  dashboard-providers.yaml: |+
+    apiVersion: 1
+    providers:
+    - name: 'default'
+      orgId: 1
+      folder: ''
+      type: file
+      options:
+        path: /var/lib/grafana/dashboards
--- a/addons/grafana/dashboards.yaml
+++ b/addons/grafana/dashboards.yaml
@ -0,0 +1,7361 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-dashboards
+  namespace: monitoring
+data:
+  deployment-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 1,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "200px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "cores",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "CPU",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 9,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "GB",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "80%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Memory",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "Bps",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Network",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "100px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "metric": "kube_deployment_spec_replicas",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Desired Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Available Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Observed Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Metadata Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "350px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "current replicas",
+                  "refId": "A",
+                  "step": 30
+                },
+                {
+                  "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "available",
+                  "refId": "B",
+                  "step": 30
+                },
+                {
+                  "expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "unavailable",
+                  "refId": "C",
+                  "step": 30
+                },
+                {
+                  "expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "updated",
+                  "refId": "D",
+                  "step": 30
+                },
+                {
+                  "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "desired",
+                  "refId": "E",
+                  "step": 30
+                }
+              ],
+              "title": "Replicas",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "none",
+                  "label": "",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "show": false
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Namespace",
+            "multi": false,
+            "name": "deployment_namespace",
+            "options": [],
+            "query": "label_values(kube_deployment_metadata_generation, namespace)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": null,
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Deployment",
+            "multi": false,
+            "name": "deployment_name",
+            "options": [],
+            "query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "deployment",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Deployment",
+      "version": 1
+    }
+  etcd-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "name": "prometheus",
+          "label": "prometheus",
+          "description": "",
+          "type": "datasource",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus"
+        }
+      ],
+      "__requires": [
+        {
+          "type": "grafana",
+          "id": "grafana",
+          "name": "Grafana",
+          "version": "4.5.2"
+        },
+        {
+          "type": "panel",
+          "id": "graph",
+          "name": "Graph",
+          "version": ""
+        },
+        {
+          "type": "datasource",
+          "id": "prometheus",
+          "name": "Prometheus",
+          "version": "1.0.0"
+        },
+        {
+          "type": "panel",
+          "id": "singlestat",
+          "name": "Singlestat",
+          "version": ""
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "description": "etcd sample Grafana dashboard with Prometheus",
+      "editable": false,
+      "gnetId": null,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "id": null,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "cacheTimeout": null,
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 28,
+              "interval": null,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "nullText": null,
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "tableColumn": "",
+              "targets": [
+                {
+                  "expr": "sum(etcd_server_has_leader)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "metric": "etcd_server_has_leader",
+                  "refId": "A",
+                  "step": 20
+                }
+              ],
+              "thresholds": "",
+              "title": "Up",
+              "type": "singlestat",
+              "valueFontSize": "200%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 23,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 5,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(grpc_server_started_total{grpc_type=\"unary\"}[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "RPC Rate",
+                  "metric": "grpc_server_started_total",
+                  "refId": "A",
+                  "step": 4
+                },
+                {
+                  "expr": "sum(rate(grpc_server_handled_total{grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "RPC Failed Rate",
+                  "metric": "grpc_server_handled_total",
+                  "refId": "B",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "RPC Rate",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "ops",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 41,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Watch Streams",
+                  "metric": "grpc_server_handled_total",
+                  "refId": "A",
+                  "step": 4
+                },
+                {
+                  "expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Lease Streams",
+                  "metric": "grpc_server_handled_total",
+                  "refId": "B",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Active Streams",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "decimals": null,
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "grid": {},
+              "id": 1,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "etcd_debugging_mvcc_db_total_size_in_bytes",
+                  "format": "time_series",
+                  "hide": false,
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} DB Size",
+                  "metric": "",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "DB Size",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": false
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "grid": {},
+              "id": 3,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 1,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": false,
+              "steppedLine": true,
+              "targets": [
+                {
+                  "expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))",
+                  "format": "time_series",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} WAL fsync",
+                  "metric": "etcd_disk_wal_fsync_duration_seconds_bucket",
+                  "refId": "A",
+                  "step": 4
+                },
+                {
+                  "expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} DB fsync",
+                  "metric": "etcd_disk_backend_commit_duration_seconds_bucket",
+                  "refId": "B",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Disk Sync Duration",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "s",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": false
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 29,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 4,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "process_resident_memory_bytes",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Resident Memory",
+                  "metric": "process_resident_memory_bytes",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Memory",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "New row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 5,
+              "id": 22,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(etcd_network_client_grpc_received_bytes_total[5m])",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Client Traffic In",
+                  "metric": "etcd_network_client_grpc_received_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Client Traffic In",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 5,
+              "id": 21,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(etcd_network_client_grpc_sent_bytes_total[5m])",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Client Traffic Out",
+                  "metric": "etcd_network_client_grpc_sent_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Client Traffic Out",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 20,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Peer Traffic In",
+                  "metric": "etcd_network_peer_received_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Peer Traffic In",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "decimals": null,
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "grid": {},
+              "id": 16,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 3,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)",
+                  "format": "time_series",
+                  "hide": false,
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Peer Traffic Out",
+                  "metric": "etcd_network_peer_sent_bytes_total",
+                  "refId": "A",
+                  "step": 4
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Peer Traffic Out",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "Bps",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "New row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 40,
+              "legend": {
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(etcd_server_proposals_failed_total[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Failure Rate",
+                  "metric": "etcd_server_proposals_failed_total",
+                  "refId": "A",
+                  "step": 2
+                },
+                {
+                  "expr": "sum(etcd_server_proposals_pending)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Pending Total",
+                  "metric": "etcd_server_proposals_pending",
+                  "refId": "B",
+                  "step": 2
+                },
+                {
+                  "expr": "sum(rate(etcd_server_proposals_committed_total[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Commit Rate",
+                  "metric": "etcd_server_proposals_committed_total",
+                  "refId": "C",
+                  "step": 2
+                },
+                {
+                  "expr": "sum(rate(etcd_server_proposals_applied_total[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Proposal Apply Rate",
+                  "refId": "D",
+                  "step": 2
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Raft Proposals",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "decimals": 0,
+              "editable": false,
+              "error": false,
+              "fill": 0,
+              "id": 19,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": false,
+                "total": false,
+                "values": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "changes(etcd_server_leader_changes_seen_total[1d])",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{instance}} Total Leader Elections Per Day",
+                  "metric": "etcd_server_leader_changes_seen_total",
+                  "refId": "A",
+                  "step": 2
+                }
+              ],
+              "thresholds": [],
+              "timeFrom": null,
+              "timeShift": null,
+              "title": "Total Leader Elections Per Day",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "buckets": null,
+                "mode": "time",
+                "name": null,
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": null,
+                  "logBase": 1,
+                  "max": null,
+                  "min": null,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "repeat": null,
+          "repeatIteration": null,
+          "repeatRowId": null,
+          "showTitle": false,
+          "title": "New row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-15m",
+        "to": "now"
+      },
+      "timepicker": {
+        "now": true,
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "etcd",
+      "version": 4
+    }
+  kubernetes-capacity-planning-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "gnetId": 22,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100",
+                  "hide": false,
+                  "intervalFactor": 10,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 50
+                }
+              ],
+              "title": "Idle CPU",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "percent",
+                  "label": "cpu idle",
+                  "logBase": 1,
+                  "min": 0,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 9,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(node_load1)",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 1m",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_load5)",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 5m",
+                  "refId": "B",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_load15)",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 15m",
+                  "refId": "C",
+                  "step": 20,
+                  "target": ""
+                }
+              ],
+              "title": "System Load",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 4,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory usage",
+                  "metric": "memo",
+                  "refId": "A",
+                  "step": 10,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_memory_Buffers)",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory buffers",
+                  "metric": "memo",
+                  "refId": "B",
+                  "step": 10,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_memory_Cached)",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory cached",
+                  "metric": "memo",
+                  "refId": "C",
+                  "step": 10,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(node_memory_MemFree)",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory free",
+                  "metric": "memo",
+                  "refId": "D",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Memory Usage",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "min": "0",
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
+                  "intervalFactor": 2,
+                  "metric": "",
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "246px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 6,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "read",
+                  "yaxis": 1
+                },
+                {
+                  "alias": "{instance=\"172.17.0.1:9100\"}",
+                  "yaxis": 2
+                },
+                {
+                  "alias": "io time",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_disk_bytes_read[5m]))",
+                  "hide": false,
+                  "intervalFactor": 4,
+                  "legendFormat": "read",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum(rate(node_disk_bytes_written[5m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "written",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "sum(rate(node_disk_io_time_ms[5m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "io time",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "Disk I/O",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "ms",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percentunit",
+              "gauge": {
+                "maxValue": 1,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 12,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "0.75, 0.9",
+              "title": "Disk Space Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 8,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_network_receive_bytes{device!~\"lo\"}[5m]))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Received",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 10,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(rate(node_network_transmit_bytes{device!~\"lo\"}[5m]))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "B",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Transmitted",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "276px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 11,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 11,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum(kube_pod_info)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Current number of Pods",
+                  "refId": "A",
+                  "step": 10
+                },
+                {
+                  "expr": "sum(kube_node_status_capacity_pods)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Maximum capacity of Pods",
+                  "refId": "B",
+                  "step": 10
+                }
+              ],
+              "title": "Cluster Pod Utilization",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Pod Utilization",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-1h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Capacity Planning",
+      "version": 4
+    }
+  kubernetes-cluster-health-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": "10s",
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "254px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 1,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Control Plane Components Down",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "Everything UP and healthy",
+                  "value": "null"
+                },
+                {
+                  "op": "=",
+                  "text": "",
+                  "value": ""
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Alerts Firing",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "3, 5",
+              "title": "Alerts Pending",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Crashlooping Pods",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Node Not Ready",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Node Disk Pressure",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Node Memory Pressure",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(kube_node_spec_unschedulable)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Nodes Unschedulable",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Cluster Health",
+      "version": 9
+    }
+  kubernetes-cluster-status-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "129px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 6,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Control Plane UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "UP",
+                  "value": "null"
+                }
+              ],
+              "valueName": "total"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 6,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "3, 5",
+              "title": "Alerts Firing",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": true,
+          "title": "Cluster Health",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "168px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 1,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "API Servers UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Controller Managers UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Schedulers UP",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": true,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "1, 3",
+              "title": "Crashlooping Control Plane Pods",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": true,
+          "title": "Control Plane Status",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "158px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "sum(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(count by (instance) (node_cpu{job=\"node-exporter\",mode=\"idle\"}))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "CPU Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 9,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"}) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Filesystem Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 10,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Pod Utilization",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": true,
+          "title": "Capacity Planning",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Cluster Status",
+      "version": 3
+    }
+  kubernetes-control-plane-status-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 1,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "API Servers UP",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Controller Managers UP",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "50, 80",
+              "title": "Schedulers UP",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "thresholds": "5, 10",
+              "title": "API Server Request Error Rate",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "0",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 7,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 30
+                }
+              ],
+              "title": "API Server Request Latency",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 5,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60
+                }
+              ],
+              "title": "End to End Scheduling Latency",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "dtdurations",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 6,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Error Rate",
+                  "refId": "A",
+                  "step": 60
+                },
+                {
+                  "expr": "sum by(instance) (rate(apiserver_request_count[5m]))",
+                  "format": "time_series",
+                  "intervalFactor": 2,
+                  "legendFormat": "Request Rate",
+                  "refId": "B",
+                  "step": 60
+                }
+              ],
+              "title": "API Server Request Rates",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Control Plane Status",
+      "version": 3
+    }
+  kubernetes-resource-requests-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "300px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison, the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Allocatable CPU Cores",
+                  "refId": "A",
+                  "step": 20
+                },
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested CPU Cores",
+                  "refId": "B",
+                  "step": 20
+                }
+              ],
+              "title": "CPU Cores",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "label": "CPU Cores",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 240
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "CPU Cores",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "CPU Cores",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "300px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison, the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 1,
+              "links": [],
+              "nullPointMode": "null",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Allocatable Memory",
+                  "refId": "A",
+                  "step": 20
+                },
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested Memory",
+                  "refId": "B",
+                  "step": 20
+                }
+              ],
+              "title": "Memory",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "label": "Memory",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 4,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100",
+                  "intervalFactor": 2,
+                  "legendFormat": "",
+                  "refId": "A",
+                  "step": 240
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Memory",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": []
+      },
+      "time": {
+        "from": "now-3h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Kubernetes Resource Requests",
+      "version": 2
+    }
+  nodes-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "description": "Dashboard to get an overview of one server",
+      "editable": false,
+      "gnetId": 22,
+      "graphTooltip": 0,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)",
+                  "hide": false,
+                  "intervalFactor": 10,
+                  "legendFormat": "{{cpu}}",
+                  "refId": "A",
+                  "step": 50
+                }
+              ],
+              "title": "CPU Usage",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "percent",
+                  "label": "cpu usage",
+                  "logBase": 1,
+                  "max": 100,
+                  "min": 0,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 9,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "node_load1{instance=\"$server\"}",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 1m",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "node_load5{instance=\"$server\"}",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 5m",
+                  "refId": "B",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "node_load15{instance=\"$server\"}",
+                  "intervalFactor": 4,
+                  "legendFormat": "load 15m",
+                  "refId": "C",
+                  "step": 20,
+                  "target": ""
+                }
+              ],
+              "title": "System Load",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 4,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": true,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}",
+                  "hide": false,
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory used",
+                  "metric": "",
+                  "refId": "C",
+                  "step": 10
+                },
+                {
+                  "expr": "node_memory_Buffers{instance=\"$server\"}",
+                  "interval": "",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory buffers",
+                  "metric": "",
+                  "refId": "E",
+                  "step": 10
+                },
+                {
+                  "expr": "node_memory_Cached{instance=\"$server\"}",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory cached",
+                  "metric": "",
+                  "refId": "F",
+                  "step": 10
+                },
+                {
+                  "expr": "node_memory_MemFree{instance=\"$server\"}",
+                  "intervalFactor": 2,
+                  "legendFormat": "memory free",
+                  "metric": "",
+                  "refId": "D",
+                  "step": 10
+                }
+              ],
+              "title": "Memory Usage",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "individual"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "min": "0",
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percent",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "80, 90",
+              "title": "Memory Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 6,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "read",
+                  "yaxis": 1
+                },
+                {
+                  "alias": "{instance=\"172.17.0.1:9100\"}",
+                  "yaxis": 2
+                },
+                {
+                  "alias": "io time",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 9,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))",
+                  "hide": false,
+                  "intervalFactor": 4,
+                  "legendFormat": "read",
+                  "refId": "A",
+                  "step": 20,
+                  "target": ""
+                },
+                {
+                  "expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "written",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))",
+                  "intervalFactor": 4,
+                  "legendFormat": "io time",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "Disk I/O",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "ms",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(50, 172, 45, 0.97)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(245, 54, 54, 0.9)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "percentunit",
+              "gauge": {
+                "maxValue": 1,
+                "minValue": 0,
+                "show": true,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "hideTimeOverride": false,
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"})",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 60,
+                  "target": ""
+                }
+              ],
+              "thresholds": "0.75, 0.9",
+              "title": "Disk Space Usage",
+              "transparent": false,
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "current"
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 8,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(node_network_receive_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "{{device}}",
+                  "refId": "A",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Received",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            },
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 10,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [
+                {
+                  "alias": "transmitted",
+                  "yaxis": 2
+                }
+              ],
+              "spaceLength": 10,
+              "span": 6,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!~\"lo\"}[5m])",
+                  "hide": false,
+                  "intervalFactor": 2,
+                  "legendFormat": "{{device}}",
+                  "refId": "B",
+                  "step": 10,
+                  "target": ""
+                }
+              ],
+              "title": "Network Transmitted",
+              "tooltip": {
+                "msResolution": false,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": false,
+            "label": null,
+            "multi": false,
+            "name": "server",
+            "options": [],
+            "query": "label_values(node_boot_time, instance)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-1h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Nodes",
+      "version": 2
+    }
+  pods-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 1,
+      "hideControls": false,
+      "links": [],
+      "refresh": false,
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": true,
+                "avg": true,
+                "current": true,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": true,
+                "show": true,
+                "total": false,
+                "values": true
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})",
+                  "interval": "10s",
+                  "intervalFactor": 1,
+                  "legendFormat": "Current: {{ container_name }}",
+                  "metric": "container_memory_usage_bytes",
+                  "refId": "A",
+                  "step": 15
+                },
+                {
+                  "expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested: {{ container }}",
+                  "metric": "kube_pod_container_resource_requests_memory_bytes",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "kube_pod_container_resource_limits_memory_bytes{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Limit: {{ container }}",
+                  "metric": "kube_pod_container_resource_limits_memory_bytes",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "Memory Usage",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 2,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": true,
+                "avg": true,
+                "current": true,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": true,
+                "show": true,
+                "total": false,
+                "values": true
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{ container_name }}",
+                  "refId": "A",
+                  "step": 30
+                },
+                {
+                  "expr": "kube_pod_container_resource_requests_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Requested: {{ container }}",
+                  "metric": "kube_pod_container_resource_requests_cpu_cores",
+                  "refId": "B",
+                  "step": 20
+                },
+                {
+                  "expr": "kube_pod_container_resource_limits_cpu_cores{pod=\"$pod\", container=~\"$container\"}",
+                  "interval": "10s",
+                  "intervalFactor": 2,
+                  "legendFormat": "Limit: {{ container }}",
+                  "metric": "kube_pod_container_resource_limits_cpu_cores",
+                  "refId": "C",
+                  "step": 20
+                }
+              ],
+              "title": "CPU Usage",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "250px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 3,
+              "isNew": false,
+              "legend": {
+                "alignAsTable": true,
+                "avg": true,
+                "current": true,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": true,
+                "show": true,
+                "total": false,
+                "values": true
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))",
+                  "intervalFactor": 2,
+                  "legendFormat": "{{ pod_name }}",
+                  "refId": "A",
+                  "step": 30
+                }
+              ],
+              "title": "Network I/O",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "bytes",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "logBase": 1,
+                  "show": true
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "New Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": true,
+            "label": "Namespace",
+            "multi": false,
+            "name": "namespace",
+            "options": [],
+            "query": "label_values(kube_pod_info, namespace)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Pod",
+            "multi": false,
+            "name": "pod",
+            "options": [],
+            "query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": true,
+            "label": "Container",
+            "multi": false,
+            "name": "container",
+            "options": [],
+            "query": "label_values(kube_pod_container_info{namespace=~\"$namespace\", pod=\"$pod\"}, container)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "Pods",
+      "version": 1
+    }
+  statefulset-dashboard.json: |+
+    {
+      "__inputs": [
+        {
+          "description": "",
+          "label": "prometheus",
+          "name": "prometheus",
+          "pluginId": "prometheus",
+          "pluginName": "Prometheus",
+          "type": "datasource"
+        }
+      ],
+      "annotations": {
+        "list": []
+      },
+      "editable": false,
+      "graphTooltip": 1,
+      "hideControls": false,
+      "links": [],
+      "rows": [
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "200px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 8,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "cores",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "CPU",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 9,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "GB",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "80%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(container_memory_usage_bytes{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}) / 1024^3",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Memory",
+              "type": "singlestat",
+              "valueFontSize": "110%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "Bps",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 7,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfix": "",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 4,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": true
+              },
+              "targets": [
+                {
+                  "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Network",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "100px",
+          "panels": [
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": false
+              },
+              "id": 5,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "metric": "kube_statefulset_replicas",
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Desired Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 6,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Available Replicas",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 3,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_statefulset_status_observed_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Observed Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            },
+            {
+              "colorBackground": false,
+              "colorValue": false,
+              "colors": [
+                "rgba(245, 54, 54, 0.9)",
+                "rgba(237, 129, 40, 0.89)",
+                "rgba(50, 172, 45, 0.97)"
+              ],
+              "datasource": "prometheus",
+              "editable": false,
+              "format": "none",
+              "gauge": {
+                "maxValue": 100,
+                "minValue": 0,
+                "show": false,
+                "thresholdLabels": false,
+                "thresholdMarkers": true
+              },
+              "id": 2,
+              "links": [],
+              "mappingType": 1,
+              "mappingTypes": [
+                {
+                  "name": "value to text",
+                  "value": 1
+                },
+                {
+                  "name": "range to text",
+                  "value": 2
+                }
+              ],
+              "maxDataPoints": 100,
+              "nullPointMode": "connected",
+              "postfixFontSize": "50%",
+              "prefix": "",
+              "prefixFontSize": "50%",
+              "rangeMaps": [
+                {
+                  "from": "null",
+                  "text": "N/A",
+                  "to": "null"
+                }
+              ],
+              "span": 3,
+              "sparkline": {
+                "fillColor": "rgba(31, 118, 189, 0.18)",
+                "full": false,
+                "lineColor": "rgb(31, 120, 193)",
+                "show": false
+              },
+              "targets": [
+                {
+                  "expr": "max(kube_statefulset_metadata_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "refId": "A",
+                  "step": 600
+                }
+              ],
+              "title": "Metadata Generation",
+              "type": "singlestat",
+              "valueFontSize": "80%",
+              "valueMaps": [
+                {
+                  "op": "=",
+                  "text": "N/A",
+                  "value": "null"
+                }
+              ],
+              "valueName": "avg"
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        },
+        {
+          "collapse": false,
+          "editable": false,
+          "height": "350px",
+          "panels": [
+            {
+              "aliasColors": {},
+              "bars": false,
+              "dashLength": 10,
+              "dashes": false,
+              "datasource": "prometheus",
+              "editable": false,
+              "error": false,
+              "fill": 1,
+              "grid": {
+                "threshold1Color": "rgba(216, 200, 27, 0.27)",
+                "threshold2Color": "rgba(234, 112, 112, 0.22)"
+              },
+              "id": 1,
+              "isNew": true,
+              "legend": {
+                "alignAsTable": false,
+                "avg": false,
+                "current": false,
+                "hideEmpty": false,
+                "hideZero": false,
+                "max": false,
+                "min": false,
+                "rightSide": false,
+                "show": true,
+                "total": false
+              },
+              "lines": true,
+              "linewidth": 2,
+              "links": [],
+              "nullPointMode": "connected",
+              "percentage": false,
+              "pointradius": 5,
+              "points": false,
+              "renderer": "flot",
+              "seriesOverrides": [],
+              "spaceLength": 10,
+              "span": 12,
+              "stack": false,
+              "steppedLine": false,
+              "targets": [
+                {
+                  "expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "available",
+                  "refId": "B",
+                  "step": 30
+                },
+                {
+                  "expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)",
+                  "intervalFactor": 2,
+                  "legendFormat": "desired",
+                  "refId": "E",
+                  "step": 30
+                }
+              ],
+              "title": "Replicas",
+              "tooltip": {
+                "msResolution": true,
+                "shared": true,
+                "sort": 0,
+                "value_type": "cumulative"
+              },
+              "type": "graph",
+              "xaxis": {
+                "mode": "time",
+                "show": true,
+                "values": []
+              },
+              "yaxes": [
+                {
+                  "format": "none",
+                  "label": "",
+                  "logBase": 1,
+                  "show": true
+                },
+                {
+                  "format": "short",
+                  "label": "",
+                  "logBase": 1,
+                  "show": false
+                }
+              ]
+            }
+          ],
+          "showTitle": false,
+          "title": "Dashboard Row",
+          "titleSize": "h6"
+        }
+      ],
+      "schemaVersion": 14,
+      "sharedCrosshair": false,
+      "style": "dark",
+      "tags": [],
+      "templating": {
+        "list": [
+          {
+            "allValue": ".*",
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": false,
+            "label": "Namespace",
+            "multi": false,
+            "name": "statefulset_namespace",
+            "options": [],
+            "query": "label_values(kube_statefulset_metadata_generation, namespace)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": null,
+            "tags": [],
+            "tagsQuery": "",
+            "type": "query",
+            "useTags": false
+          },
+          {
+            "allValue": null,
+            "current": {},
+            "datasource": "prometheus",
+            "hide": 0,
+            "includeAll": false,
+            "label": "StatefulSet",
+            "multi": false,
+            "name": "statefulset_name",
+            "options": [],
+            "query": "label_values(kube_statefulset_metadata_generation{namespace=\"$statefulset_namespace\"}, statefulset)",
+            "refresh": 1,
+            "regex": "",
+            "sort": 0,
+            "tagValuesQuery": "",
+            "tags": [],
+            "tagsQuery": "statefulset",
+            "type": "query",
+            "useTags": false
+          }
+        ]
+      },
+      "time": {
+        "from": "now-6h",
+        "to": "now"
+      },
+      "timepicker": {
+        "refresh_intervals": [
+          "5s",
+          "10s",
+          "30s",
+          "1m",
+          "5m",
+          "15m",
+          "30m",
+          "1h",
+          "2h",
+          "1d"
+        ],
+        "time_options": [
+          "5m",
+          "15m",
+          "1h",
+          "6h",
+          "12h",
+          "24h",
+          "2d",
+          "7d",
+          "30d"
+        ]
+      },
+      "timezone": "browser",
+      "title": "StatefulSet",
+      "version": 1
+    }
+---
--- a/addons/grafana/datasources.yaml
+++ b/addons/grafana/datasources.yaml
@ -0,0 +1,16 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-datasources
+  namespace: monitoring
+data:
+  prometheus.yaml: |+
+    apiVersion: 1
+    datasources:
+    - name: prometheus
+      type: prometheus
+      access: proxy
+      orgId: 1
+      url: http://prometheus.monitoring.svc.cluster.local
+      version: 1
+      editable: false
--- a/addons/grafana/deployment.yaml
+++ b/addons/grafana/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: grafana
@ -21,7 +21,7 @@ spec:
    spec:
      containers:
        - name: grafana
-          image: grafana/grafana:4.6.3
+          image: grafana/grafana:5.0.4
          env:
            - name: GF_SERVER_HTTP_PORT
              value: "8080"
@ -30,7 +30,7 @@ spec:
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
-              value: Admin
+              value: Viewer
          ports:
            - name: http
              containerPort: 8080
@ -41,6 +41,20 @@ spec:
            limits:
              memory: 200Mi
              cpu: 200m
+          volumeMounts:
+            - name: datasources
+              mountPath: /etc/grafana/provisioning/datasources
+            - name: dashboard-providers
+              mountPath: /etc/grafana/provisioning/dashboards
+            - name: dashboards
+              mountPath: /var/lib/grafana/dashboards
      volumes:
-        - name: grafana-storage
-          emptyDir: {}
+        - name: datasources
+          configMap:
+            name: grafana-datasources
+        - name: dashboard-providers
+          configMap:
+            name: grafana-dashboard-providers
+        - name: dashboards
+          configMap:
+            name: grafana-dashboards
--- a/addons/heapster/cluster-role-binding.yaml
+++ b/addons/heapster/cluster-role-binding.yaml
@ -0,0 +1,12 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: heapster
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: system:heapster
+subjects:
+- kind: ServiceAccount
+  name: heapster
+  namespace: kube-system
--- a/addons/heapster/deployment.yaml
+++ b/addons/heapster/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: heapster
@ -14,12 +14,11 @@ spec:
      labels:
        name: heapster
        phase: prod
-      annotations:
-        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
+      serviceAccountName: heapster
      containers:
        - name: heapster
-          image: gcr.io/google_containers/heapster-amd64:v1.5.0
+          image: k8s.gcr.io/heapster-amd64:v1.5.2
          command:
            - /heapster
            - --source=kubernetes.summary_api:''
@ -31,7 +30,7 @@ spec:
            initialDelaySeconds: 180
            timeoutSeconds: 5
        - name: heapster-nanny
-          image: gcr.io/google_containers/addon-resizer:1.7
+          image: k8s.gcr.io/addon-resizer:1.7
          command:
            - /pod_nanny
            - --cpu=80m
--- a/addons/heapster/role-binding.yaml
+++ b/addons/heapster/role-binding.yaml
@ -0,0 +1,13 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: heapster
+  namespace: kube-system
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: system:pod-nanny
+subjects:
+- kind: ServiceAccount
+  name: heapster
+  namespace: kube-system
--- a/addons/heapster/role.yaml
+++ b/addons/heapster/role.yaml
@ -0,0 +1,19 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: system:pod-nanny
+  namespace: kube-system
+rules:
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  verbs:
+  - get
+- apiGroups:
+  - "extensions"
+  resources:
+  - deployments
+  verbs:
+  - get
+  - update
--- a/addons/heapster/service-account.yaml
+++ b/addons/heapster/service-account.yaml
@ -0,0 +1,5 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: heapster
+  namespace: kube-system
--- a/addons/nginx-ingress/aws/0-namespace.yaml
+++ b/addons/nginx-ingress/aws/0-namespace.yaml
--- a/addons/nginx-ingress/aws/default-backend/deployment.yaml
+++ b/addons/nginx-ingress/aws/default-backend/deployment.yaml
@ -1,10 +1,14 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: default-backend
  namespace: ingress
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      name: default-backend
+      phase: prod
  template:
    metadata:
      labels:
@ -16,7 +20,7 @@ spec:
          # Any image is permissable as long as:
          # 1. It serves a 404 page at /
          # 2. It serves 200 on a /healthz endpoint
-          image: gcr.io/google_containers/defaultbackend:1.4
+          image: k8s.gcr.io/defaultbackend:1.4
          ports:
            - containerPort: 8080
          resources:
--- a/addons/nginx-ingress/aws/deployment.yaml
+++ b/addons/nginx-ingress/aws/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: nginx-ingress-controller
@ -8,6 +8,10 @@ spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: nginx-ingress-controller
+      phase: prod
  template:
    metadata:
      labels:
@ -19,7 +23,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/aws/rbac/cluster-role-binding.yaml
+++ b/addons/nginx-ingress/aws/rbac/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
 roleRef:
--- a/addons/nginx-ingress/aws/rbac/cluster-role.yaml
+++ b/addons/nginx-ingress/aws/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: ingress
--- a/addons/nginx-ingress/aws/rbac/role-binding.yaml
+++ b/addons/nginx-ingress/aws/rbac/role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/aws/rbac/role.yaml
+++ b/addons/nginx-ingress/aws/rbac/role.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/digital-ocean/0-namespace.yaml
+++ b/addons/nginx-ingress/digital-ocean/0-namespace.yaml
--- a/addons/nginx-ingress/digital-ocean/daemonset.yaml
+++ b/addons/nginx-ingress/digital-ocean/daemonset.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: nginx-ingress-controller
@ -8,6 +8,10 @@ spec:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: nginx-ingress-controller
+      phase: prod
  template:
    metadata:
      labels:
@ -19,7 +23,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/digital-ocean/default-backend/deployment.yaml
+++ b/addons/nginx-ingress/digital-ocean/default-backend/deployment.yaml
@ -1,10 +1,14 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: default-backend
  namespace: ingress
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      name: default-backend
+      phase: prod
  template:
    metadata:
      labels:
@ -16,7 +20,7 @@ spec:
          # Any image is permissable as long as:
          # 1. It serves a 404 page at /
          # 2. It serves 200 on a /healthz endpoint
-          image: gcr.io/google_containers/defaultbackend:1.4
+          image: k8s.gcr.io/defaultbackend:1.4
          ports:
            - containerPort: 8080
          resources:
--- a/addons/nginx-ingress/digital-ocean/rbac/cluster-role-binding.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
 roleRef:
--- a/addons/nginx-ingress/digital-ocean/rbac/cluster-role.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: ingress
--- a/addons/nginx-ingress/digital-ocean/rbac/role-binding.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/digital-ocean/rbac/role.yaml
+++ b/addons/nginx-ingress/digital-ocean/rbac/role.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/google-cloud/0-namespace.yaml
+++ b/addons/nginx-ingress/google-cloud/0-namespace.yaml
--- a/addons/nginx-ingress/google-cloud/default-backend/deployment.yaml
+++ b/addons/nginx-ingress/google-cloud/default-backend/deployment.yaml
@ -1,10 +1,14 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: default-backend
  namespace: ingress
 spec:
  replicas: 1
+  selector:
+    matchLabels:
+      name: default-backend
+      phase: prod
  template:
    metadata:
      labels:
@ -16,7 +20,7 @@ spec:
          # Any image is permissable as long as:
          # 1. It serves a 404 page at /
          # 2. It serves 200 on a /healthz endpoint
-          image: gcr.io/google_containers/defaultbackend:1.4
+          image: k8s.gcr.io/defaultbackend:1.4
          ports:
            - containerPort: 8080
          resources:
--- a/addons/nginx-ingress/google-cloud/deployment.yaml
+++ b/addons/nginx-ingress/google-cloud/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: nginx-ingress-controller
@ -8,6 +8,10 @@ spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: nginx-ingress-controller
+      phase: prod
  template:
    metadata:
      labels:
@ -19,7 +23,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/google-cloud/rbac/cluster-role-binding.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/cluster-role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
 roleRef:
--- a/addons/nginx-ingress/google-cloud/rbac/cluster-role.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: ingress
--- a/addons/nginx-ingress/google-cloud/rbac/role-binding.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/role-binding.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/nginx-ingress/google-cloud/rbac/role.yaml
+++ b/addons/nginx-ingress/google-cloud/rbac/role.yaml
@ -1,5 +1,5 @@
+apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
-apiVersion: rbac.authorization.k8s.io/v1beta1
 metadata:
  name: ingress
  namespace: ingress
--- a/addons/prometheus/0-namespace.yaml
+++ b/addons/prometheus/0-namespace.yaml
--- a/addons/prometheus/config.yaml
+++ b/addons/prometheus/config.yaml
@ -39,7 +39,7 @@ data:
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # Using endpoints to discover kube-apiserver targets finds the pod IP
-        # (host IP since apiserver is uses host network) which is not used in
+        # (host IP since apiserver uses host network) which is not used in
        # the server certificate.
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
@ -51,6 +51,9 @@ data:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
+      - replacement: apiserver
+        action: replace
+        target_label: job

    # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
    # metrics from a node by scraping kubelet (127.0.0.1:10255/metrics).
@ -59,7 +62,7 @@ data:
    # Kubernetes apiserver.  This means it will work if Prometheus is running out of
    # cluster, or can't connect to nodes for some other reason (e.g. because of
    # firewalling).
-    - job_name: 'kubernetes-nodes'
+    - job_name: 'kubelet'
      kubernetes_sd_configs:
      - role: node
      
@ -109,6 +112,22 @@ data:
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    
+    # Scrape etcd metrics from controllers
+    - job_name: 'etcd'
+      kubernetes_sd_configs:
+      - role: node
+      scheme: http
+      relabel_configs:
+        - source_labels: [__meta_kubernetes_node_label_node_role_kubernetes_io_controller]
+          action: keep
+          regex: 'true'
+        - action: labelmap
+          regex: __meta_kubernetes_node_label_(.+)
+        - source_labels: [__meta_kubernetes_node_name]
+          action: replace
+          target_label: __address__
+          replacement: '${1}:2381'
+    
    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
@ -149,7 +168,7 @@ data:
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
-        target_label: kubernetes_name
+        target_label: job

    # Example scrape config for probing services via the Blackbox Exporter.
    #
@ -181,7 +200,7 @@ data:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
-        target_label: kubernetes_name
+        target_label: job

    # Example scrape config for pods
    #
--- a/addons/prometheus/deployment.yaml
+++ b/addons/prometheus/deployment.yaml
@ -1,22 +1,24 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: prometheus
  namespace: monitoring
 spec:
  replicas: 1
-  strategy:
-    rollingUpdate:
-      maxUnavailable: 1
+  selector:
+    matchLabels:
+      name: prometheus
+      phase: prod
  template:
    metadata:
      labels:
        name: prometheus
        phase: prod
    spec:
+      serviceAccountName: prometheus
      containers:
      - name: prometheus
-        image: quay.io/prometheus/prometheus:v2.0.0
+        image: quay.io/prometheus/prometheus:v2.2.1
        args:
          - '--config.file=/etc/prometheus/prometheus.yaml'
        ports:
--- a/addons/prometheus/exporters/kube-state-metrics/cluster-role.yaml
+++ b/addons/prometheus/exporters/kube-state-metrics/cluster-role.yaml
@ -5,6 +5,8 @@ metadata:
 rules:
 - apiGroups: [""]
  resources:
+  - configmaps
+  - secrets
  - nodes
  - pods
  - services
@ -35,4 +37,3 @@ rules:
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
-
--- a/addons/prometheus/exporters/kube-state-metrics/deployment.yaml
+++ b/addons/prometheus/exporters/kube-state-metrics/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: kube-state-metrics
@ -22,7 +22,7 @@ spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
-        image: quay.io/coreos/kube-state-metrics:v1.2.0
+        image: quay.io/coreos/kube-state-metrics:v1.3.0
        ports:
          - name: metrics
            containerPort: 8080
@ -33,7 +33,7 @@ spec:
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
-        image: gcr.io/google_containers/addon-resizer:1.0
+        image: k8s.gcr.io/addon-resizer:1.7
        resources:
          limits:
            cpu: 100m
--- a/addons/prometheus/exporters/kube-state-metrics/service.yaml
+++ b/addons/prometheus/exporters/kube-state-metrics/service.yaml
@ -15,5 +15,5 @@ spec:
  ports:
    - name: metrics
      protocol: TCP
-      port: 80
+      port: 8080
      targetPort: 8080
--- a/addons/prometheus/exporters/node-exporter/daemonset.yaml
+++ b/addons/prometheus/exporters/node-exporter/daemonset.yaml
@ -1,4 +1,4 @@
-apiVersion: apps/v1beta2
+apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: node-exporter
@ -18,11 +18,15 @@ spec:
        name: node-exporter
        phase: prod
    spec:
+      serviceAccountName: node-exporter
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 65534
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
-        image: quay.io/prometheus/node-exporter:v0.15.0
+        image: quay.io/prometheus/node-exporter:v0.15.2
        args:
          - "--path.procfs=/host/proc"
          - "--path.sysfs=/host/sys"
@ -45,9 +49,8 @@ spec:
            mountPath: /host/sys
            readOnly: true
      tolerations:
-        - key: node-role.kubernetes.io/master
+        - effect: NoSchedule
          operator: Exists
-          effect: NoSchedule
      volumes:
        - name: proc
          hostPath:
--- a/addons/prometheus/exporters/node-exporter/service-account.yaml
+++ b/addons/prometheus/exporters/node-exporter/service-account.yaml
@ -0,0 +1,5 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: node-exporter
+  namespace: monitoring
--- a/addons/prometheus/rbac/cluster-role-binding.yaml
+++ b/addons/prometheus/rbac/cluster-role-binding.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
  name: prometheus
@ -8,5 +8,5 @@ roleRef:
  name: prometheus
 subjects:
 - kind: ServiceAccount
-  name: default
+  name: prometheus
  namespace: monitoring
--- a/addons/prometheus/rbac/cluster-role.yaml
+++ b/addons/prometheus/rbac/cluster-role.yaml
@ -1,4 +1,4 @@
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: prometheus
--- a/addons/prometheus/rules.yaml
+++ b/addons/prometheus/rules.yaml
@ -4,8 +4,7 @@ metadata:
  name: prometheus-rules
  namespace: monitoring
 data:
-  # Rules adapted from those provided by coreos/prometheus-operator and SoundCloud
-  alertmanager.rules.yaml: |+
+  alertmanager.rules.yaml: |
    groups:
    - name: alertmanager.rules
      rules:
@ -36,7 +35,7 @@ data:
        annotations:
          description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
            }}/{{ $labels.pod}}.
-  etcd3.rules.yaml: |+
+  etcd3.rules.yaml: |
    groups:
    - name: ./etcd3.rules
      rules:
@ -64,28 +63,8 @@ data:
          description: etcd instance {{ $labels.instance }} has seen {{ $value }} leader
            changes within the last hour
          summary: a high number of leader changes within the etcd cluster are happening
-      - alert: HighNumberOfFailedGRPCRequests
-        expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
-          / sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.01
-        for: 10m
-        labels:
-          severity: warning
-        annotations:
-          description: '{{ $value }}% of requests for {{ $labels.grpc_method }} failed
-            on etcd instance {{ $labels.instance }}'
-          summary: a high number of gRPC requests are failing
-      - alert: HighNumberOfFailedGRPCRequests
-        expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
-          / sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.05
-        for: 5m
-        labels:
-          severity: critical
-        annotations:
-          description: '{{ $value }}% of requests for {{ $labels.grpc_method }} failed
-            on etcd instance {{ $labels.instance }}'
-          summary: a high number of gRPC requests are failing
      - alert: GRPCRequestsSlow
-        expr: histogram_quantile(0.99, rate(etcd_grpc_unary_requests_duration_seconds_bucket[5m]))
+        expr: histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket{job="etcd",grpc_type="unary"}[5m])) by (grpc_service, grpc_method, le))
          > 0.15
        for: 10m
        labels:
@ -125,7 +104,7 @@ data:
            }} are slow
          summary: slow HTTP requests
      - alert: EtcdMemberCommunicationSlow
-        expr: histogram_quantile(0.99, rate(etcd_network_member_round_trip_time_seconds_bucket[5m]))
+        expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))
          > 0.15
        for: 10m
        labels:
@ -160,7 +139,7 @@ data:
        annotations:
          description: etcd instance {{ $labels.instance }} commit durations are high
          summary: high commit durations
-  general.rules.yaml: |+
+  general.rules.yaml: |
    groups:
    - name: general.rules
      rules:
@ -192,12 +171,12 @@ data:
          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
            will exhaust in file/socket descriptors within the next hour'
          summary: file descriptors soon exhausted
-  kube-controller-manager.rules.yaml: |+
+  kube-controller-manager.rules.yaml: |
    groups:
    - name: kube-controller-manager.rules
      rules:
      - alert: K8SControllerManagerDown
-        expr: absent(up{kubernetes_name="kube-controller-manager"} == 1)
+        expr: absent(up{job="kube-controller-manager"} == 1)
        for: 5m
        labels:
          severity: critical
@ -205,7 +184,7 @@ data:
          description: There is no running K8S controller manager. Deployments and replication
            controllers are not making progress.
          summary: Controller manager is down
-  kube-scheduler.rules.yaml: |+
+  kube-scheduler.rules.yaml: |
    groups:
    - name: kube-scheduler.rules
      rules:
@ -255,7 +234,7 @@ data:
        labels:
          quantile: "0.5"
      - alert: K8SSchedulerDown
-        expr: absent(up{kubernetes_name="kube-scheduler"} == 1)
+        expr: absent(up{job="kube-scheduler"} == 1)
        for: 5m
        labels:
          severity: critical
@ -263,7 +242,7 @@ data:
          description: There is no running K8S scheduler. New pods are not being assigned
            to nodes.
          summary: Scheduler is down
-  kube-state-metrics.rules.yaml: |+
+  kube-state-metrics.rules.yaml: |
    groups:
    - name: kube-state-metrics.rules
      rules:
@ -274,7 +253,8 @@ data:
          severity: warning
        annotations:
          description: Observed deployment generation does not match expected one for
-            deployment {{$labels.namespaces}}{{$labels.deployment}}
+            deployment {{$labels.namespaces}}/{{$labels.deployment}}
+          summary: Deployment is outdated
      - alert: DeploymentReplicasNotUpdated
        expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
          or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
@ -284,8 +264,9 @@ data:
          severity: warning
        annotations:
          description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
+          summary: Deployment replicas are outdated
      - alert: DaemonSetRolloutStuck
-        expr: kube_daemonset_status_current_number_ready / kube_daemonset_status_desired_number_scheduled
+        expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled
          * 100 < 100
        for: 15m
        labels:
@ -293,6 +274,7 @@ data:
        annotations:
          description: Only {{$value}}% of desired pods scheduled and ready for daemon
            set {{$labels.namespaces}}/{{$labels.daemonset}}
+          summary: DaemonSet is missing pods
      - alert: K8SDaemonSetsNotScheduled
        expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
          > 0
@ -312,14 +294,15 @@ data:
            to run.
          summary: Daemonsets are not scheduled correctly
      - alert: PodFrequentlyRestarting
-        expr: increase(kube_pod_container_status_restarts[1h]) > 5
+        expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          description: Pod {{$labels.namespaces}}/{{$labels.pod}} is was restarted {{$value}}
            times within the last hour
-  kubelet.rules.yaml: |+
+          summary: Pod is restarting frequently
+  kubelet.rules.yaml: |
    groups:
    - name: kubelet.rules
      rules:
@ -342,15 +325,15 @@ data:
        annotations:
          description: '{{ $value }}% of Kubernetes nodes are not ready'
      - alert: K8SKubeletDown
-        expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) * 100 > 3
+        expr: count(up{job="kubelet"} == 0) / count(up{job="kubelet"}) * 100 > 3
        for: 1h
        labels:
          severity: warning
        annotations:
          description: Prometheus failed to scrape {{ $value }}% of kubelets.
      - alert: K8SKubeletDown
-        expr: (absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}))
-          * 100 > 1
+        expr: (absent(up{job="kubelet"} == 1) or count(up{job="kubelet"} == 0) / count(up{job="kubelet"}))
+          * 100 > 10
        for: 1h
        labels:
          severity: critical
@ -367,7 +350,7 @@ data:
          description: Kubelet {{$labels.instance}} is running {{$value}} pods, close
            to the limit of 110
          summary: Kubelet is close to pod limit
-  kubernetes.rules.yaml: |+
+  kubernetes.rules.yaml: |
    groups:
    - name: kubernetes.rules
      rules:
@ -447,14 +430,28 @@ data:
        annotations:
          description: API server returns errors for {{ $value }}% of requests
      - alert: K8SApiserverDown
-        expr: absent(up{job="kubernetes-apiservers"} == 1)
+        expr: absent(up{job="apiserver"} == 1)
        for: 20m
        labels:
          severity: critical
        annotations:
          description: No API servers are reachable or all have disappeared from service
            discovery
-  node.rules.yaml: |+
+
+      - alert: K8sCertificateExpirationNotice
+        labels:
+          severity: warning
+        annotations:
+          description: Kubernetes API Certificate is expiring soon (less than 7 days)
+        expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0
+
+      - alert: K8sCertificateExpirationNotice
+        labels:
+          severity: critical
+        annotations:
+          description: Kubernetes API Certificate is expiring in less than 1 day
+        expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0
+  node.rules.yaml: |
    groups:
    - name: node.rules
      rules:
@ -476,7 +473,7 @@ data:
      - record: cluster:node_cpu:ratio
        expr: cluster:node_cpu:rate5m / count(sum(node_cpu) BY (instance, cpu))
      - alert: NodeExporterDown
-        expr: absent(up{kubernetes_name="node-exporter"} == 1)
+        expr: absent(up{job="node-exporter"} == 1)
        for: 10m
        labels:
          severity: warning
@ -499,7 +496,7 @@ data:
        annotations:
          description: device {{$labels.device}} on node {{$labels.instance}} is running
            full within the next 2 hours (mounted at {{$labels.mountpoint}})
-  prometheus.rules.yaml: |+
+  prometheus.rules.yaml: |
    groups:
    - name: prometheus.rules
      rules:
@ -544,3 +541,38 @@ data:
        annotations:
          description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
            to any Alertmanagers
+      - alert: PrometheusTSDBReloadsFailing
+        expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0
+        for: 12h
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
+            reload failures over the last two hours.'
+          summary: Prometheus has issues reloading data blocks from disk
+      - alert: PrometheusTSDBCompactionsFailing
+        expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0
+        for: 12h
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
+            compaction failures over the last two hours.'
+          summary: Prometheus has issues compacting sample blocks
+      - alert: PrometheusTSDBWALCorruptions
+        expr: tsdb_wal_corruptions_total > 0
+        for: 4h
+        labels:
+          severity: warning
+        annotations:
+          description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead
+            log (WAL).'
+          summary: Prometheus write-ahead log is corrupted
+      - alert: PrometheusNotIngestingSamples
+        expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) <= 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples."
+          summary: "Prometheus isn't ingesting samples"
--- a/addons/prometheus/service-account.yaml
+++ b/addons/prometheus/service-account.yaml
@ -0,0 +1,5 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: prometheus
+  namespace: monitoring
--- a/addons/prometheus/service.yaml
+++ b/addons/prometheus/service.yaml
@ -3,6 +3,8 @@ kind: Service
 metadata:
  name: prometheus
  namespace: monitoring
+  annotations:
+    prometheus.io/scrape: 'true'
 spec:
  type: ClusterIP
  selector:
--- a/aws/container-linux/kubernetes/README.md
+++ b/aws/container-linux/kubernetes/README.md
@ -11,10 +11,11 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.10.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

--- a/aws/container-linux/kubernetes/apiserver.tf
+++ b/aws/container-linux/kubernetes/apiserver.tf
@ -0,0 +1,69 @@
+# kube-apiserver Network Load Balancer DNS Record
+resource "aws_route53_record" "apiserver" {
+  zone_id = "${var.dns_zone_id}"
+
+  name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
+  type = "A"
+
+  # AWS recommends their special "alias" records for ELBs
+  alias {
+    name                   = "${aws_lb.apiserver.dns_name}"
+    zone_id                = "${aws_lb.apiserver.zone_id}"
+    evaluate_target_health = true
+  }
+}
+
+# Network Load Balancer for apiservers
+resource "aws_lb" "apiserver" {
+  name               = "${var.cluster_name}-apiserver"
+  load_balancer_type = "network"
+  internal           = false
+
+  subnets = ["${aws_subnet.public.*.id}"]
+
+  enable_cross_zone_load_balancing = true
+}
+
+# Forward HTTPS traffic (TCP 443) to controllers
+resource "aws_lb_listener" "apiserver-https" {
+  load_balancer_arn = "${aws_lb.apiserver.arn}"
+  protocol          = "TCP"
+  port              = "443"
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.controllers.arn}"
+  }
+}
+
+# Target group of controllers
+resource "aws_lb_target_group" "controllers" {
+  name        = "${var.cluster_name}-controllers"
+  vpc_id      = "${aws_vpc.network.id}"
+  target_type = "instance"
+
+  protocol = "TCP"
+  port     = 443
+
+  # TCP health check for the apiserver
+  health_check {
+    protocol = "TCP"
+    port     = 443
+
+    # NLBs require the healthy and unhealthy thresholds to be equal
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # Interval between health checks must be 10 or 30 seconds
+    interval = 10
+  }
+}
+
+# Attach controller instances to apiserver NLB
+resource "aws_lb_target_group_attachment" "controllers" {
+  count = "${var.controller_count}"
+
+  target_group_arn = "${aws_lb_target_group.controllers.arn}"
+  target_id        = "${element(aws_instance.controllers.*.id, count.index)}"
+  port             = 443
+}
--- a/aws/container-linux/kubernetes/bootkube.tf
+++ b/aws/container-linux/kubernetes/bootkube.tf
@ -1,6 +1,6 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.10.0"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=db36b92abced3c4b0af279adfd5ed4bf0cf8c39f"

  cluster_name          = "${var.cluster_name}"
  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
--- a/aws/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/aws/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,12 +7,13 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.14"
+            Environment="ETCD_IMAGE_TAG=v3.3.3"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
+            Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381"
            Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
            Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
@ -66,6 +67,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -80,8 +82,10 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
+          --node-labels=node-role.kubernetes.io/controller="true" \
          --pod-manifest-path=/etc/kubernetes/manifests \
-          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
+          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
@ -107,29 +111,14 @@ storage:
      mode: 0644
      contents:
        inline: |
-          apiVersion: v1
-          kind: Config
-          clusters:
-          - name: local
-            cluster:
-              server: ${kubeconfig_server}
-              certificate-authority-data: ${kubeconfig_ca_cert}
-          users:
-          - name: kubelet
-            user:
-              client-certificate-data: ${kubeconfig_kubelet_cert}
-              client-key-data: ${kubeconfig_kubelet_key}
-          contexts:
-          - context:
-              cluster: local
-              user: kubelet
+          ${kubeconfig}
    - path: /etc/kubernetes/kubelet.env
      filesystem: root
      mode: 0644
      contents:
        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
+          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
+          KUBELET_IMAGE_TAG=v1.10.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -150,7 +139,7 @@ storage:
          # Move experimental manifests
          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.12.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/aws/container-linux/kubernetes/controllers.tf
+++ b/aws/container-linux/kubernetes/controllers.tf
@ -28,7 +28,7 @@ resource "aws_instance" "controllers" {

  # storage
  root_block_device {
-    volume_type = "standard"
+    volume_type = "${var.disk_type}"
    volume_size = "${var.disk_size}"
  }

@ -36,6 +36,10 @@ resource "aws_instance" "controllers" {
  associate_public_ip_address = true
  subnet_id                   = "${element(aws_subnet.public.*.id, count.index)}"
  vpc_security_group_ids      = ["${aws_security_group.controller.id}"]
+
+  lifecycle {
+    ignore_changes = ["ami"]
+  }
 }

 # Controller Container Linux Config
@ -52,13 +56,10 @@ data "template_file" "controller_config" {
    # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"

-    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
-    ssh_authorized_key      = "${var.ssh_authorized_key}"
-    cluster_domain_suffix   = "${var.cluster_domain_suffix}"
-    kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
-    kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
-    kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
-    kubeconfig_server       = "${module.bootkube.server}"
+    kubeconfig            = "${indent(10, module.bootkube.kubeconfig)}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
 }

@ -77,186 +78,5 @@ data "ct_config" "controller_ign" {
  count        = "${var.controller_count}"
  content      = "${element(data.template_file.controller_config.*.rendered, count.index)}"
  pretty_print = false
-}
-
-# Security Group (instance firewall)
-
-resource "aws_security_group" "controller" {
-  name        = "${var.cluster_name}-controller"
-  description = "${var.cluster_name} controller security group"
-
-  vpc_id = "${aws_vpc.network.id}"
-
-  tags = "${map("Name", "${var.cluster_name}-controller")}"
-}
-
-resource "aws_security_group_rule" "controller-icmp" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type        = "ingress"
-  protocol    = "icmp"
-  from_port   = 0
-  to_port     = 0
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "controller-ssh" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 22
-  to_port     = 22
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "controller-apiserver" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 443
-  to_port     = 443
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "controller-etcd" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 2379
-  to_port   = 2380
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-flannel" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "udp"
-  from_port                = 8472
-  to_port                  = 8472
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-flannel-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "udp"
-  from_port = 8472
-  to_port   = 8472
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-node-exporter" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 9100
-  to_port                  = 9100
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-kubelet-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10250
-  to_port   = 10250
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-kubelet-read" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 10255
-  to_port                  = 10255
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-kubelet-read-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10255
-  to_port   = 10255
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-bgp" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 179
-  to_port                  = 179
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-bgp-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 179
-  to_port   = 179
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-ipip" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = 4
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-ipip-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = 4
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-ipip-legacy" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type                     = "ingress"
-  protocol                 = 94
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.worker.id}"
-}
-
-resource "aws_security_group_rule" "controller-ipip-legacy-self" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type      = "ingress"
-  protocol  = 94
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "controller-egress" {
-  security_group_id = "${aws_security_group.controller.id}"
-
-  type             = "egress"
-  protocol         = "-1"
-  from_port        = 0
-  to_port          = 0
-  cidr_blocks      = ["0.0.0.0/0"]
-  ipv6_cidr_blocks = ["::/0"]
+  snippets     = ["${var.controller_clc_snippets}"]
 }
--- a/aws/container-linux/kubernetes/elb.tf
+++ b/aws/container-linux/kubernetes/elb.tf
@ -1,43 +0,0 @@
-# kube-apiserver Network Load Balancer DNS Record
-resource "aws_route53_record" "apiserver" {
-  zone_id = "${var.dns_zone_id}"
-
-  name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
-  type = "A"
-
-  # AWS recommends their special "alias" records for ELBs
-  alias {
-    name                   = "${aws_elb.apiserver.dns_name}"
-    zone_id                = "${aws_elb.apiserver.zone_id}"
-    evaluate_target_health = true
-  }
-}
-
-# Controller Network Load Balancer
-resource "aws_elb" "apiserver" {
-  name            = "${var.cluster_name}-apiserver"
-  subnets         = ["${aws_subnet.public.*.id}"]
-  security_groups = ["${aws_security_group.controller.id}"]
-
-  listener {
-    lb_port           = 443
-    lb_protocol       = "tcp"
-    instance_port     = 443
-    instance_protocol = "tcp"
-  }
-
-  instances = ["${aws_instance.controllers.*.id}"]
-
-  # Kubelet HTTP health check
-  health_check {
-    target              = "SSL:443"
-    healthy_threshold   = 2
-    unhealthy_threshold = 4
-    timeout             = 5
-    interval            = 6
-  }
-
-  idle_timeout                = 3600
-  connection_draining         = true
-  connection_draining_timeout = 300
-}
--- a/aws/container-linux/kubernetes/ingress.tf
+++ b/aws/container-linux/kubernetes/ingress.tf
@ -1,32 +0,0 @@
-# Ingress Network Load Balancer
-resource "aws_elb" "ingress" {
-  name            = "${var.cluster_name}-ingress"
-  subnets         = ["${aws_subnet.public.*.id}"]
-  security_groups = ["${aws_security_group.worker.id}"]
-
-  listener {
-    lb_port           = 80
-    lb_protocol       = "tcp"
-    instance_port     = 80
-    instance_protocol = "tcp"
-  }
-
-  listener {
-    lb_port           = 443
-    lb_protocol       = "tcp"
-    instance_port     = 443
-    instance_protocol = "tcp"
-  }
-
-  # Ingress Controller HTTP health check
-  health_check {
-    target              = "HTTP:10254/healthz"
-    healthy_threshold   = 2
-    unhealthy_threshold = 4
-    timeout             = 5
-    interval            = 6
-  }
-
-  connection_draining         = true
-  connection_draining_timeout = 300
-}
--- a/aws/container-linux/kubernetes/outputs.tf
+++ b/aws/container-linux/kubernetes/outputs.tf
@ -1,4 +1,25 @@
 output "ingress_dns_name" {
-  value       = "${aws_elb.ingress.dns_name}"
-  description = "DNS name of the ELB for distributing traffic to Ingress controllers"
+  value       = "${module.workers.ingress_dns_name}"
+  description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
+}
+
+# Outputs for worker pools
+
+output "vpc_id" {
+  value       = "${aws_vpc.network.id}"
+  description = "ID of the VPC for creating worker instances"
+}
+
+output "subnet_ids" {
+  value       = ["${aws_subnet.public.*.id}"]
+  description = "List of subnet IDs for creating worker instances"
+}
+
+output "worker_security_groups" {
+  value       = ["${aws_security_group.worker.id}"]
+  description = "List of worker security group IDs"
+}
+
+output "kubeconfig" {
+  value = "${module.bootkube.kubeconfig}"
 }
--- a/aws/container-linux/kubernetes/require.tf
+++ b/aws/container-linux/kubernetes/require.tf
@ -5,7 +5,7 @@ terraform {
 }

 provider "aws" {
-  version = "~> 1.0"
+  version = "~> 1.11"
 }

 provider "local" {
--- a/aws/container-linux/kubernetes/security.tf
+++ b/aws/container-linux/kubernetes/security.tf
@ -0,0 +1,395 @@
+# Security Groups (instance firewalls)
+
+# Controller security group
+
+resource "aws_security_group" "controller" {
+  name        = "${var.cluster_name}-controller"
+  description = "${var.cluster_name} controller security group"
+
+  vpc_id = "${aws_vpc.network.id}"
+
+  tags = "${map("Name", "${var.cluster_name}-controller")}"
+}
+
+resource "aws_security_group_rule" "controller-icmp" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type        = "ingress"
+  protocol    = "icmp"
+  from_port   = 0
+  to_port     = 0
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "controller-ssh" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 22
+  to_port     = 22
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "controller-apiserver" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 443
+  to_port     = 443
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "controller-etcd" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 2379
+  to_port   = 2380
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-etcd-metrics" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 2381
+  to_port                  = 2381
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-flannel" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "udp"
+  from_port                = 8472
+  to_port                  = 8472
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-flannel-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "udp"
+  from_port = 8472
+  to_port   = 8472
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-node-exporter" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 9100
+  to_port                  = 9100
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-kubelet-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10250
+  to_port   = 10250
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-kubelet-read" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 10255
+  to_port                  = 10255
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-kubelet-read-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10255
+  to_port   = 10255
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-bgp" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 179
+  to_port                  = 179
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-bgp-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 179
+  to_port   = 179
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-ipip" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = 4
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-ipip-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = 4
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-ipip-legacy" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type                     = "ingress"
+  protocol                 = 94
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.worker.id}"
+}
+
+resource "aws_security_group_rule" "controller-ipip-legacy-self" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type      = "ingress"
+  protocol  = 94
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "controller-egress" {
+  security_group_id = "${aws_security_group.controller.id}"
+
+  type             = "egress"
+  protocol         = "-1"
+  from_port        = 0
+  to_port          = 0
+  cidr_blocks      = ["0.0.0.0/0"]
+  ipv6_cidr_blocks = ["::/0"]
+}
+
+# Worker security group
+
+resource "aws_security_group" "worker" {
+  name        = "${var.cluster_name}-worker"
+  description = "${var.cluster_name} worker security group"
+
+  vpc_id = "${aws_vpc.network.id}"
+
+  tags = "${map("Name", "${var.cluster_name}-worker")}"
+}
+
+resource "aws_security_group_rule" "worker-icmp" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "icmp"
+  from_port   = 0
+  to_port     = 0
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-ssh" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 22
+  to_port     = 22
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-http" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 80
+  to_port     = 80
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-https" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 443
+  to_port     = 443
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-flannel" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "udp"
+  from_port                = 8472
+  to_port                  = 8472
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-flannel-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "udp"
+  from_port = 8472
+  to_port   = 8472
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-node-exporter" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 9100
+  to_port   = 9100
+  self      = true
+}
+
+resource "aws_security_group_rule" "ingress-health" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type        = "ingress"
+  protocol    = "tcp"
+  from_port   = 10254
+  to_port     = 10254
+  cidr_blocks = ["0.0.0.0/0"]
+}
+
+resource "aws_security_group_rule" "worker-kubelet" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 10250
+  to_port                  = 10250
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-kubelet-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10250
+  to_port   = 10250
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-kubelet-read" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 10255
+  to_port                  = 10255
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-kubelet-read-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 10255
+  to_port   = 10255
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-bgp" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = "tcp"
+  from_port                = 179
+  to_port                  = 179
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-bgp-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = "tcp"
+  from_port = 179
+  to_port   = 179
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-ipip" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = 4
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-ipip-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = 4
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-ipip-legacy" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type                     = "ingress"
+  protocol                 = 94
+  from_port                = 0
+  to_port                  = 0
+  source_security_group_id = "${aws_security_group.controller.id}"
+}
+
+resource "aws_security_group_rule" "worker-ipip-legacy-self" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type      = "ingress"
+  protocol  = 94
+  from_port = 0
+  to_port   = 0
+  self      = true
+}
+
+resource "aws_security_group_rule" "worker-egress" {
+  security_group_id = "${aws_security_group.worker.id}"
+
+  type             = "egress"
+  protocol         = "-1"
+  from_port        = 0
+  to_port          = 0
+  cidr_blocks      = ["0.0.0.0/0"]
+  ipv6_cidr_blocks = ["::/0"]
+}
--- a/aws/container-linux/kubernetes/ssh.tf
+++ b/aws/container-linux/kubernetes/ssh.tf
@ -1,5 +1,5 @@
-# Secure copy etcd TLS assets and kubeconfig to controllers. Activates kubelet.service
-resource "null_resource" "copy-secrets" {
+# Secure copy etcd TLS assets to controllers.
+resource "null_resource" "copy-controller-secrets" {
  count = "${var.controller_count}"

  connection {
@ -9,11 +9,6 @@ resource "null_resource" "copy-secrets" {
    timeout = "15m"
  }

-  provisioner "file" {
-    content     = "${module.bootkube.kubeconfig}"
-    destination = "$HOME/kubeconfig"
-  }
-
  provisioner "file" {
    content     = "${module.bootkube.etcd_ca_cert}"
    destination = "$HOME/etcd-client-ca.crt"
@ -61,7 +56,6 @@ resource "null_resource" "copy-secrets" {
      "sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
      "sudo chown -R etcd:etcd /etc/ssl/etcd",
      "sudo chmod -R 500 /etc/ssl/etcd",
-      "sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
    ]
  }
 }
@ -69,7 +63,12 @@ resource "null_resource" "copy-secrets" {
 # Secure copy bootkube assets to ONE controller and start bootkube to perform
 # one-time self-hosted cluster bootstrapping.
 resource "null_resource" "bootkube-start" {
-  depends_on = ["module.bootkube", "null_resource.copy-secrets", "aws_route53_record.apiserver"]
+  depends_on = [
+    "module.bootkube",
+    "module.workers",
+    "aws_route53_record.apiserver",
+    "null_resource.copy-controller-secrets",
+  ]

  connection {
    type    = "ssh"
@ -85,7 +84,7 @@ resource "null_resource" "bootkube-start" {

  provisioner "remote-exec" {
    inline = [
-      "sudo mv /home/core/assets /opt/bootkube",
+      "sudo mv $HOME/assets /opt/bootkube",
      "sudo systemctl start bootkube",
    ]
  }
--- a/aws/container-linux/kubernetes/variables.tf
+++ b/aws/container-linux/kubernetes/variables.tf
@ -1,21 +1,44 @@
 variable "cluster_name" {
  type        = "string"
-  description = "Cluster name"
+  description = "Unique cluster name (prepended to dns_zone)"
 }

+# AWS
+
 variable "dns_zone" {
  type        = "string"
-  description = "AWS DNS Zone (e.g. aws.dghubble.io)"
+  description = "AWS Route53 DNS Zone (e.g. aws.example.com)"
 }

 variable "dns_zone_id" {
  type        = "string"
-  description = "AWS DNS Zone ID (e.g. Z3PAABBCFAKEC0)"
+  description = "AWS Route53 DNS Zone ID (e.g. Z3PAABBCFAKEC0)"
 }

-variable "ssh_authorized_key" {
+# instances
+
+variable "controller_count" {
  type        = "string"
-  description = "SSH public key for user 'core'"
+  default     = "1"
+  description = "Number of controllers (i.e. masters)"
+}
+
+variable "worker_count" {
+  type        = "string"
+  default     = "1"
+  description = "Number of workers"
+}
+
+variable "controller_type" {
+  type        = "string"
+  default     = "t2.small"
+  description = "EC2 instance type for controllers"
+}
+
+variable "worker_type" {
+  type        = "string"
+  default     = "t2.small"
+  description = "EC2 instance type for workers"
 }

 variable "os_channel" {
@ -27,41 +50,34 @@ variable "os_channel" {
 variable "disk_size" {
  type        = "string"
  default     = "40"
-  description = "The size of the disk in Gigabytes"
+  description = "Size of the EBS volume in GB"
 }

-variable "host_cidr" {
-  description = "CIDR IPv4 range to assign to EC2 nodes"
+variable "disk_type" {
  type        = "string"
-  default     = "10.0.0.0/16"
+  default     = "gp2"
+  description = "Type of the EBS volume (e.g. standard, gp2, io1)"
 }

-variable "controller_count" {
+variable "controller_clc_snippets" {
+  type        = "list"
+  description = "Controller Container Linux Config snippets"
+  default     = []
+}
+
+variable "worker_clc_snippets" {
+  type        = "list"
+  description = "Worker Container Linux Config snippets"
+  default     = []
+}
+
+# configuration
+
+variable "ssh_authorized_key" {
  type        = "string"
-  default     = "1"
-  description = "Number of controllers"
+  description = "SSH public key for user 'core'"
 }

-variable "controller_type" {
-  type        = "string"
-  default     = "t2.small"
-  description = "Controller EC2 instance type"
-}
-
-variable "worker_count" {
-  type        = "string"
-  default     = "1"
-  description = "Number of workers"
-}
-
-variable "worker_type" {
-  type        = "string"
-  default     = "t2.small"
-  description = "Worker EC2 instance type"
-}
-
-# bootkube assets
-
 variable "asset_dir" {
  description = "Path to a directory where generated assets should be placed (contains secrets)"
  type        = "string"
@ -79,6 +95,12 @@ variable "network_mtu" {
  default     = "1480"
 }

+variable "host_cidr" {
+  description = "CIDR IPv4 range to assign to EC2 nodes"
+  type        = "string"
+  default     = "10.0.0.0/16"
+}
+
 variable "pod_cidr" {
  description = "CIDR IPv4 range to assign Kubernetes pods"
  type        = "string"
--- a/aws/container-linux/kubernetes/workers.tf
+++ b/aws/container-linux/kubernetes/workers.tf
@ -1,275 +1,20 @@
-# Workers AutoScaling Group
-resource "aws_autoscaling_group" "workers" {
-  name           = "${var.cluster_name}-worker ${aws_launch_configuration.worker.name}"
-  load_balancers = ["${aws_elb.ingress.id}"]
+module "workers" {
+  source = "workers"
+  name   = "${var.cluster_name}"

-  # count
-  desired_capacity          = "${var.worker_count}"
-  min_size                  = "${var.worker_count}"
-  max_size                  = "${var.worker_count + 2}"
-  default_cooldown          = 30
-  health_check_grace_period = 30
-
-  # network
-  vpc_zone_identifier = ["${aws_subnet.public.*.id}"]
-
-  # template
-  launch_configuration = "${aws_launch_configuration.worker.name}"
-
-  lifecycle {
-    # override the default destroy and replace update behavior
-    create_before_destroy = true
-    ignore_changes        = ["image_id"]
-  }
-
-  tags = [{
-    key                 = "Name"
-    value               = "${var.cluster_name}-worker"
-    propagate_at_launch = true
-  }]
-}
-
-# Worker template
-resource "aws_launch_configuration" "worker" {
-  image_id      = "${data.aws_ami.coreos.image_id}"
-  instance_type = "${var.worker_type}"
-
-  user_data = "${data.ct_config.worker_ign.rendered}"
-
-  # storage
-  root_block_device {
-    volume_type = "standard"
-    volume_size = "${var.disk_size}"
-  }
-
-  # network
+  # AWS
+  vpc_id          = "${aws_vpc.network.id}"
+  subnet_ids      = ["${aws_subnet.public.*.id}"]
  security_groups = ["${aws_security_group.worker.id}"]
+  count           = "${var.worker_count}"
+  instance_type   = "${var.worker_type}"
+  os_channel      = "${var.os_channel}"
+  disk_size       = "${var.disk_size}"

-  lifecycle {
-    // Override the default destroy and replace update behavior
-    create_before_destroy = true
-  }
-}
-
-# Worker Container Linux Config
-data "template_file" "worker_config" {
-  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
-
-  vars = {
-    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
-    k8s_etcd_service_ip     = "${cidrhost(var.service_cidr, 15)}"
-    ssh_authorized_key      = "${var.ssh_authorized_key}"
-    cluster_domain_suffix   = "${var.cluster_domain_suffix}"
-    kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
-    kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
-    kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
-    kubeconfig_server       = "${module.bootkube.server}"
-  }
-}
-
-data "ct_config" "worker_ign" {
-  content      = "${data.template_file.worker_config.rendered}"
-  pretty_print = false
-}
-
-# Security Group (instance firewall)
-
-resource "aws_security_group" "worker" {
-  name        = "${var.cluster_name}-worker"
-  description = "${var.cluster_name} worker security group"
-
-  vpc_id = "${aws_vpc.network.id}"
-
-  tags = "${map("Name", "${var.cluster_name}-worker")}"
-}
-
-resource "aws_security_group_rule" "worker-icmp" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "icmp"
-  from_port   = 0
-  to_port     = 0
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-ssh" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 22
-  to_port     = 22
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-http" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 80
-  to_port     = 80
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-https" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 443
-  to_port     = 443
-  cidr_blocks = ["0.0.0.0/0"]
-}
-
-resource "aws_security_group_rule" "worker-flannel" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "udp"
-  from_port                = 8472
-  to_port                  = 8472
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-flannel-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "udp"
-  from_port = 8472
-  to_port   = 8472
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-node-exporter" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type        = "ingress"
-  protocol    = "tcp"
-  from_port   = 9100
-  to_port     = 9100
-  self = true
-}
-
-resource "aws_security_group_rule" "worker-kubelet" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 10250
-  to_port                  = 10250
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-kubelet-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10250
-  to_port   = 10250
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-kubelet-read" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 10255
-  to_port                  = 10255
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-kubelet-read-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10255
-  to_port   = 10255
-  self      = true
-}
-
-resource "aws_security_group_rule" "ingress-health-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 10254
-  to_port   = 10254
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-bgp" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = "tcp"
-  from_port                = 179
-  to_port                  = 179
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-bgp-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = "tcp"
-  from_port = 179
-  to_port   = 179
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-ipip" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = 4
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-ipip-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = 4
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-ipip-legacy" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type                     = "ingress"
-  protocol                 = 94
-  from_port                = 0
-  to_port                  = 0
-  source_security_group_id = "${aws_security_group.controller.id}"
-}
-
-resource "aws_security_group_rule" "worker-ipip-legacy-self" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type      = "ingress"
-  protocol  = 94
-  from_port = 0
-  to_port   = 0
-  self      = true
-}
-
-resource "aws_security_group_rule" "worker-egress" {
-  security_group_id = "${aws_security_group.worker.id}"
-
-  type             = "egress"
-  protocol         = "-1"
-  from_port        = 0
-  to_port          = 0
-  cidr_blocks      = ["0.0.0.0/0"]
-  ipv6_cidr_blocks = ["::/0"]
+  # configuration
+  kubeconfig            = "${module.bootkube.kubeconfig}"
+  ssh_authorized_key    = "${var.ssh_authorized_key}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
+  clc_snippets          = "${var.worker_clc_snippets}"
 }
--- a/aws/container-linux/kubernetes/workers/ami.tf
+++ b/aws/container-linux/kubernetes/workers/ami.tf
@ -0,0 +1,19 @@
+data "aws_ami" "coreos" {
+  most_recent = true
+  owners      = ["595879546273"]
+
+  filter {
+    name   = "architecture"
+    values = ["x86_64"]
+  }
+
+  filter {
+    name   = "virtualization-type"
+    values = ["hvm"]
+  }
+
+  filter {
+    name   = "name"
+    values = ["CoreOS-${var.os_channel}-*"]
+  }
+}
--- a/aws/container-linux/kubernetes/workers/cl/worker.yaml.tmpl
+++ b/aws/container-linux/kubernetes/workers/cl/worker.yaml.tmpl
@ -39,9 +39,8 @@ systemd:
        ExecStartPre=/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -56,7 +55,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -81,29 +81,14 @@ storage:
      mode: 0644
      contents:
        inline: |
-          apiVersion: v1
-          kind: Config
-          clusters:
-          - name: local
-            cluster:
-              server: ${kubeconfig_server}
-              certificate-authority-data: ${kubeconfig_ca_cert}
-          users:
-          - name: kubelet
-            user:
-              client-certificate-data: ${kubeconfig_kubelet_cert}
-              client-key-data: ${kubeconfig_kubelet_key}
-          contexts:
-          - context:
-              cluster: local
-              user: kubelet
+          ${kubeconfig}
    - path: /etc/kubernetes/kubelet.env
      filesystem: root
      mode: 0644
      contents:
        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
+          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
+          KUBELET_IMAGE_TAG=v1.10.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -121,7 +106,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.9.2 \
+            docker://k8s.gcr.io/hyperkube:v1.10.1 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--- a/aws/container-linux/kubernetes/workers/ingress.tf
+++ b/aws/container-linux/kubernetes/workers/ingress.tf
@ -0,0 +1,82 @@
+# Network Load Balancer for Ingress
+resource "aws_lb" "ingress" {
+  name               = "${var.name}-ingress"
+  load_balancer_type = "network"
+  internal           = false
+
+  subnets = ["${var.subnet_ids}"]
+
+  enable_cross_zone_load_balancing = true
+}
+
+# Forward HTTP traffic to workers
+resource "aws_lb_listener" "ingress-http" {
+  load_balancer_arn = "${aws_lb.ingress.arn}"
+  protocol          = "TCP"
+  port              = 80
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.workers-http.arn}"
+  }
+}
+
+# Forward HTTPS traffic to workers
+resource "aws_lb_listener" "ingress-https" {
+  load_balancer_arn = "${aws_lb.ingress.arn}"
+  protocol          = "TCP"
+  port              = 443
+
+  default_action {
+    type             = "forward"
+    target_group_arn = "${aws_lb_target_group.workers-https.arn}"
+  }
+}
+
+# Network Load Balancer target groups of instances
+
+resource "aws_lb_target_group" "workers-http" {
+  name        = "${var.name}-workers-http"
+  vpc_id      = "${var.vpc_id}"
+  target_type = "instance"
+
+  protocol = "TCP"
+  port     = 80
+
+  # Ingress Controller HTTP health check
+  health_check {
+    protocol = "HTTP"
+    port     = 10254
+    path     = "/healthz"
+
+    # NLBs require the healthy and unhealthy thresholds to be equal
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # NLB health check interval must be 10 or 30 seconds
+    interval = 10
+  }
+}
+
+resource "aws_lb_target_group" "workers-https" {
+  name        = "${var.name}-workers-https"
+  vpc_id      = "${var.vpc_id}"
+  target_type = "instance"
+
+  protocol = "TCP"
+  port     = 443
+
+  # Ingress Controller HTTP health check
+  health_check {
+    protocol = "HTTP"
+    port     = 10254
+    path     = "/healthz"
+
+    # NLBs require the healthy and unhealthy thresholds to be equal
+    healthy_threshold   = 3
+    unhealthy_threshold = 3
+
+    # NLB health check interval must be 10 or 30 seconds
+    interval = 10
+  }
+}
--- a/aws/container-linux/kubernetes/workers/outputs.tf
+++ b/aws/container-linux/kubernetes/workers/outputs.tf
@ -0,0 +1,4 @@
+output "ingress_dns_name" {
+  value       = "${aws_lb.ingress.dns_name}"
+  description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
+}
--- a/aws/container-linux/kubernetes/workers/variables.tf
+++ b/aws/container-linux/kubernetes/workers/variables.tf
@ -0,0 +1,87 @@
+variable "name" {
+  type        = "string"
+  description = "Unique name for the worker pool"
+}
+
+# AWS
+
+variable "vpc_id" {
+  type        = "string"
+  description = "Must be set to `vpc_id` output by cluster"
+}
+
+variable "subnet_ids" {
+  type        = "list"
+  description = "Must be set to `subnet_ids` output by cluster"
+}
+
+variable "security_groups" {
+  type        = "list"
+  description = "Must be set to `worker_security_groups` output by cluster"
+}
+
+# instances
+
+variable "count" {
+  type        = "string"
+  default     = "1"
+  description = "Number of instances"
+}
+
+variable "instance_type" {
+  type        = "string"
+  default     = "t2.small"
+  description = "EC2 instance type"
+}
+
+variable "os_channel" {
+  type        = "string"
+  default     = "stable"
+  description = "Container Linux AMI channel (stable, beta, alpha)"
+}
+
+variable "disk_size" {
+  type        = "string"
+  default     = "40"
+  description = "Size of the EBS volume in GB"
+}
+
+variable "disk_type" {
+  type        = "string"
+  default     = "gp2"
+  description = "Type of the EBS volume (e.g. standard, gp2, io1)"
+}
+
+variable "clc_snippets" {
+  type        = "list"
+  description = "Container Linux Config snippets"
+  default     = []
+}
+
+# configuration
+
+variable "kubeconfig" {
+  type        = "string"
+  description = "Must be set to `kubeconfig` output by cluster"
+}
+
+variable "ssh_authorized_key" {
+  type        = "string"
+  description = "SSH public key for user 'core'"
+}
+
+variable "service_cidr" {
+  description = <<EOD
+CIDR IPv4 range to assign Kubernetes services.
+The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for kube-dns.
+EOD
+
+  type    = "string"
+  default = "10.3.0.0/16"
+}
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
--- a/aws/container-linux/kubernetes/workers/workers.tf
+++ b/aws/container-linux/kubernetes/workers/workers.tf
@ -0,0 +1,75 @@
+# Workers AutoScaling Group
+resource "aws_autoscaling_group" "workers" {
+  name = "${var.name}-worker ${aws_launch_configuration.worker.name}"
+
+  # count
+  desired_capacity          = "${var.count}"
+  min_size                  = "${var.count}"
+  max_size                  = "${var.count + 2}"
+  default_cooldown          = 30
+  health_check_grace_period = 30
+
+  # network
+  vpc_zone_identifier = ["${var.subnet_ids}"]
+
+  # template
+  launch_configuration = "${aws_launch_configuration.worker.name}"
+
+  # target groups to which instances should be added
+  target_group_arns = [
+    "${aws_lb_target_group.workers-http.arn}",
+    "${aws_lb_target_group.workers-https.arn}",
+  ]
+
+  lifecycle {
+    # override the default destroy and replace update behavior
+    create_before_destroy = true
+  }
+
+  tags = [{
+    key                 = "Name"
+    value               = "${var.name}-worker"
+    propagate_at_launch = true
+  }]
+}
+
+# Worker template
+resource "aws_launch_configuration" "worker" {
+  image_id      = "${data.aws_ami.coreos.image_id}"
+  instance_type = "${var.instance_type}"
+
+  user_data = "${data.ct_config.worker_ign.rendered}"
+
+  # storage
+  root_block_device {
+    volume_type = "${var.disk_type}"
+    volume_size = "${var.disk_size}"
+  }
+
+  # network
+  security_groups = ["${var.security_groups}"]
+
+  lifecycle {
+    # override the default destroy and replace update behavior
+    create_before_destroy = true
+    ignore_changes        = ["image_id"]
+  }
+}
+
+# Worker Container Linux Config
+data "template_file" "worker_config" {
+  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
+
+  vars = {
+    kubeconfig            = "${indent(10, var.kubeconfig)}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+  }
+}
+
+data "ct_config" "worker_ign" {
+  content      = "${data.template_file.worker_config.rendered}"
+  pretty_print = false
+  snippets     = ["${var.clc_snippets}"]
+}
--- a/bare-metal/container-linux/kubernetes/README.md
+++ b/bare-metal/container-linux/kubernetes/README.md
@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.10.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

--- a/bare-metal/container-linux/kubernetes/bootkube.tf
+++ b/bare-metal/container-linux/kubernetes/bootkube.tf
@ -1,6 +1,6 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.10.0"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=db36b92abced3c4b0af279adfd5ed4bf0cf8c39f"

  cluster_name          = "${var.cluster_name}"
  api_servers           = ["${var.k8s_domain_name}"]
--- a/bare-metal/container-linux/kubernetes/cl/container-linux-install.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/container-linux-install.yaml.tmpl
@ -12,6 +12,16 @@ systemd:
        ExecStart=/opt/installer
        [Install]
        WantedBy=multi-user.target
+    # Avoid using the standard SSH port so terraform apply cannot SSH until
+    # post-install. Admins may still SSH on port 2222 to debug disk install
+    # problems. After install, sshd uses port 22 and users/terraform can connect.
+    - name: sshd.socket
+      dropins:
+        - name: 10-sshd-port.conf
+          contents: |
+            [Socket]
+            ListenStream=
+            ListenStream=2222
 storage:
  files:
    - path: /opt/installer
@ -32,11 +42,6 @@ storage:
          systemctl reboot
 passwd:
  users:
-    # Avoid using standard name "core" so terraform apply cannot SSH until post-install.
-    - name: debug
-      create:
-        groups:
-          - sudo
-          - docker
+    - name: core
      ssh_authorized_keys:
-        - {{.ssh_authorized_key}}
+        - "${ssh_authorized_key}"
--- a/bare-metal/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,12 +7,13 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.14"
+            Environment="ETCD_IMAGE_TAG=v3.3.3"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
+            Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381"
            Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
            Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
@ -90,6 +91,7 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
+          --node-labels=node-role.kubernetes.io/controller="true" \
          --pod-manifest-path=/etc/kubernetes/manifests \
          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
@ -116,8 +118,8 @@ storage:
      mode: 0644
      contents:
        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
+          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
+          KUBELET_IMAGE_TAG=v1.10.1
    - path: /etc/hostname
      filesystem: root
      mode: 0644
@ -144,7 +146,7 @@ storage:
          # Move experimental manifests
          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.12.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/bare-metal/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -47,8 +47,6 @@ systemd:
        ExecStartPre=/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
@ -81,8 +79,8 @@ storage:
      mode: 0644
      contents:
        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
+          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
+          KUBELET_IMAGE_TAG=v1.10.1
    - path: /etc/hostname
      filesystem: root
      mode: 0644
--- a/bare-metal/container-linux/kubernetes/groups.tf
+++ b/bare-metal/container-linux/kubernetes/groups.tf
@ -8,10 +8,6 @@ resource "matchbox_group" "container-linux-install" {
  selector {
    mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
  }
-
-  metadata {
-    ssh_authorized_key = "${var.ssh_authorized_key}"
-  }
 }

 resource "matchbox_group" "controller" {
--- a/bare-metal/container-linux/kubernetes/profiles.tf
+++ b/bare-metal/container-linux/kubernetes/profiles.tf
@ -32,6 +32,7 @@ data "template_file" "container-linux-install-configs" {
    ignition_endpoint       = "${format("%s/ignition", var.matchbox_http_endpoint)}"
    install_disk            = "${var.install_disk}"
    container_linux_oem     = "${var.container_linux_oem}"
+    ssh_authorized_key      = "${var.ssh_authorized_key}"

    # only cached-container-linux profile adds -b baseurl
    baseurl_flag = ""
@ -73,6 +74,7 @@ data "template_file" "cached-container-linux-install-configs" {
    ignition_endpoint       = "${format("%s/ignition", var.matchbox_http_endpoint)}"
    install_disk            = "${var.install_disk}"
    container_linux_oem     = "${var.container_linux_oem}"
+    ssh_authorized_key      = "${var.ssh_authorized_key}"

    # profile uses -b baseurl to install from matchbox cache
    baseurl_flag = "-b ${var.matchbox_http_endpoint}/assets/coreos"
--- a/bare-metal/container-linux/kubernetes/ssh.tf
+++ b/bare-metal/container-linux/kubernetes/ssh.tf
@ -1,5 +1,5 @@
 # Secure copy etcd TLS assets and kubeconfig to controllers. Activates kubelet.service
-resource "null_resource" "copy-etcd-secrets" {
+resource "null_resource" "copy-controller-secrets" {
  count = "${length(var.controller_names)}"

  connection {
@ -61,13 +61,13 @@ resource "null_resource" "copy-etcd-secrets" {
      "sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
      "sudo chown -R etcd:etcd /etc/ssl/etcd",
      "sudo chmod -R 500 /etc/ssl/etcd",
-      "sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
+      "sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
    ]
  }
 }

 # Secure copy kubeconfig to all workers. Activates kubelet.service
-resource "null_resource" "copy-kubeconfig" {
+resource "null_resource" "copy-worker-secrets" {
  count = "${length(var.worker_names)}"

  connection {
@ -84,7 +84,7 @@ resource "null_resource" "copy-kubeconfig" {

  provisioner "remote-exec" {
    inline = [
-      "sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
+      "sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
    ]
  }
 }
@ -95,13 +95,16 @@ resource "null_resource" "bootkube-start" {
  # Without depends_on, this remote-exec may start before the kubeconfig copy.
  # Terraform only does one task at a time, so it would try to bootstrap
  # while no Kubelets are running.
-  depends_on = ["null_resource.copy-etcd-secrets", "null_resource.copy-kubeconfig"]
+  depends_on = [
+    "null_resource.copy-controller-secrets",
+    "null_resource.copy-worker-secrets",
+  ]

  connection {
    type    = "ssh"
    host    = "${element(var.controller_domains, 0)}"
    user    = "core"
-    timeout = "30m"
+    timeout = "15m"
  }

  provisioner "file" {
@ -111,7 +114,7 @@ resource "null_resource" "bootkube-start" {

  provisioner "remote-exec" {
    inline = [
-      "sudo mv /home/core/assets /opt/bootkube",
+      "sudo mv $HOME/assets /opt/bootkube",
      "sudo systemctl start bootkube",
    ]
  }
--- a/bare-metal/container-linux/kubernetes/variables.tf
+++ b/bare-metal/container-linux/kubernetes/variables.tf
@ -1,3 +1,10 @@
+variable "cluster_name" {
+  type        = "string"
+  description = "Unique cluster name"
+}
+
+# bare-metal
+
 variable "matchbox_http_endpoint" {
  type        = "string"
  description = "Matchbox HTTP read-only endpoint (e.g. http://matchbox.example.com:8080)"
@ -13,18 +20,8 @@ variable "container_linux_version" {
  description = "Container Linux version of the kernel/initrd to PXE or the image to install"
 }

-variable "cluster_name" {
-  type        = "string"
-  description = "Cluster name"
-}
-
-variable "ssh_authorized_key" {
-  type        = "string"
-  description = "SSH public key to set as an authorized_key on machines"
-}
-
-# Machines
-# Terraform's crude "type system" does properly support lists of maps so we do this.
+# machines
+# Terraform's crude "type system" does not properly support lists of maps so we do this.

 variable "controller_names" {
  type = "list"
@ -50,13 +47,18 @@ variable "worker_domains" {
  type = "list"
 }

-# bootkube assets
+# configuration

 variable "k8s_domain_name" {
  description = "Controller DNS name which resolves to a controller instance. Workers and kubeconfig's will communicate with this endpoint (e.g. cluster.example.com)"
  type        = "string"
 }

+variable "ssh_authorized_key" {
+  type        = "string"
+  description = "SSH public key for user 'core'"
+}
+
 variable "asset_dir" {
  description = "Path to a directory where generated assets should be placed (contains secrets)"
  type        = "string"
@ -75,14 +77,14 @@ variable "network_mtu" {
 }

 variable "pod_cidr" {
-  description = "CIDR IP range to assign Kubernetes pods"
+  description = "CIDR IPv4 range to assign Kubernetes pods"
  type        = "string"
  default     = "10.2.0.0/16"
 }

 variable "service_cidr" {
  description = <<EOD
-CIDR IP range to assign Kubernetes services.
+CIDR IPv4 range to assign Kubernetes services.
 The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for kube-dns.
 EOD

--- a/bare-metal/container-linux/pxe-worker/cl/bootkube-worker.yaml.tmpl
+++ b/bare-metal/container-linux/pxe-worker/cl/bootkube-worker.yaml.tmpl
@ -1,117 +0,0 @@
---
-systemd:
-  units:
-    - name: docker.service
-      enable: true
-    - name: locksmithd.service
-      mask: true
-    - name: kubelet.path
-      enable: true
-      contents: |
-        [Unit]
-        Description=Watch for kubeconfig
-        [Path]
-        PathExists=/etc/kubernetes/kubeconfig
-        [Install]
-        WantedBy=multi-user.target
-    - name: wait-for-dns.service
-      enable: true
-      contents: |
-        [Unit]
-        Description=Wait for DNS entries
-        Wants=systemd-resolved.service
-        Before=kubelet.service
-        [Service]
-        Type=oneshot
-        RemainAfterExit=true
-        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
-        [Install]
-        RequiredBy=kubelet.service
-    - name: kubelet.service
-      contents: |
-        [Unit]
-        Description=Kubelet via Hyperkube
-        Wants=rpc-statd.service
-        [Service]
-        EnvironmentFile=/etc/kubernetes/kubelet.env
-        Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
-          --volume=resolv,kind=host,source=/etc/resolv.conf \
-          --mount volume=resolv,target=/etc/resolv.conf \
-          --volume var-lib-cni,kind=host,source=/var/lib/cni \
-          --mount volume=var-lib-cni,target=/var/lib/cni \
-          --volume opt-cni-bin,kind=host,source=/opt/cni/bin \
-          --mount volume=opt-cni-bin,target=/opt/cni/bin \
-          --volume var-log,kind=host,source=/var/log \
-          --mount volume=var-log,target=/var/log \
-          --insecure-options=image"
-        ExecStartPre=/bin/mkdir -p /opt/cni/bin
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
-        ExecStartPre=/bin/mkdir -p /var/lib/cni
-        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
-        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
-        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
-        ExecStart=/usr/lib/coreos/kubelet-wrapper \
-          --allow-privileged \
-          --anonymous-auth=false \
-          --client-ca-file=/etc/kubernetes/ca.crt \
-          --cluster_dns={{.k8s_dns_service_ip}} \
-          --cluster_domain={{.cluster_domain_suffix}} \
-          --cni-conf-dir=/etc/kubernetes/cni/net.d \
-          --exit-on-lock-contention \
-          --hostname-override={{.domain_name}} \
-          --kubeconfig=/etc/kubernetes/kubeconfig \
-          --lock-file=/var/run/lock/kubelet.lock \
-          --network-plugin=cni \
-          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests \
-          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
-        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
-        Restart=always
-        RestartSec=5
-        [Install]
-        WantedBy=multi-user.target
-
-storage:
-  {{ if index . "pxe" }}
-  disks:
-    - device: /dev/sda
-      wipe_table: true
-      partitions:
-        - label: ROOT
-  filesystems:
-    - name: root
-      mount:
-        device: "/dev/sda1"
-        format: "ext4"
-        create:
-          force: true
-          options:
-            - "-LROOT"
-  {{end}}
-  files:
-    - path: /etc/kubernetes/kubelet.env
-      filesystem: root
-      mode: 0644
-      contents:
-        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
-    - path: /etc/hostname
-      filesystem: root
-      mode: 0644
-      contents:
-        inline:
-          {{.domain_name}}
-    - path: /etc/sysctl.d/max-user-watches.conf
-      filesystem: root
-      contents:
-        inline: |
-          fs.inotify.max_user_watches=16184
-passwd:
-  users:
-    - name: core
-      ssh_authorized_keys:
-        - {{.ssh_authorized_key}}
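Note on the removed `wait-for-dns.service` unit above: it holds back kubelet until `/etc/resolv.conf` contains at least one line whose first character is neither `#` nor whitespace. A minimal shell sketch of that check follows; the function name `has_resolver_entry` and the throwaway file contents are illustrative assumptions, not part of Typhoon.

```shell
#!/bin/sh
set -eu

# has_resolver_entry FILE: succeed if FILE has a line whose first character
# is neither '#' nor whitespace (the same pattern the unit greps for).
has_resolver_entry() {
  grep -q '^[^#[:space:]]' "$1"
}

# Demo against a throwaway file (hypothetical contents).
tmp="$(mktemp)"

# Only a comment and a blank line: nothing usable yet.
printf '# placeholder comment\n\n' > "$tmp"
if has_resolver_entry "$tmp"; then echo "ready"; else echo "waiting"; fi

# A real entry appears, so the wait loop in the unit would exit.
printf 'nameserver 10.0.0.2\n' >> "$tmp"
if has_resolver_entry "$tmp"; then echo "ready"; else echo "waiting"; fi

rm -f "$tmp"
```

The first check reports `waiting` and the second reports `ready`, which mirrors how the unit's `while ! grep ...; do sleep 1; done` loop would behave.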
--- a/bare-metal/container-linux/pxe-worker/groups.tf
+++ b/bare-metal/container-linux/pxe-worker/groups.tf
@ -1,22 +0,0 @@
-resource "matchbox_group" "workers" {
-  count   = "${length(var.worker_names)}"
-  name    = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}"
-  profile = "${matchbox_profile.bootkube-worker-pxe.name}"
-
-  selector {
-    mac = "${element(var.worker_macs, count.index)}"
-  }
-
-  metadata {
-    pxe            = "true"
-    domain_name    = "${element(var.worker_domains, count.index)}"
-    etcd_endpoints = "${join(",", formatlist("%s:2379", var.controller_domains))}"
-
-    # TODO
-    etcd_on_host          = "true"
-    k8s_etcd_service_ip   = "10.3.0.15"
-    k8s_dns_service_ip    = "${var.kube_dns_service_ip}"
-    cluster_domain_suffix = "${var.cluster_domain_suffix}"
-    ssh_authorized_key    = "${var.ssh_authorized_key}"
-  }
-}
--- a/bare-metal/container-linux/pxe-worker/profiles.tf
+++ b/bare-metal/container-linux/pxe-worker/profiles.tf
@ -1,20 +0,0 @@
-// Container Linux Install profile (from release.core-os.net)
-resource "matchbox_profile" "bootkube-worker-pxe" {
-  name   = "bootkube-worker-pxe"
-  kernel = "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
-
-  initrd = [
-    "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe_image.cpio.gz",
-  ]
-
-  args = [
-    "initrd=coreos_production_pxe_image.cpio.gz",
-    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
-    "coreos.first_boot=yes",
-    "console=tty0",
-    "console=ttyS0",
-    "${var.kernel_args}",
-  ]
-
-  container_linux_config = "${file("${path.module}/cl/bootkube-worker.yaml.tmpl")}"
-}
--- a/bare-metal/container-linux/pxe-worker/ssh.tf
+++ b/bare-metal/container-linux/pxe-worker/ssh.tf
@ -1,22 +0,0 @@
-# Secure copy kubeconfig to all nodes to activate kubelet.service
-resource "null_resource" "copy-kubeconfig" {
-  count = "${length(var.worker_names)}"
-
-  connection {
-    type    = "ssh"
-    host    = "${element(var.worker_domains, count.index)}"
-    user    = "core"
-    timeout = "60m"
-  }
-
-  provisioner "file" {
-    content     = "${var.kubeconfig}"
-    destination = "$HOME/kubeconfig"
-  }
-
-  provisioner "remote-exec" {
-    inline = [
-      "sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
-    ]
-  }
-}
--- a/bare-metal/container-linux/pxe-worker/variables.tf
+++ b/bare-metal/container-linux/pxe-worker/variables.tf
@ -1,72 +0,0 @@
-variable "cluster_name" {
-  description = "Cluster name"
-  type        = "string"
-}
-
-variable "matchbox_http_endpoint" {
-  type        = "string"
-  description = "Matchbox HTTP read-only endpoint (e.g. http://matchbox.example.com:8080)"
-}
-
-variable "container_linux_channel" {
-  type        = "string"
-  description = "Container Linux channel corresponding to the container_linux_version"
-}
-
-variable "container_linux_version" {
-  type        = "string"
-  description = "Container Linux version of the kernel/initrd to PXE or the image to install"
-}
-
-variable "ssh_authorized_key" {
-  type        = "string"
-  description = "SSH public key to set as an authorized key"
-}
-
-# machines
-# Terraform's crude "type system" does not properly support lists of maps so we do this.
-
-variable "controller_domains" {
-  type = "list"
-}
-
-variable "worker_names" {
-  type = "list"
-}
-
-variable "worker_macs" {
-  type = "list"
-}
-
-variable "worker_domains" {
-  type = "list"
-}
-
-# bootkube
-
-variable "kubeconfig" {
-  type = "string"
-}
-
-variable "kube_dns_service_ip" {
-  description = "Kubernetes service IP for kube-dns (must be within server_cidr)"
-  type        = "string"
-  default     = "10.3.0.10"
-}
-
-# optional
-
-variable "kernel_args" {
-  description = "Additional kernel arguments to provide at PXE boot."
-  type        = "list"
-
-  default = [
-    "root=/dev/sda1",
-  ]
-}
-
-variable "cluster_domain_suffix" {
-  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
-  type        = "string"
-  default     = "cluster.local"
-}
--- a/digital-ocean/container-linux/kubernetes/README.md
+++ b/digital-ocean/container-linux/kubernetes/README.md
@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
-* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
+* Kubernetes v1.10.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Single or multi-master, workloads isolated on workers, [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

--- a/digital-ocean/container-linux/kubernetes/bootkube.tf
+++ b/digital-ocean/container-linux/kubernetes/bootkube.tf
@ -1,12 +1,12 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.10.0"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=db36b92abced3c4b0af279adfd5ed4bf0cf8c39f"

  cluster_name          = "${var.cluster_name}"
  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
  etcd_servers          = "${digitalocean_record.etcds.*.fqdn}"
  asset_dir             = "${var.asset_dir}"
-  networking            = "${var.networking}"
+  networking            = "flannel"
  network_mtu           = 1440
  pod_cidr              = "${var.pod_cidr}"
  service_cidr          = "${var.service_cidr}"
--- a/digital-ocean/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/digital-ocean/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,12 +7,13 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.14"
+            Environment="ETCD_IMAGE_TAG=v3.3.3"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
+            Environment="ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381"
            Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
            Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
@ -77,6 +78,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -92,8 +94,10 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
+          --node-labels=node-role.kubernetes.io/controller="true" \
          --pod-manifest-path=/etc/kubernetes/manifests \
-          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
+          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
@ -119,8 +123,8 @@ storage:
      mode: 0644
      contents:
        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
+          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
+          KUBELET_IMAGE_TAG=v1.10.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -141,7 +145,7 @@ storage:
          # Move experimental manifests
          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.12.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -50,9 +50,8 @@ systemd:
        ExecStartPre=/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
-        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -68,7 +67,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -93,8 +93,8 @@ storage:
      mode: 0644
      contents:
        inline: |
-          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.9.2
+          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
+          KUBELET_IMAGE_TAG=v1.10.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -112,7 +112,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.9.2 \
+            docker://k8s.gcr.io/hyperkube:v1.10.1 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
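Note on the kubelet `ExecStartPre` line in the templates above: it derives `/etc/kubernetes/ca.crt` from the kubeconfig by taking the `certificate-authority-data` field and base64-decoding it. A minimal shell sketch of that pipeline follows; the function name `extract_ca`, the temporary directory, and the placeholder certificate body are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# extract_ca KUBECONFIG: print the decoded CA bundle embedded in a kubeconfig,
# using the same grep | awk | base64 pipeline as the kubelet unit.
extract_ca() {
  grep 'certificate-authority-data' "$1" | awk '{print $2}' | base64 -d
}

# Demo with a throwaway kubeconfig (placeholder data, not a real certificate).
workdir="$(mktemp -d)"
ca_pem='-----BEGIN CERTIFICATE-----
placeholder
-----END CERTIFICATE-----'
# Encode on a single line, as kubeconfig files store it.
ca_b64="$(printf '%s\n' "$ca_pem" | base64 | tr -d '\n')"

cat > "$workdir/kubeconfig" <<EOF
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: https://example.invalid:443
    certificate-authority-data: ${ca_b64}
EOF

extract_ca "$workdir/kubeconfig" > "$workdir/ca.crt"
cat "$workdir/ca.crt"
rm -rf "$workdir"
```

The pipeline relies on the kubeconfig holding exactly one `certificate-authority-data` line, which is an assumption of this sketch; a file with several clusters would need a stricter selector.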
--- a/digital-ocean/container-linux/kubernetes/controllers.tf
+++ b/digital-ocean/container-linux/kubernetes/controllers.tf
@ -45,7 +45,7 @@ resource "digitalocean_droplet" "controllers" {
  private_networking = true

  user_data = "${element(data.ct_config.controller_ign.*.rendered, count.index)}"
-  ssh_keys  = "${var.ssh_fingerprints}"
+  ssh_keys  = ["${var.ssh_fingerprints}"]

  tags = [
    "${digitalocean_tag.controllers.id}",
@ -90,4 +90,6 @@ data "ct_config" "controller_ign" {
  count        = "${var.controller_count}"
  content      = "${element(data.template_file.controller_config.*.rendered, count.index)}"
  pretty_print = false
+
+  snippets = ["${var.controller_clc_snippets}"]
 }
--- a/digital-ocean/container-linux/kubernetes/ssh.tf
+++ b/digital-ocean/container-linux/kubernetes/ssh.tf
@ -1,10 +1,10 @@
-# Secure copy kubeconfig to all nodes. Activates kubelet.service
-resource "null_resource" "copy-secrets" {
-  count = "${var.controller_count + var.worker_count}"
+# Secure copy etcd TLS assets and kubeconfig to controllers. Activates kubelet.service
+resource "null_resource" "copy-controller-secrets" {
+  count = "${var.controller_count}"

  connection {
    type    = "ssh"
-    host    = "${element(concat(digitalocean_droplet.controllers.*.ipv4_address, digitalocean_droplet.workers.*.ipv4_address), count.index)}"
+    host    = "${element(concat(digitalocean_droplet.controllers.*.ipv4_address), count.index)}"
    user    = "core"
    timeout = "15m"
  }
@ -61,7 +61,30 @@ resource "null_resource" "copy-secrets" {
      "sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
      "sudo chown -R etcd:etcd /etc/ssl/etcd",
      "sudo chmod -R 500 /etc/ssl/etcd",
-      "sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
+      "sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
+    ]
+  }
+}
+
+# Secure copy kubeconfig to all workers. Activates kubelet.service.
+resource "null_resource" "copy-worker-secrets" {
+  count = "${var.worker_count}"
+
+  connection {
+    type    = "ssh"
+    host    = "${element(concat(digitalocean_droplet.workers.*.ipv4_address), count.index)}"
+    user    = "core"
+    timeout = "15m"
+  }
+
+  provisioner "file" {
+    content     = "${module.bootkube.kubeconfig}"
+    destination = "$HOME/kubeconfig"
+  }
+
+  provisioner "remote-exec" {
+    inline = [
+      "sudo mv $HOME/kubeconfig /etc/kubernetes/kubeconfig",
    ]
  }
 }
@ -69,7 +92,11 @@ resource "null_resource" "copy-secrets" {
 # Secure copy bootkube assets to ONE controller and start bootkube to perform
 # one-time self-hosted cluster bootstrapping.
 resource "null_resource" "bootkube-start" {
-  depends_on = ["module.bootkube", "null_resource.copy-secrets"]
+  depends_on = [
+    "module.bootkube",
+    "null_resource.copy-controller-secrets",
+    "null_resource.copy-worker-secrets",
+  ]

  connection {
    type    = "ssh"
@ -85,7 +112,7 @@ resource "null_resource" "bootkube-start" {

  provisioner "remote-exec" {
    inline = [
-      "sudo mv /home/core/assets /opt/bootkube",
+      "sudo mv $HOME/assets /opt/bootkube",
      "sudo systemctl start bootkube",
    ]
  }
--- a/digital-ocean/container-linux/kubernetes/variables.tf
+++ b/digital-ocean/container-linux/kubernetes/variables.tf
@ -1,8 +1,10 @@
 variable "cluster_name" {
  type        = "string"
-  description = "Unique cluster name"
+  description = "Unique cluster name (prepended to dns_zone)"
 }

+# Digital Ocean
+
 variable "region" {
  type        = "string"
  description = "Digital Ocean region (e.g. nyc1, sfo2, fra1, tor1)"
@ -13,22 +15,12 @@ variable "dns_zone" {
  description = "Digital Ocean domain (i.e. DNS zone) (e.g. do.example.com)"
 }

-variable "image" {
-  type        = "string"
-  default     = "coreos-stable"
-  description = "OS image from which to initialize the disk (e.g. coreos-stable)"
-}
+# instances

 variable "controller_count" {
  type        = "string"
  default     = "1"
-  description = "Number of controllers"
-}
-
-variable "controller_type" {
-  type        = "string"
-  default     = "2gb"
-  description = "Digital Ocean droplet size (e.g. 2gb (min), 4gb, 8gb)."
+  description = "Number of controllers (i.e. masters)"
 }

 variable "worker_count" {
@ -37,39 +29,57 @@ variable "worker_count" {
  description = "Number of workers"
 }

+variable "controller_type" {
+  type        = "string"
+  default     = "s-2vcpu-2gb"
+  description = "Droplet type for controllers (e.g. s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb)."
+}
+
 variable "worker_type" {
  type        = "string"
-  default     = "512mb"
-  description = "Digital Ocean droplet size (e.g. 512mb, 1gb, 2gb, 4gb)"
+  default     = "s-1vcpu-1gb"
+  description = "Droplet type for workers (e.g. s-1vcpu-1gb, s-1vcpu-2gb, s-2vcpu-2gb)"
 }

+variable "image" {
+  type        = "string"
+  default     = "coreos-stable"
+  description = "Container Linux image for instances (e.g. coreos-stable)"
+}
+
+variable "controller_clc_snippets" {
+  type        = "list"
+  description = "Controller Container Linux Config snippets"
+  default     = []
+}
+
+variable "worker_clc_snippets" {
+  type        = "list"
+  description = "Worker Container Linux Config snippets"
+  default     = []
+}
+
+# configuration
+
 variable "ssh_fingerprints" {
  type        = "list"
  description = "SSH public key fingerprints. (e.g. see `ssh-add -l -E md5`)"
 }

-# bootkube assets
-
 variable "asset_dir" {
  description = "Path to a directory where generated assets should be placed (contains secrets)"
  type        = "string"
 }

-variable "networking" {
-  description = "Choice of networking provider (flannel or calico)"
-  type        = "string"
-  default     = "flannel"
-}
-
 variable "pod_cidr" {
-  description = "CIDR IP range to assign Kubernetes pods"
+  description = "CIDR IPv4 range to assign Kubernetes pods"
  type        = "string"
  default     = "10.2.0.0/16"
 }

 variable "service_cidr" {
  description = <<EOD
-CIDR IP range to assign Kubernetes services.
+CIDR IPv4 range to assign Kubernetes services.
 The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for kube-dns.
 EOD

@ -82,4 +92,3 @@ variable "cluster_domain_suffix" {
  type        = "string"
  default     = "cluster.local"
 }
-
--- a/digital-ocean/container-linux/kubernetes/workers.tf
+++ b/digital-ocean/container-linux/kubernetes/workers.tf
@ -26,7 +26,7 @@ resource "digitalocean_droplet" "workers" {
  private_networking = true

  user_data = "${data.ct_config.worker_ign.rendered}"
-  ssh_keys  = "${var.ssh_fingerprints}"
+  ssh_keys  = ["${var.ssh_fingerprints}"]

  tags = [
    "${digitalocean_tag.workers.id}",
@ -44,7 +44,6 @@ data "template_file" "worker_config" {

  vars = {
    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
-    k8s_etcd_service_ip   = "${cidrhost(var.service_cidr, 15)}"
    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
 }
@ -52,4 +51,5 @@ data "template_file" "worker_config" {
 data "ct_config" "worker_ign" {
  content      = "${data.template_file.worker_config.rendered}"
  pretty_print = false
+  snippets     = ["${var.worker_clc_snippets}"]
 }
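Note on `cidrhost(var.service_cidr, 10)` in the worker template vars above: Terraform's `cidrhost` returns the Nth address within a CIDR range, which is how the 10th service IP is reserved for kube-dns. The arithmetic can be sketched in shell as below. The function is a simplified stand-in written for illustration (it assumes the address part of the CIDR is the network base and does not check that the host number fits the prefix), and the `10.3.0.0/16` range is an assumption chosen to agree with the `10.3.0.10` kube-dns default that appears elsewhere in this diff.

```shell
#!/bin/sh
set -eu

# cidrhost CIDR HOSTNUM: print the HOSTNUM-th IPv4 address in CIDR.
# Simplified: adds HOSTNUM to the address before the slash, no range check.
cidrhost() {
  net="${1%/*}"
  IFS=. read -r a b c d <<EOF
$net
EOF
  n=$(( (a << 24) + (b << 16) + (c << 8) + d + $2 ))
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# 1st IP: kube-apiserver service; 10th IP: kube-dns service.
cidrhost 10.3.0.0/16 1
cidrhost 10.3.0.0/16 10
```

Under these assumptions the two calls print `10.3.0.1` and `10.3.0.10`.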
--- a/docs/addons/cluo.md
+++ b/docs/addons/cluo.md
@ -18,7 +18,7 @@ kubectl apply -f addons/cluo -R
 $ kubectl get nodes --show-labels
 ...
 container-linux-update.v1.coreos.com/group=stable
-container-linux-update.v1.coreos.com/version=1576.5.0
+container-linux-update.v1.coreos.com/version=1632.3.0
 ```

 `update-operator` ensures one node reboots at a time and that pods are drained prior to reboot.
--- a/docs/addons/dashboard.md
+++ b/docs/addons/dashboard.md
@ -1,27 +0,0 @@
-# Kubernetes Dashboard
-
-!!! warning
-    The Kubernetes Dashboard takes [unusual approaches](https://github.com/kubernetes/dashboard/wiki/Access-control#authorization-header) to security and is often a point of security escalations. We recommend you don't deploy it and get familiar with `kubectl`, if possible.
-
-The Kubernetes [Dashboard](https://github.com/kubernetes/dashboard) provides a web UI to manage a Kubernetes cluster for those who prefer an alternative to `kubectl`.
-
-## Create
-
-Create the dashboard deployment and service.
-
-```
-kubectl apply -f addons/dashboard -R
-```
-
-## Access
-
-Use `kubectl` to authenticate to the apiserver and create a local port forward to the remote port on the dashboard pod.
-
-```sh
-kubectl get pods -n kube-system
-kubectl port-forward POD [LOCAL_PORT:]REMOTE_PORT
-kubectl port-forward kubernetes-dashboard-id 9090 -n kube-system
-```
-
-!!! tip
-    If you'd like to expose the Dashboard via Ingress and add authentication, use a suitable OAuth2 proxy sidecar and pick your favorite OAuth2 provider.
--- a/docs/addons/grafana.md
+++ b/docs/addons/grafana.md
@ -0,0 +1,20 @@
+## Grafana
+
+Grafana can be used to build dashboards and visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
+
+```
+kubectl apply -f addons/grafana -R
+```
+
+Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
+
+```
+kubectl port-forward grafana-POD-ID 8080 -n monitoring
+```
+
+Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards.
+
+![Grafana Capacity Planning](/img/grafana-capacity.png)
+![Grafana Control Plane](/img/grafana-control-plane.png)
+![Grafana Node View](/img/grafana-node.png)
+
--- a/docs/addons/overview.md
+++ b/docs/addons/overview.md
@ -6,6 +6,5 @@ Every Typhoon cluster is verified to work well with several post-install addons.
 * Nginx [Ingress Controller](ingress.md)
 * [Heapster](heapster.md)
 * [Prometheus](prometheus.md)
-* [Grafana](prometheus.md#grafana)
-* Kubernetes [Dashboard](dashboard.md)
+* [Grafana](grafana.md)

--- a/docs/addons/prometheus.md
+++ b/docs/addons/prometheus.md
@ -20,7 +20,7 @@ On Kubernetes clusters, Prometheus is run as a Deployment, configured with a Con
 kubectl apply -f addons/prometheus -R
 ```

-The ConfigMap configures Prometheus to target apiserver endpoints, node metrics, cAdvisor metrics, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.
+The ConfigMap configures Prometheus to discover apiservers, kubelets, cAdvisor, services, endpoints, and exporters. By default, data is kept in an `emptyDir` so it is persisted until the pod is rescheduled.

 ### Exporters

@ -32,7 +32,7 @@ Exporters expose metrics for 3rd-party systems that don't natively expose Promet

 ### Queries and Alerts

-Prometheus provides a simplistic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.
+Prometheus provides a basic UI for querying metrics and viewing alerts. Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Prometheus pod.

 ```
 kubectl get pods -n monitoring
@ -47,21 +47,4 @@ Visit [127.0.0.1:9090](http://127.0.0.1:9090) to query [expressions](http://127.
 <br/>
 ![Prometheus Alerts](/img/prometheus-alerts.png)

-## Grafana
-
-Grafana can be used to build dashboards and rich visualizations that use Prometheus as the datasource. Create the grafana deployment and service.
-
-```
-kubectl apply -f addons/grafana -R
-```
-
-Use `kubectl` to authenticate to the apiserver and create a local port-forward to the Grafana pod.
-
-```
-kubectl port-forward grafana-POD-ID 8080 -n monitoring
-```
-
-Visit [127.0.0.1:8080](http://127.0.0.1:8080), add the prometheus data-source (http://prometheus.monitoring.svc.cluster.local), and import your desired dashboard (e.g. [Grafana Dashboard 315](https://grafana.com/dashboards/315)).
-
-![Grafana Dashboard](/img/grafana-dashboard.png)
-
+Use [Grafana](/addons/grafana.md) to view or build dashboards that use Prometheus as the datasource.
--- a/docs/advanced/customization.md
+++ b/docs/advanced/customization.md
@ -1,6 +1,130 @@
 # Customization

-To customize clusters in ways that aren't supported by input variables, fork the repo and make changes to the Terraform module. Stay tuned for improvements to this strategy since it is beneficial to stay close to this upstream.
+Typhoon provides minimal Kubernetes clusters with defaults we recommend for production. Terraform variables provide easy-to-use, supported customizations for clusters. Advanced options are available for customizing the architecture or hosts.
+
+## Variables
+
+Typhoon modules accept Terraform input variables for customizing clusters in meritorious ways (e.g. `worker_count`). Variables are carefully considered to provide essentials, while limiting complexity and test matrix burden. See each platform's tutorial for options.
+
+## Addons
+
+Clusters are kept to a minimal Kubernetes control plane by offering components like Nginx Ingress Controller, Prometheus, Grafana, and Heapster as optional post-install [addons](https://github.com/poseidon/typhoon/tree/master/addons). Customize addons by modifying a copy of our addon manifests.
+
+## Hosts
+
+### Container Linux
+
+!!! danger
+    Container Linux Configs provide powerful host customization abilities. You are responsible for the additional configs defined for hosts.
+
+Container Linux Configs (CLCs) declare how a Container Linux instance's disk should be provisioned on first boot from disk. CLCs define disk partitions, filesystems, files, systemd units, dropins, networkd configs, mount units, raid arrays, and users. Typhoon creates controller and worker instances with base Container Linux Configs to create a minimal, secure Kubernetes cluster on each platform.
+
+Typhoon AWS, Google Cloud, and Digital Ocean give users the ability to provide CLC *snippets* - valid Container Linux Configs that are validated and additively merged into the Typhoon base config during `terraform plan`. This allows advanced host customizations and experimentation.
+
+#### Examples
+
+Container Linux [docs](https://coreos.com/os/docs/latest/clc-examples.html) show many simple config examples. For example, ensure a file `/opt/hello` is created with permissions 0644.
+
+```
+# custom-files
+storage:
+  files:
+    - path: /opt/hello
+      filesystem: root
+      contents:
+        inline: |
+          Hello World
+      mode: 0644
+```
+
+Ensure a systemd unit `hello.service` is created and a dropin `50-etcd-cluster.conf` is added for `etcd-member.service`.
+
+```
+# custom-units
+systemd:
+  units:
+    - name: hello.service
+      enable: true
+      contents: |
+        [Unit]
+        Description=Hello World
+        [Service]
+        Type=oneshot
+        ExecStart=/usr/bin/echo Hello World!
+        [Install]
+        WantedBy=multi-user.target
+    - name: etcd-member.service
+      enable: true
+      dropins:
+        - name: 50-etcd-cluster.conf
+          contents: |
+            Environment="ETCD_LOG_PACKAGE_LEVELS=etcdserver=WARNING,security=DEBUG"
+```
+
+#### Specification
+
+View the Container Linux Config [format](https://coreos.com/os/docs/1576.4.0/configuration.html) to read about each field.
+
+#### Usage
+
+Write Container Linux Config *snippets* as files in the repository where you keep Terraform configs for clusters (perhaps in a `clc` or `snippets` subdirectory). You may organize snippets in multiple files as desired, provided they are each valid.
+
+Define an [AWS](https://typhoon.psdn.io/aws/#cluster), [Google Cloud](https://typhoon.psdn.io/google-cloud/#cluster), or [Digital Ocean](https://typhoon.psdn.io/digital-ocean/#cluster) cluster and fill in the optional `controller_clc_snippets` or `worker_clc_snippets` fields.
+
+```
+module "digital-ocean-nemo" {
+  ...
+
+  controller_count        = 1
+  worker_count            = 2
+  controller_clc_snippets = [
+    "${file("./custom-files")}",
+    "${file("./custom-units")}",
+  ]
+  worker_clc_snippets = [
+    "${file("./custom-files")}",
+    "${file("./custom-units")}",
+  ]
+  ...
+}
+```
+
+Plan the resources to be created.
+
+```
+$ terraform plan
+Plan: 54 to add, 0 to change, 0 to destroy.
+```
+
+Most syntax errors in CLCs can be caught during planning. For example, mangle the indentation in one of the CLC files:
+
+```
+$ terraform plan
+...
+error parsing Container Linux Config: error: yaml: line 3: did not find expected '-' indicator
+```
+
+Undo the mangle. Apply the changes to create the cluster per the tutorial.
+
+```
+$ terraform apply
+```
+
+Container Linux Configs (and the CoreOS Ignition system) create immutable infrastructure. Disk provisioning is performed only on first boot from disk. That means if you change a snippet used by an instance, Terraform will (correctly) try to destroy and recreate that instance. Be careful!
+
+!!! danger
+    Destroying and recreating controller instances is destructive! etcd runs on controller instances and stores data there. Do not modify controller snippets. See [blue/green](https://typhoon.psdn.io/topics/maintenance/#upgrades) clusters.
+
+## Architecture
+
+To customize clusters in ways that aren't supported by input variables, fork Typhoon and maintain a repository with customizations. Reference the repository by changing the username.
+
+```
+module "digital-ocean-nemo" {
+  source = "git::https://github.com/USERNAME/typhoon//digital-ocean/container-linux/kubernetes?ref=myspecialcase"
+  ...
+}
+```

 To customize lower-level Kubernetes control plane bootstrapping, see the [poseidon/bootkube-terraform](https://github.com/poseidon/bootkube-terraform) Terraform module.

Author	SHA1	Message	Date
Dalton Hubble	77c0a4cf2e	Update Kubernetes from v1.10.0 to v1.10.1 * Use kubernetes-incubator/bootkube v0.12.0	2018-04-12 20:57:31 -07:00
Dalton Hubble	5035d56db2	Refactor GCP to remove controller internal module * Remove the controller internal module to align with other platforms and since it's not a supported use case	2018-04-12 19:41:51 -07:00
Dalton Hubble	9bb3de5327	Skip creating unused dirs on worker nodes	2018-04-11 22:23:51 -07:00
Dalton Hubble	c8eabc2af4	Fix GCP controller_type and worker_type vars	2018-04-11 22:19:58 -07:00
Matt Dorn	2eaf858c5c	Update example BGPPeer manifest Previous example may have been outdated. It resulted in `error: unable to recognize "example.yaml": no matches for /, Kind=bgpPeer` . See https://docs.projectcalico.org/v3.0/reference/calicoctl/resources/bgppeer.	2018-04-09 23:23:18 -05:00
Dalton Hubble	b8656fd74b	Clarify bare-metal SSH instructions	2018-04-08 14:11:05 -07:00
Dalton Hubble	d276fffcda	Fix bare-metal multiple apply/ssh on Terraform v0.11.4+ * Terraform v0.11.4 introduced changes to remote-exec that mean Typhoon bare-metal clusters require multiple runs of terraform apply to ssh and bootstrap. * Bare-metal installs PXE boot a live instance to install to disk and then reboot from disk as controllers/workers. Terraform remote-exec has no way to "know" to wait until the reboot has occurred to kickoff Kubernetes bootstrap. Previously Typhoon created a "debug" user during this install phase to allow an admin to SSH, but remote-exec would hang, trying to connect as user "core". Terraform v0.11.4 changes this behavior so remote-exec fails and a user must re-run terraform apply until succeeding. * A new way to "trick" remote-exec into waiting for the reboot into the disk install is to run SSH on a non-standard port during the disk install. This retains the ability for an admin to SSH during install (most distros don't have this) and fixes the issue so only a single run of terraform apply is needed. * https://github.com/hashicorp/terraform/pull/17359#issuecomment-376415464	2018-04-08 13:32:31 -07:00
Dalton Hubble	6b08bde479	Use k8s.gcr.io instead of gcr.io/google_containers * Kubernetes recommends using the alias to fetch images from the nearest GCR regional mirror, to abstract the use of GCR, and to drop names containing 'google' * https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ	2018-04-08 12:57:52 -07:00
Dalton Hubble	f4b2396718	Return Prometheus deployment to be a worker workload * Expose etcd metrics to workers so Prometheus can run on a worker, rather than a controller * Drop temporary firewall rules allowing Prometheus to run on a controller and scrape targets * Related to https://github.com/poseidon/typhoon/pull/175	2018-04-08 12:20:00 -07:00
Dalton Hubble	b76126db93	Update docs builder and material theme	2018-04-08 00:00:03 -07:00
Dalton Hubble	7186aa46da	Update kube-state-metrics from v1.2.0 to v1.3.0 * https://github.com/kubernetes/kube-state-metrics/pull/412 * https://github.com/kubernetes/kube-state-metrics/pull/413	2018-04-04 21:04:13 -07:00
Dalton Hubble	18dbaf74ce	Update kube-dns from v1.14.8 to v1.14.9 * https://github.com/kubernetes/kubernetes/pull/61908	2018-04-04 21:00:23 -07:00
Dalton Hubble	ce001e9d56	Update etcd from v3.3.2 to v3.3.3 * https://github.com/coreos/etcd/releases/tag/v3.3.3	2018-04-04 20:32:24 -07:00
Dalton Hubble	d770393dbc	Add etcd metrics, Prometheus scrapes, and Grafana dash * Use etcd v3.3 --listen-metrics-urls to expose only metrics data via http://0.0.0.0:2381 on controllers * Add Prometheus discovery for etcd peers on controller nodes * Temporarily drop two noisy Prometheus alerts	2018-04-03 20:31:00 -07:00
Dalton Hubble	642f7ec22f	Update CHANGES.md with Kubernetes link	2018-03-30 23:12:38 -07:00
Dalton Hubble	1cc043d1eb	Update Kubernetes from v1.9.6 to v1.10.0	2018-03-30 22:14:07 -07:00
Dalton Hubble	f8e9bfb1c0	Add disk_type variable for EBS volume type on AWS * Change EBS volume type from `standard` ("prior generation") to `gp2`. Prometheus alerts are tuned for SSDs * Other platforms have fast enough disks by default	2018-03-29 22:51:54 -07:00
Dalton Hubble	b1e41dcb99	addons: Update from Grafana v4.6.3 to v5.0.4 This reverts commit `c59a9c66b1`.	2018-03-28 19:45:19 -07:00
Dalton Hubble	de4d90750e	Use consistent naming of remote provision steps	2018-03-26 00:29:57 -07:00
Dalton Hubble	7acd4931f6	Remove redundant kubeconfig copy on AWS and GCP * AWS and Google Cloud make use of auto-scaling groups and managed instance groups, respectively. As such, the kubeconfig is already held in cloud user-data * Controller instances are provisioned with a kubeconfig from user-data. It's redundant to use a Terraform remote file copy step for the kubeconfig.	2018-03-26 00:01:47 -07:00
Dalton Hubble	cfd603bea2	Ensure etcd secrets are only distributed to controller hosts * Previously, etcd secrets were erroneously distributed to worker nodes (permissions 500, ownership etc:etcd).	2018-03-25 23:46:44 -07:00
Dalton Hubble	fdb543e834	Add optional controller_type and worker_type vars on GCP * Remove optional machine_type variable on Google Cloud * Use controller_type and worker_type instead	2018-03-25 22:11:18 -07:00
Dalton Hubble	8d3d4220fd	Add disk_size variable on Google Cloud	2018-03-25 22:04:14 -07:00
Dalton Hubble	ba9daf439e	Remove unmaintained pxe-worker internal module	2018-03-25 21:57:52 -07:00
Dalton Hubble	38adb14bd2	Remove optional variable networking on Digital Ocean * Calico isn't viable on Digital Ocean because their firewalls do not support IP-IP protocol. It's not viable to run a cluster without firewalls just to use Calico. * Remove the caveat note. Don't allow users to shoot themselves in the foot	2018-03-25 21:48:51 -07:00
Dalton Hubble	e43cf9f608	Organize and cleanup variable descriptions	2018-03-25 21:44:43 -07:00
Dalton Hubble	455a4af27e	Improve cluster definition examples in docs	2018-03-25 20:41:52 -07:00
Dalton Hubble	39876e455f	Fix docs to reflect enforced provider versions	2018-03-25 11:34:39 -07:00
Dalton Hubble	da2be86e8c	Add v1.9.6 heading to CHANGES.md	2018-03-22 22:01:29 -07:00
Dalton Hubble	65a2751f77	addons: Update heapster from v1.5.1 to v1.5.2 * https://github.com/kubernetes/heapster/releases/tag/v1.5.2	2018-03-21 20:32:01 -07:00
Dalton Hubble	a04ef3919a	Update Kubernetes from v1.9.5 to v1.9.6	2018-03-21 20:29:52 -07:00
Dalton Hubble	851bc1a3f8	Update nginx-ingress from 0.11.0 to 0.12.0	2018-03-19 23:17:17 -07:00
Dalton Hubble	758c09fa5c	Update Kubernetes from v1.9.4 to v1.9.5	2018-03-19 00:25:44 -07:00
Dalton Hubble	b1cdd361ef	Mention controllers node label in changelog	2018-03-19 00:15:56 -07:00
Dalton Hubble	7f7bc960a6	Set default Google Cloud os_image to coreos-stable	2018-03-19 00:08:26 -07:00
Dalton Hubble	29108fd99d	Improve changelog with migration links	2018-03-18 23:54:55 -07:00
Dalton Hubble	18d08de898	Add Container Linux Config snippet docs	2018-03-18 23:22:40 -07:00
Dalton Hubble	f3730b2bfa	Add Container Linux Config snippets feature * Introduce the ability to support Container Linux Config "snippets" for controllers and workers on cloud platforms. This allows end-users to customize hosts by providing Container Linux configs that are additively merged into the base configs defined by Typhoon. Config snippets are validated, merged, and show any errors during `terraform plan` * Example uses include adding systemd units, network configs, mounts, files, raid arrays, or other disk provisioning features provided by Container Linux Configs (using Ignition low-level) * Requires terraform-provider-ct v0.2.1 plugin	2018-03-18 18:28:18 -07:00
Dalton Hubble	88aa9a46e5	Add /var/lib/calico volume mount to Calico DaemonSet	2018-03-18 16:40:38 -07:00
Dalton Hubble	efa90d8b44	Add a new key=value label to controller nodes * Add a node-role.kubernetes.io/controller="true" node label to controllers so Prometheus service discovery can filter to services that only run on controllers (i.e. masters) * Leave node-role.kubernetes.io/master="" untouched as it's a Kubernetes convention	2018-03-18 16:39:10 -07:00
Dalton Hubble	46226a8015	Update Prometheus from 2.2.0 to 2.2.1	2018-03-18 15:56:44 -07:00
Dalton Hubble	270d1ce357	Add links to upstream regressions	2018-03-14 18:56:20 -07:00
Dalton Hubble	ab87b6cea3	Add clarifying links to CHANGES	2018-03-12 21:19:15 -07:00
Dalton Hubble	d621512dd6	Promote AWS platform from beta to stable	2018-03-12 21:15:53 -07:00
Dalton Hubble	c59a9c66b1	Revert "addons: Update from Grafana v4.6.3 to v5.0.0" * Revert commit `9dcc255f8e`. * Grafana v5.0 is not compatible with Kubernetes v1.9.4. See https://github.com/poseidon/typhoon/pull/162	2018-03-12 21:01:14 -07:00
Dalton Hubble	21f2cef12f	Improve changelog, README, and index page	2018-03-12 20:58:02 -07:00
Dalton Hubble	931e311786	Update Kubernetes from v1.9.3 to v1.9.4	2018-03-12 18:07:50 -07:00
Dalton Hubble	2592a0aad4	Allow Google accelerators (i.e. GPUs) on workers	2018-03-11 17:21:24 -07:00
Dalton Hubble	6c5e287c29	Add details and links to the changelog	2018-03-11 17:07:07 -07:00
Dalton Hubble	2a4595eeee	Add links to the charitable donations list	2018-03-11 14:51:40 -07:00
Dalton Hubble	8e7e6b9f7f	Normalize Terraform configs with terraform fmt	2018-03-11 14:46:05 -07:00
Dalton Hubble	35f3b1b28c	Enable AWS NLB cross-zone load balancing * https://github.com/terraform-providers/terraform-provider-aws/pull/3537 * https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/	2018-03-10 23:25:18 -08:00
Dalton Hubble	9fb1e1a0e2	Update etcd from v3.3.1 to v3.3.2 * https://github.com/coreos/etcd/releases/tag/v3.3.2	2018-03-10 13:44:35 -08:00
Dalton Hubble	b61d6373c5	Add ignore_changes for AWS worker image_id	2018-03-10 13:16:05 -08:00
Dalton Hubble	42708f9a70	Update Prometheus from v2.2.0-rc.1 to v2.2.0 * https://github.com/prometheus/prometheus/releases/tag/v2.2.0	2018-03-09 00:20:40 -08:00
Dalton Hubble	d54709f89c	Update Grafana from v5.0.0 to 5.0.1 * https://github.com/grafana/grafana/releases/tag/v5.0.1	2018-03-09 00:20:40 -08:00
Dalton Hubble	0e688ef05a	Update CHANGES.md changelog with monitoring updates	2018-03-09 00:20:40 -08:00
Dalton Hubble	9dcc255f8e	addons: Update from Grafana v4.6.3 to v5.0.0	2018-03-09 00:20:40 -08:00
Dalton Hubble	9307e97c46	addons: Update Prometheus from v2.1.0 to v2.2.0 * Annotate Prometheus service to scrape metrics from Prometheus itself (enables Prometheus* alerts) * Update kube-state-metrics addon-resizer to 1.7 * Use port 8080 for kube-state-metrics * Add PrometheusNotIngestingSamples alert rule * Change K8SKubeletDown alert rule to fire when 10% of kubelets are down, not 1% * https://github.com/coreos/prometheus-operator/pull/1032	2018-03-09 00:20:40 -08:00
Dalton Hubble	c112ee3829	Rename cluster_name to name in internal module * Ensure consistency between AWS and GCP platforms	2018-03-03 17:52:01 -08:00
Dalton Hubble	45b556c08f	Fix overly strict firewall for GCP "worker pools" * Fix issue where worker firewall rules didn't apply to additional workers attached to a GCP cluster using the new "worker pools" feature (unreleased, #148). Solves host connection timeouts and pods not being scheduled to attached worker pools. * Add `name` field to GCP internal worker module to represent the unique name of the worker pool * Use `cluster_name` field of GCP internal worker module for passing the name of the cluster to which workers should be attached	2018-03-03 17:40:17 -08:00
Dalton Hubble	da6aafe816	Revert "Add module version requirements to internal workers modules" * This reverts commit `cce4537487`. * Provider passing to child modules is complex and the behavior changed between Terraform v0.10 and v0.11. We're continuing to allow both versions so this change should be reverted. For the time being, those using our internal Terraform modules will have to be aware of the minimum version for AWS and GCP providers, there is no good way to do enforcement.	2018-03-03 16:56:34 -08:00
Dalton Hubble	cce4537487	Add module version requirements to internal workers modules	2018-03-03 14:39:25 -08:00
Dalton Hubble	73126eb7f8	Add support for worker pools on AWS * Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) * Move worker resources into a Terraform submodule * Output variables needed for passing to worker pools * Add usage docs for AWS worker pools (advanced)	2018-02-27 18:31:42 -08:00
Dalton Hubble	160ae34e71	Add support for worker pools on google-cloud * Set defaults for internal worker module's count, machine_type, and os_image * Allow "pools" of homogeneous workers to be created using the google-cloud/kubernetes/workers module	2018-02-26 22:36:36 -08:00
Dalton Hubble	06d40c5b44	Show os_image coreos-stable on Google Cloud * Don't need to define a specific dated image. Managed instance groups do not delete instances when new images are released to a channel	2018-02-26 22:24:44 -08:00
Dalton Hubble	98985e5acd	Remove unused etcd_service_ip template variable * etcd_service_ip dates back to deprecated self-hosted etcd	2018-02-26 22:20:20 -08:00
Dalton Hubble	ea6bf9c9fb	Improve links in tutorials and changelog notes	2018-02-26 12:55:32 -08:00
Dalton Hubble	486fdb6968	Simplify CLC kubeconfig templating on AWS and GCP * Template terraform-render-bootkube's multi-line kubeconfig output using the right indentation * Add `kubeconfig` variable to google-cloud controllers and workers Terraform submodules * Remove `kubeconfig_*` variables from google-cloud controllers and workers Terraform submodules	2018-02-26 12:49:01 -08:00
Dalton Hubble	a44cf0edbd	Update Calico from v3.0.2 to v3.0.3 * https://github.com/projectcalico/calico/releases/tag/v3.0.3	2018-02-26 12:48:19 -08:00
Dalton Hubble	983c7aa012	Recommend installing terraform-provider-ct v0.2.1 * Upcoming releases may begin to use features that require the `terraform-provider-ct` plugin v0.2.1 * New users should use `terraform-provider-ct` v0.2.1. Existing users can safely drop-in replace their v0.2.0 plugin with v0.2.1 as well (location referenced in ~/.terraformrc). * See https://github.com/poseidon/typhoon/pull/145	2018-02-25 19:39:54 -08:00
Dalton Hubble	3d9683b6e8	Update the Digital Ocean SSH fingerprint docs	2018-02-25 19:09:38 -08:00
Sean Swehla	0da7757ef4	Pass Digital Ocean ssh_fingerprints as a list * Fix digital-ocean module to pass ssh_fingerprints as a list since the module accepts a list	2018-02-25 19:03:33 -08:00
Barak Michener	04c6613ff3	Mention the command that applies the changes	2018-02-25 17:15:42 -08:00
Dalton Hubble	92600efd11	Remove author employment disclosure note * Author no longer works for CoreOS / Red Hat * Typhoon development continues as usual	2018-02-24 18:30:51 -08:00
Dalton Hubble	66c64b4e45	List addons below platforms in CHANGES	2018-02-22 22:33:13 -08:00
Dalton Hubble	13f3745093	Add kubelet --volume-plugin-dir flag * Set Kubelet search path for flexvolume plugins to /var/lib/kubelet/volumeplugins * Add support for flexvolume plugins on AWS, GCE, and DO * See `9548572d98` which added flexvolume support for bare-metal	2018-02-22 22:11:45 -08:00
Dalton Hubble	c4914c326b	Update bootkube and terraform-render-bootkube to v0.11.0	2018-02-22 21:53:26 -08:00
Dalton Hubble	461fd46986	Update CHANGES.md with AWS ELB to NLB change	2018-02-22 21:36:35 -08:00
Paul Saunders	ceb5555222	Switch apiserver from ELB to a network load balancer	2018-02-22 16:10:31 -08:00
Paul Saunders	86420fd507	Rename namespace manifests to be applied first * Ensure kubectl apply -R creates manifests in the right order	2018-02-22 01:04:30 -08:00
Dalton Hubble	5c383f4184	addons: Update nginx-ingress from 0.10.2 to 0.11.0	2018-02-21 23:54:12 -08:00
Dalton Hubble	22fa051002	Switch Ingress ELB to a network load balancer * Require terraform-provider-aws 1.7 or higher	2018-02-20 17:34:38 -08:00
Stephen Augustus	c8313751d7	Ignore lifecycle changes to the AWS controller ami	2018-02-15 19:48:39 -08:00
Dalton Hubble	195d902ab6	Upgrade etcd from v3.2.15 to v3.3.1	2018-02-15 19:29:46 -08:00
Dalton Hubble	c19a68b59b	Update bootkube control-plane manifests * Remove PersistentVolumeLabel admission controller flag * Switch Deployments and DaemonSets to apps/v1 * Minor update to pod-checkpointer image version	2018-02-15 11:06:35 -08:00
Dalton Hubble	de88fa5457	addons: Update Heapster from v1.5.0 to v1.5.1 * Switch to k8s.gcr.io vanity image name * Add service account, Role, and ClusterRole for heapster	2018-02-15 10:57:47 -08:00
Stephen Augustus	d9a0183f3f	addons/nginx-ingress: Fix typo in GCP selector name	2018-02-14 03:07:36 -05:00
Dalton Hubble	7e24c67608	Remove docs mention of the etcd-network-checkpointer * etcd-network-checkpointer is no longer used, it's a holdover from the self-hosted etcd era	2018-02-13 16:19:03 -08:00
Dalton Hubble	a37aff7f35	Update CHANGELOG.md for v1.9.3	2018-02-11 10:59:16 -08:00
Dalton Hubble	03d23bfde7	addons: Remove Kubernetes Dashboard manifests and docs * Stop maintaining Kubernetes Dashboard manifests. Dashboard takes an unusual approach to security and is often a security weak point. * Recommendation: Use `kubectl` and avoid using the dashboard. If you must use the dashboard, explore hardening and consider using an authenticating proxy rather than the dashboard's auth features	2018-02-11 10:33:23 -08:00
Dalton Hubble	2c10d24113	addons: Switch to apps/v1 workload APIs * Deployments now belong to the apps/v1 API group * DaemonSets now belong to the apps/v1 API group * RBAC types now belong to the rbac.authorization.k8s.io/v1 API group	2018-02-10 23:56:31 -08:00
Dalton Hubble	82a616c70b	Fix terraform config formatting	2018-02-10 15:18:27 -08:00
Dalton Hubble	2fa7dac247	List aws platform in the Github issue template	2018-02-10 15:16:42 -08:00
Dalton Hubble	a41691b222	Update Kubernetes from v1.9.2 to v1.9.3 * Add flannel service account and limited RBAC cluster role * Change DaemonSets to tolerate NoSchedule and NoExecute taints * Remove deprecated apiserver --etcd-quorum-read flag * Update Calico from v3.0.1 to v3.0.2 * Add Calico GlobalNetworkSet CRD * https://github.com/poseidon/terraform-render-bootkube/pull/44	2018-02-10 13:37:07 -08:00
bkcsfi	9034203d7a	Fix typo in list of maps comment	2018-02-09 19:11:06 -08:00
Dalton Hubble	d42f6d6b5d	Update author's employment disclosure * Typhoon remains independently maintained. Its goals remain unchanged	2018-01-30 15:00:07 -08:00
Dalton Hubble	2fa1840c30	Update flannel from v0.9.0 to v0.10.0 * https://github.com/coreos/flannel/releases/tag/v0.10.0	2018-01-28 23:09:21 -08:00
Dalton Hubble	8e0b8d7e40	Upgrade Calico from 2.6.6 to 3.0.1	2018-01-28 11:47:23 -08:00
Dalton Hubble	a0cf527ccf	Update changelog with recent addon improvements	2018-01-28 01:24:27 -08:00
Dalton Hubble	65321acad2	addons: Add grafana-watcher and bundle dashboards * Add separate Grafana addons docs and screenshots	2018-01-28 01:01:30 -08:00
Dalton Hubble	064ce83f25	addons: Update Prometheus to v2.1.0 * Change service discovery to relabel jobs to align with rule expressions in upstream examples * Use a separate service account for prometheus instead of granting roles to the namespace's default * Use a separate service account for node-exporter * Update node-exporter and kube-state-metrics exporters	2018-01-27 21:00:15 -08:00
Dalton Hubble	c3b0cdddf3	addons: Update nginx-ingress from v0.10.1 to v0.10.2	2018-01-26 17:27:36 -08:00
Dalton Hubble	211ec94c75	addons: Update CLUO from v0.5.0 to v0.6.0 * https://github.com/coreos/container-linux-update-operator/releases/tag/v0.6.0	2018-01-26 17:24:09 -08:00
Dalton Hubble	8aca5a089e	addons: Update nginx-ingress to 0.10.1	2018-01-24 20:34:05 -08:00
Dalton Hubble	3e6e4ea339	Update etcd from 3.2.14 to 3.2.15 * https://github.com/coreos/etcd/releases/tag/v3.2.15	2018-01-23 23:50:04 -08:00
Dalton Hubble	103f1e16d7	addons: Update nginx-ingress to 0.10.0	2018-01-23 09:28:37 -08:00
irontoby	50dd3e3b82	Update Digital Ocean variables / docs to use new droplet sizes	2018-01-20 20:41:13 -05:00