Update CHANGELOG.md for v1.9.1

Add maintenance docs with upgrade policies
* Add best practices for maintenance
* Describe blue-green replacement strategy
* Mention unsupported in-place edit and node replacement strategies
2025-08-02 19:01:34 +02:00 · 2018-01-09 07:03:04 -08:00 · 2018-01-09 06:54:44 -08:00 · 2018-01-06 16:55:06 -08:00 · 2018-01-06 16:20:34 -08:00 · 2018-01-06 14:58:38 -08:00
59 changed files with 720 additions and 403 deletions
--- a/CHANGES.md
+++ b/CHANGES.md
@ -4,6 +4,46 @@ Notable changes between versions.

 ## Latest

+* Kubernetes [v1.9.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v191)
+* Update kube-dns from v1.14.5 to v1.14.7
+* Update etcd from 3.2.0 to 3.2.13
+* Update Calico from v2.6.4 to v2.6.5
+* Enable portmap to fix hostPort with Calico
+* Service account for controller-manager
+
+## v1.8.6
+
+* Kubernetes [v1.8.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v186)
+* Update Calico from v2.6.3 to v2.6.4
+
+## v1.8.5
+
+* Kubernetes [v1.8.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v185)
+* Recommend Container Linux [images](https://coreos.com/releases/) with Docker 17.09
+  * Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
+  of 1.12)
+  * Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
+* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69)) 
+* Add optional `cluster_domain_suffix` variable (#74)
+* Use kubernetes-incubator/bootkube v0.9.1
+
+#### Bare-Metal
+
+* Add kubelet `--volume-plugin-dir` flag to allow flexvolume providers ([#61](https://github.com/poseidon/typhoon/pull/61))
+
+#### Addons
+
+* Discourage deploying the Kubernetes Dashboard (security)
+
+## v1.8.4
+
+* Kubernetes v1.8.4
+* Calico-related bug fixes
+* Update Calico from v2.6.1 to v2.6.3
+* Update flannel from v0.9.0 to v0.9.1
+* Service accounts for kube-proxy and pod-checkpointer
+* Use kubernetes-incubator/bootkube v0.9.0
+
 ## v1.8.3

 * Kubernetes v1.8.3
@ -86,7 +126,7 @@ Notable changes between versions.
 ## v1.7.3

 * Kubernetes v1.7.3
-* Use kubernete-incubator/bootkube v0.6.1
+* Use kubernetes-incubator/bootkube v0.6.1

 #### Digital Ocean

@ -96,7 +136,7 @@ Notable changes between versions.
 ## v1.7.1

 * Kubernetes v1.7.1
-* Use kubernete-incubator/bootkube v0.6.0
+* Use kubernetes-incubator/bootkube v0.6.0
 * Add Bare-Metal Terraform module (stable)
 * Add Digital Ocean Terraform module (beta)

@ -109,12 +149,12 @@ Notable changes between versions.
 ## v1.6.7

 * Kubernetes v1.6.7
-* Use kubernete-incubator/bootkube v0.5.1
+* Use kubernetes-incubator/bootkube v0.5.1

 ## v1.6.6

 * Kubernetes v1.6.6
-* Use kubernete-incubator/bootkube v0.4.5
+* Use kubernetes-incubator/bootkube v0.4.5
 * Disable locksmithd on hosts, in favor of [CLUO](https://github.com/coreos/container-linux-update-operator).

 ## v1.6.4
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -2,4 +2,4 @@

 ## Developer Certificate of Origin

-By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DOC](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
+By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DCO](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.
--- a/README.md
+++ b/README.md
@ -1,4 +1,4 @@
-# Typhoon [![IRC](https://img.shields.io/badge/freenode-%23typhoon-0099ef.svg)]() <img align="right" src="https://storage.googleapis.com/dghubble/spin.png">
+# Typhoon [![IRC](https://img.shields.io/badge/freenode-%23typhoon-0099ef.svg)]() <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
@ -49,7 +49,7 @@ module "google-cloud-yavin" {
  region        = "us-central1"
  dns_zone      = "example.com"
  dns_zone_name = "example-zone"
-  os_image      = "coreos-stable-1465-6-0-v20170817"
+  os_image      = "coreos-stable-1576-5-0-v20180105"

  cluster_name       = "yavin"
  controller_count   = 1
@ -75,12 +75,12 @@ Apply complete! Resources: 37 added, 0 changed, 0 destroyed.
 In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.

 ```sh
-$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-0.c.example-com.internal     Ready    6m     v1.8.3
-yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.8.3
-yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.8.3
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.9.1
+yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.9.1
+yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.9.1
 ```

 List the pods.
--- a/addons/grafana/deployment.yaml
+++ b/addons/grafana/deployment.yaml
@ -21,7 +21,7 @@ spec:
    spec:
      containers:
        - name: grafana
-          image: grafana/grafana:4.6.1
+          image: grafana/grafana:4.6.3
          env:
            - name: GF_SERVER_HTTP_PORT
              value: "8080"
--- a/addons/heapster/deployment.yaml
+++ b/addons/heapster/deployment.yaml
@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1beta2
 kind: Deployment
 metadata:
  name: heapster
@ -19,7 +19,7 @@ spec:
    spec:
      containers:
        - name: heapster
-          image: gcr.io/google_containers/heapster-amd64:v1.4.3
+          image: gcr.io/google_containers/heapster-amd64:v1.5.0
          command:
            - /heapster
            - --source=kubernetes.summary_api:''
@ -31,16 +31,18 @@ spec:
            initialDelaySeconds: 180
            timeoutSeconds: 5
        - name: heapster-nanny
-          image: gcr.io/google_containers/addon-resizer:2.0
+          image: gcr.io/google_containers/addon-resizer:1.7
          command:
            - /pod_nanny
            - --cpu=80m
            - --extra-cpu=0.5m
            - --memory=140Mi
            - --extra-memory=4Mi
+            - --threshold=5
            - --deployment=heapster
            - --container=heapster
            - --poll-period=300000
+            - --estimator=exponential
          env:
            - name: MY_POD_NAME
              valueFrom:
--- a/addons/nginx-ingress/aws/deployment.yaml
+++ b/addons/nginx-ingress/aws/deployment.yaml
@ -19,7 +19,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/digital-ocean/daemonset.yaml
+++ b/addons/nginx-ingress/digital-ocean/daemonset.yaml
@ -19,7 +19,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/nginx-ingress/google-cloud/deployment.yaml
+++ b/addons/nginx-ingress/google-cloud/deployment.yaml
@ -19,7 +19,7 @@ spec:
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0-beta.17
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
--- a/addons/prometheus/rules.yaml
+++ b/addons/prometheus/rules.yaml
@ -7,7 +7,7 @@ data:
  # Rules adapted from those provided by coreos/prometheus-operator and SoundCloud
  alertmanager.rules.yaml: |+
    groups:
-    - name: ./alertmanager.rules
+    - name: alertmanager.rules
      rules:
      - alert: AlertmanagerConfigInconsistent
        expr: count_values("config_hash", alertmanager_config_hash) BY (service) / ON(service)
@ -19,7 +19,6 @@ data:
        annotations:
          description: The configuration of the instances of the Alertmanager cluster
            `{{$labels.service}}` are out of sync.
-          summary: Alertmanager configurations are inconsistent
      - alert: AlertmanagerDownOrMissing
        expr: label_replace(prometheus_operator_alertmanager_spec_replicas, "job", "alertmanager-$1",
          "alertmanager", "(.*)") / ON(job) GROUP_RIGHT() sum(up) BY (job) != 1
@ -29,8 +28,7 @@ data:
        annotations:
          description: An unexpected number of Alertmanagers are scraped or Alertmanagers
            disappeared from discovery.
-          summary: Alertmanager down or not discovered
-      - alert: FailedReload
+      - alert: AlertmanagerFailedReload
        expr: alertmanager_config_last_reload_successful == 0
        for: 10m
        labels:
@ -38,7 +36,6 @@ data:
        annotations:
          description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
            }}/{{ $labels.pod}}.
-          summary: Alertmanager configuration reload has failed
  etcd3.rules.yaml: |+
    groups:
    - name: ./etcd3.rules
@ -165,7 +162,7 @@ data:
          summary: high commit durations
  general.rules.yaml: |+
    groups:
-    - name: ./general.rules
+    - name: general.rules
      rules:
      - alert: TargetDown
        expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
@ -173,63 +170,31 @@ data:
        labels:
          severity: warning
        annotations:
-          description: '{{ $value }}% or more of {{ $labels.job }} targets are down.'
+          description: '{{ $value }}% of {{ $labels.job }} targets are down.'
          summary: Targets are down
-      - alert: TooManyOpenFileDescriptors
-        expr: 100 * (process_open_fds / process_max_fds) > 95
-        for: 10m
-        labels:
-          severity: critical
-        annotations:
-          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
-            $labels.instance }}) is using {{ $value }}% of the available file/socket descriptors.'
-          summary: too many open file descriptors
-      - record: instance:fd_utilization
+      - record: fd_utilization
        expr: process_open_fds / process_max_fds
      - alert: FdExhaustionClose
-        expr: predict_linear(instance:fd_utilization[1h], 3600 * 4) > 1
+        expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
-          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
-            $labels.instance }}) instance will exhaust in file/socket descriptors soon'
+          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
+            will exhaust its file/socket descriptors within the next 4 hours'
          summary: file descriptors soon exhausted
      - alert: FdExhaustionClose
-        expr: predict_linear(instance:fd_utilization[10m], 3600) > 1
+        expr: predict_linear(fd_utilization[10m], 3600) > 1
        for: 10m
        labels:
          severity: critical
        annotations:
-          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} ({{
-            $labels.instance }}) instance will exhaust in file/socket descriptors soon'
+          description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
+            will exhaust its file/socket descriptors within the next hour'
          summary: file descriptors soon exhausted
-  kube-apiserver.rules.yaml: |+
-    groups:
-    - name: ./kube-apiserver.rules
-      rules:
-      - alert: K8SApiserverDown
-        expr: absent(up{job="kubernetes-apiservers"} == 1)
-        for: 5m
-        labels:
-          severity: critical
-        annotations:
-          description: Prometheus failed to scrape API server(s), or all API servers have
-            disappeared from service discovery.
-          summary: API server unreachable
-      - alert: K8SApiServerLatency
-        expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"})
-          WITHOUT (instance, resource)) / 1e+06 > 1
-        for: 10m
-        labels:
-          severity: warning
-        annotations:
-          description: 99th percentile Latency for {{ $labels.verb }} requests to the
-            kube-apiserver is higher than 1s.
-          summary: Kubernetes apiserver latency is high
  kube-controller-manager.rules.yaml: |+
    groups:
-    - name: ./kube-controller-manager.rules
+    - name: kube-controller-manager.rules
      rules:
      - alert: K8SControllerManagerDown
        expr: absent(up{kubernetes_name="kube-controller-manager"} == 1)
@ -242,8 +207,53 @@ data:
          summary: Controller manager is down
  kube-scheduler.rules.yaml: |+
    groups:
-    - name: ./kube-scheduler.rules
+    - name: kube-scheduler.rules
      rules:
+      - record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
+        expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.99"
+      - record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
+        expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.9"
+      - record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
+        expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.5"
+      - record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
+        expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.99"
+      - record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
+        expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.9"
+      - record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
+        expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.5"
+      - record: cluster:scheduler_binding_latency_seconds:quantile
+        expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.99"
+      - record: cluster:scheduler_binding_latency_seconds:quantile
+        expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.9"
+      - record: cluster:scheduler_binding_latency_seconds:quantile
+        expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
+          BY (le, cluster)) / 1e+06
+        labels:
+          quantile: "0.5"
      - alert: K8SSchedulerDown
        expr: absent(up{kubernetes_name="kube-scheduler"} == 1)
        for: 5m
@ -253,9 +263,65 @@ data:
          description: There is no running K8S scheduler. New pods are not being assigned
            to nodes.
          summary: Scheduler is down
+  kube-state-metrics.rules.yaml: |+
+    groups:
+    - name: kube-state-metrics.rules
+      rules:
+      - alert: DeploymentGenerationMismatch
+        expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          description: Observed deployment generation does not match expected one for
+            deployment {{$labels.namespace}}/{{$labels.deployment}}
+      - alert: DeploymentReplicasNotUpdated
+        expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
+          or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
+          unless (kube_deployment_spec_paused == 1)
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          description: Replicas are not updated and available for deployment {{$labels.namespace}}/{{$labels.deployment}}
+      - alert: DaemonSetRolloutStuck
+        expr: kube_daemonset_status_current_number_ready / kube_daemonset_status_desired_number_scheduled
+          * 100 < 100
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          description: Only {{$value}}% of desired pods scheduled and ready for daemon
+            set {{$labels.namespace}}/{{$labels.daemonset}}
+      - alert: K8SDaemonSetsNotScheduled
+        expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
+          > 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: A number of daemonsets are not scheduled.
+          summary: Daemonsets are not scheduled correctly
+      - alert: DaemonSetsMissScheduled
+        expr: kube_daemonset_status_number_misscheduled > 0
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: A number of daemonsets are running where they are not supposed
+            to run.
+          summary: Daemonsets are not scheduled correctly
+      - alert: PodFrequentlyRestarting
+        expr: increase(kube_pod_container_status_restarts[1h]) > 5
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Pod {{$labels.namespace}}/{{$labels.pod}} was restarted {{$value}}
+            times within the last hour
  kubelet.rules.yaml: |+
    groups:
-    - name: ./kubelet.rules
+    - name: kubelet.rules
      rules:
      - alert: K8SNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
@ -274,20 +340,17 @@ data:
        labels:
          severity: critical
        annotations:
-          description: '{{ $value }} Kubernetes nodes (more than 10% are in the NotReady
-            state).'
-          summary: Many Kubernetes nodes are Not Ready
+          description: '{{ $value }}% of Kubernetes nodes are not ready'
      - alert: K8SKubeletDown
-        expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) > 0.03
+        expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) * 100 > 3
        for: 1h
        labels:
          severity: warning
        annotations:
          description: Prometheus failed to scrape {{ $value }}% of kubelets.
-          summary: Many Kubelets cannot be scraped
      - alert: K8SKubeletDown
-        expr: absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"})
-          > 0.1
+        expr: (absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}))
+          * 100 > 1
        for: 1h
        labels:
          severity: critical
@ -297,6 +360,7 @@ data:
          summary: Many Kubelets cannot be scraped
      - alert: K8SKubeletTooManyPods
        expr: kubelet_running_pod_count > 100
+        for: 10m
        labels:
          severity: warning
        annotations:
@ -305,124 +369,112 @@ data:
          summary: Kubelet is close to pod limit
  kubernetes.rules.yaml: |+
    groups:
-    - name: ./kubernetes.rules
+    - name: kubernetes.rules
      rules:
-      - record: cluster_namespace_controller_pod_container:spec_memory_limit_bytes
-        expr: sum(label_replace(container_spec_memory_limit_bytes{container_name!=""},
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name)
-      - record: cluster_namespace_controller_pod_container:spec_cpu_shares
-        expr: sum(label_replace(container_spec_cpu_shares{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:cpu_usage:rate
-        expr: sum(label_replace(irate(container_cpu_usage_seconds_total{container_name!=""}[5m]),
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name)
-      - record: cluster_namespace_controller_pod_container:memory_usage:bytes
-        expr: sum(label_replace(container_memory_usage_bytes{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:memory_working_set:bytes
-        expr: sum(label_replace(container_memory_working_set_bytes{container_name!=""},
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name)
-      - record: cluster_namespace_controller_pod_container:memory_rss:bytes
-        expr: sum(label_replace(container_memory_rss{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:memory_cache:bytes
-        expr: sum(label_replace(container_memory_cache{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:disk_usage:bytes
-        expr: sum(label_replace(container_disk_usage_bytes{container_name!=""}, "controller",
-          "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace, controller, pod_name,
-          container_name)
-      - record: cluster_namespace_controller_pod_container:memory_pagefaults:rate
-        expr: sum(label_replace(irate(container_memory_failures_total{container_name!=""}[5m]),
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name, scope, type)
-      - record: cluster_namespace_controller_pod_container:memory_oom:rate
-        expr: sum(label_replace(irate(container_memory_failcnt{container_name!=""}[5m]),
-          "controller", "$1", "pod_name", "^(.*)-[a-z0-9]+")) BY (cluster, namespace,
-          controller, pod_name, container_name, scope, type)
-      - record: cluster:memory_allocation:percent
-        expr: 100 * sum(container_spec_memory_limit_bytes{pod_name!=""}) BY (cluster)
-          / sum(machine_memory_bytes) BY (cluster)
-      - record: cluster:memory_used:percent
-        expr: 100 * sum(container_memory_usage_bytes{pod_name!=""}) BY (cluster) / sum(machine_memory_bytes)
-          BY (cluster)
-      - record: cluster:cpu_allocation:percent
-        expr: 100 * sum(container_spec_cpu_shares{pod_name!=""}) BY (cluster) / sum(container_spec_cpu_shares{id="/"}
-          * ON(cluster, instance) machine_cpu_cores) BY (cluster)
-      - record: cluster:node_cpu_use:percent
-        expr: 100 * sum(rate(node_cpu{mode!="idle"}[5m])) BY (cluster) / sum(machine_cpu_cores)
-          BY (cluster)
-      - record: cluster_resource_verb:apiserver_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(apiserver_request_latencies_bucket) BY (le,
-          cluster, job, resource, verb)) / 1e+06
+      - record: pod_name:container_memory_usage_bytes:sum
+        expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
+          (pod_name)
+      - record: pod_name:container_spec_cpu_shares:sum
+        expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name)
+      - record: pod_name:container_cpu_usage:sum
+        expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
+          BY (pod_name)
+      - record: pod_name:container_fs_usage_bytes:sum
+        expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name)
+      - record: namespace:container_memory_usage_bytes:sum
+        expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace)
+      - record: namespace:container_spec_cpu_shares:sum
+        expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace)
+      - record: namespace:container_cpu_usage:sum
+        expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
+          BY (namespace)
+      - record: cluster:memory_usage:ratio
+        expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
+          (cluster) / sum(machine_memory_bytes) BY (cluster)
+      - record: cluster:container_spec_cpu_shares:ratio
+        expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000
+          / sum(machine_cpu_cores)
+      - record: cluster:container_cpu_usage:ratio
+        expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
+          / sum(machine_cpu_cores)
+      - record: apiserver_latency_seconds:quantile
+        expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) /
+          1e+06
        labels:
          quantile: "0.99"
-      - record: cluster_resource_verb:apiserver_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(apiserver_request_latencies_bucket) BY (le,
-          cluster, job, resource, verb)) / 1e+06
+      - record: apiserver_latency_seconds:quantile
+        expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) /
+          1e+06
        labels:
          quantile: "0.9"
-      - record: cluster_resource_verb:apiserver_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(apiserver_request_latencies_bucket) BY (le,
-          cluster, job, resource, verb)) / 1e+06
+      - record: apiserver_latency_seconds:quantile
+        expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) /
+          1e+06
        labels:
          quantile: "0.5"
-      - record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+      - alert: APIServerLatencyHigh
+        expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
+          > 1
+        for: 10m
        labels:
-          quantile: "0.99"
-      - record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: warning
+        annotations:
+          description: the API server has a 99th percentile latency of {{ $value }} seconds
+            for {{$labels.verb}} {{$labels.resource}}
+      - alert: APIServerLatencyHigh
+        expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
+          > 4
+        for: 10m
        labels:
-          quantile: "0.9"
-      - record: cluster:scheduler_e2e_scheduling_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: critical
+        annotations:
+          description: the API server has a 99th percentile latency of {{ $value }} seconds
+            for {{$labels.verb}} {{$labels.resource}}
+      - alert: APIServerErrorsHigh
+        expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
+          * 100 > 2
+        for: 10m
        labels:
-          quantile: "0.5"
-      - record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: warning
+        annotations:
+          description: API server returns errors for {{ $value }}% of requests
+      - alert: APIServerErrorsHigh
+        expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
+          * 100 > 5
+        for: 10m
        labels:
-          quantile: "0.99"
-      - record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
+          severity: critical
+        annotations:
+          description: API server returns errors for {{ $value }}% of requests
+      - alert: K8SApiserverDown
+        expr: absent(up{job="kubernetes-apiservers"} == 1)
+        for: 20m
        labels:
-          quantile: "0.9"
-      - record: cluster:scheduler_scheduling_algorithm_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
-        labels:
-          quantile: "0.5"
-      - record: cluster:scheduler_binding_latency:quantile_seconds
-        expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
-        labels:
-          quantile: "0.99"
-      - record: cluster:scheduler_binding_latency:quantile_seconds
-        expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
-        labels:
-          quantile: "0.9"
-      - record: cluster:scheduler_binding_latency:quantile_seconds
-        expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
-          BY (le, cluster)) / 1e+06
-        labels:
-          quantile: "0.5"
+          severity: critical
+        annotations:
+          description: No API servers are reachable or all have disappeared from service
+            discovery
  node.rules.yaml: |+
    groups:
-    - name: ./node.rules
+    - name: node.rules
      rules:
+      - record: instance:node_cpu:rate:sum
+        expr: sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m]))
+          BY (instance)
+      - record: instance:node_filesystem_usage:sum
+        expr: sum((node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}))
+          BY (instance)
+      - record: instance:node_network_receive_bytes:rate:sum
+        expr: sum(rate(node_network_receive_bytes[3m])) BY (instance)
+      - record: instance:node_network_transmit_bytes:rate:sum
+        expr: sum(rate(node_network_transmit_bytes[3m])) BY (instance)
+      - record: instance:node_cpu:ratio
+        expr: sum(rate(node_cpu{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance)
+          GROUP_LEFT() count(sum(node_cpu) BY (instance, cpu)) BY (instance)
+      - record: cluster:node_cpu:sum_rate5m
+        expr: sum(rate(node_cpu{mode!="idle"}[5m]))
+      - record: cluster:node_cpu:ratio
+        expr: cluster:node_cpu:sum_rate5m / count(sum(node_cpu) BY (instance, cpu))
      - alert: NodeExporterDown
        expr: absent(up{kubernetes_name="node-exporter"} == 1)
        for: 10m
@ -430,43 +482,65 @@ data:
          severity: warning
        annotations:
          description: Prometheus could not scrape a node-exporter for more than 10m,
-            or node-exporters have disappeared from discovery.
-          summary: node-exporter cannot be scraped
-      - alert: K8SNodeOutOfDisk
-        expr: kube_node_status_condition{condition="OutOfDisk",status="true"} == 1
+            or node-exporters have disappeared from discovery
+      - alert: NodeDiskRunningFull
+        expr: predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          description: device {{$labels.device}} on node {{$labels.instance}} is running
+            full within the next 24 hours (mounted at {{$labels.mountpoint}})
+      - alert: NodeDiskRunningFull
+        expr: predict_linear(node_filesystem_free[30m], 3600 * 2) < 0
+        for: 10m
        labels:
-          service: k8s
          severity: critical
        annotations:
-          description: '{{ $labels.node }} has run out of disk space.'
-          summary: Node ran out of disk space.
-      - alert: K8SNodeMemoryPressure
-        expr: kube_node_status_condition{condition="MemoryPressure",status="true"} ==
-          1
-        labels:
-          service: k8s
-          severity: warning
-        annotations:
-          description: '{{ $labels.node }} is under memory pressure.'
-          summary: Node is under memory pressure.
-      - alert: K8SNodeDiskPressure
-        expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
-        labels:
-          service: k8s
-          severity: warning
-        annotations:
-          description: '{{ $labels.node }} is under disk pressure.'
-          summary: Node is under disk pressure.
+          description: device {{$labels.device}} on node {{$labels.instance}} is running
+            full within the next 2 hours (mounted at {{$labels.mountpoint}})
  prometheus.rules.yaml: |+
    groups:
-    - name: ./prometheus.rules
+    - name: prometheus.rules
      rules:
-      - alert: FailedReload
+      - alert: PrometheusConfigReloadFailed
        expr: prometheus_config_last_reload_successful == 0
        for: 10m
        labels:
          severity: warning
        annotations:
-          description: Reloading Prometheus' configuration has failed for {{ $labels.namespace
-            }}/{{ $labels.pod}}.
-          summary: Prometheus configuration reload has failed
+          description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}
+      - alert: PrometheusNotificationQueueRunningFull
+        expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{
+            $labels.pod}}
+      - alert: PrometheusErrorSendingAlerts
+        expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
+          > 0.01
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
+            $labels.pod}} to Alertmanager {{$labels.alertmanager}}
+      - alert: PrometheusErrorSendingAlerts
+        expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
+          > 0.03
+        for: 10m
+        labels:
+          severity: critical
+        annotations:
+          description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
+            $labels.pod}} to Alertmanager {{$labels.alertmanager}}
+      - alert: PrometheusNotConnectedToAlertmanagers
+        expr: prometheus_notifications_alertmanagers_discovered < 1
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
+            to any Alertmanagers
--- a/aws/container-linux/kubernetes/README.md
+++ b/aws/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
--- a/aws/container-linux/kubernetes/bootkube.tf
+++ b/aws/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=b83e321b350ac549c45ed6a05ffd8683336fb9f4"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
-  etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = "${var.network_mtu}"
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
+  etcd_servers          = ["${aws_route53_record.etcds.*.fqdn}"]
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = "${var.network_mtu}"
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/aws/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/aws/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.2.13"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -41,11 +41,12 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      enable: true
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -72,7 +73,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --kubeconfig=/etc/kubernetes/kubeconfig \
@ -128,7 +129,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -147,11 +148,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.9.1}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/aws/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/aws/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -22,7 +22,7 @@ systemd:
      enable: true
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -49,7 +49,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --kubeconfig=/etc/kubernetes/kubeconfig \
@ -103,7 +103,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -121,7 +121,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.8.3 \
+            docker://gcr.io/google_containers/hyperkube:v1.9.1 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--- a/aws/container-linux/kubernetes/controllers.tf
+++ b/aws/container-linux/kubernetes/controllers.tf
@ -54,6 +54,7 @@ data "template_file" "controller_config" {

    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
    ssh_authorized_key      = "${var.ssh_authorized_key}"
+    cluster_domain_suffix   = "${var.cluster_domain_suffix}"
    kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
    kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
    kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
--- a/aws/container-linux/kubernetes/variables.tf
+++ b/aws/container-linux/kubernetes/variables.tf
@ -94,3 +94,9 @@ EOD
  type    = "string"
  default = "10.3.0.0/16"
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix are answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
--- a/aws/container-linux/kubernetes/workers.tf
+++ b/aws/container-linux/kubernetes/workers.tf
@ -59,6 +59,7 @@ data "template_file" "worker_config" {
    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
    k8s_etcd_service_ip     = "${cidrhost(var.service_cidr, 15)}"
    ssh_authorized_key      = "${var.ssh_authorized_key}"
+    cluster_domain_suffix   = "${var.cluster_domain_suffix}"
    kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
    kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
    kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
--- a/bare-metal/container-linux/kubernetes/README.md
+++ b/bare-metal/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
--- a/bare-metal/container-linux/kubernetes/bootkube.tf
+++ b/bare-metal/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=b83e321b350ac549c45ed6a05ffd8683336fb9f4"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${var.k8s_domain_name}"]
-  etcd_servers = ["${var.controller_domains}"]
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = "${var.network_mtu}"
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${var.k8s_domain_name}"]
+  etcd_servers          = ["${var.controller_domains}"]
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = "${var.network_mtu}"
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/bare-metal/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.2.13"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
@ -50,10 +50,11 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -73,6 +74,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -80,7 +82,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=${domain_name} \
@ -89,7 +91,8 @@ systemd:
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
          --pod-manifest-path=/etc/kubernetes/manifests \
-          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
+          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
@ -114,7 +117,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/hostname
      filesystem: root
      mode: 0644
@ -139,11 +142,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.9.1}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/bare-metal/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/bare-metal/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -30,7 +30,7 @@ systemd:
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -50,6 +50,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -57,7 +58,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=${domain_name} \
@ -65,7 +66,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -80,7 +82,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/hostname
      filesystem: root
      mode: 0644
--- a/bare-metal/container-linux/kubernetes/profiles.tf
+++ b/bare-metal/container-linux/kubernetes/profiles.tf
@ -8,6 +8,7 @@ resource "matchbox_profile" "container-linux-install" {
  ]

  args = [
+    "initrd=coreos_production_pxe_image.cpio.gz",
    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
    "coreos.first_boot=yes",
    "console=tty0",
@ -44,6 +45,7 @@ resource "matchbox_profile" "cached-container-linux-install" {
  ]

  args = [
+    "initrd=coreos_production_pxe_image.cpio.gz",
    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
    "coreos.first_boot=yes",
    "console=tty0",
@ -82,11 +84,12 @@ data "template_file" "controller-configs" {
  template = "${file("${path.module}/cl/controller.yaml.tmpl")}"

  vars {
-    domain_name          = "${element(var.controller_domains, count.index)}"
-    etcd_name            = "${element(var.controller_names, count.index)}"
-    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
-    k8s_dns_service_ip   = "${module.bootkube.kube_dns_service_ip}"
-    ssh_authorized_key   = "${var.ssh_authorized_key}"
+    domain_name           = "${element(var.controller_domains, count.index)}"
+    etcd_name             = "${element(var.controller_names, count.index)}"
+    etcd_initial_cluster  = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
+    k8s_dns_service_ip    = "${module.bootkube.kube_dns_service_ip}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"

    # Terraform evaluates both sides regardless and element cannot be used on 0 length lists
    networkd_content = "${length(var.controller_networkds) == 0 ? "" : element(concat(var.controller_networkds, list("")), count.index)}"
@ -106,9 +109,10 @@ data "template_file" "worker-configs" {
  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"

  vars {
-    domain_name        = "${element(var.worker_domains, count.index)}"
-    k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
-    ssh_authorized_key = "${var.ssh_authorized_key}"
+    domain_name           = "${element(var.worker_domains, count.index)}"
+    k8s_dns_service_ip    = "${module.bootkube.kube_dns_service_ip}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"

    # Terraform evaluates both sides regardless and element cannot be used on 0 length lists
    networkd_content = "${length(var.worker_networkds) == 0 ? "" : element(concat(var.worker_networkds, list("")), count.index)}"
--- a/bare-metal/container-linux/kubernetes/variables.tf
+++ b/bare-metal/container-linux/kubernetes/variables.tf
@ -92,6 +92,12 @@ EOD

 # optional

+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix are answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
+
 variable "cached_install" {
  type        = "string"
  default     = "false"
--- a/bare-metal/container-linux/pxe-worker/cl/bootkube-worker.yaml.tmpl
+++ b/bare-metal/container-linux/pxe-worker/cl/bootkube-worker.yaml.tmpl
@ -30,7 +30,7 @@ systemd:
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -50,6 +50,7 @@ systemd:
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
+        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
@ -57,7 +58,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns={{.k8s_dns_service_ip}} \
-          --cluster_domain=cluster.local \
+          --cluster_domain={{.cluster_domain_suffix}} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override={{.domain_name}} \
@ -65,7 +66,8 @@ systemd:
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
-          --pod-manifest-path=/etc/kubernetes/manifests
+          --pod-manifest-path=/etc/kubernetes/manifests \
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
@ -96,7 +98,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/hostname
      filesystem: root
      mode: 0644
--- a/bare-metal/container-linux/pxe-worker/groups.tf
+++ b/bare-metal/container-linux/pxe-worker/groups.tf
@ -13,9 +13,10 @@ resource "matchbox_group" "workers" {
    etcd_endpoints = "${join(",", formatlist("%s:2379", var.controller_domains))}"

    # TODO
-    etcd_on_host        = "true"
-    k8s_etcd_service_ip = "10.3.0.15"
-    k8s_dns_service_ip  = "${var.kube_dns_service_ip}"
-    ssh_authorized_key  = "${var.ssh_authorized_key}"
+    etcd_on_host          = "true"
+    k8s_etcd_service_ip   = "10.3.0.15"
+    k8s_dns_service_ip    = "${var.kube_dns_service_ip}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
+    ssh_authorized_key    = "${var.ssh_authorized_key}"
  }
 }
--- a/bare-metal/container-linux/pxe-worker/profiles.tf
+++ b/bare-metal/container-linux/pxe-worker/profiles.tf
@ -8,6 +8,7 @@ resource "matchbox_profile" "bootkube-worker-pxe" {
  ]

  args = [
+    "initrd=coreos_production_pxe_image.cpio.gz",
    "coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
    "coreos.first_boot=yes",
    "console=tty0",
--- a/bare-metal/container-linux/pxe-worker/variables.tf
+++ b/bare-metal/container-linux/pxe-worker/variables.tf
@ -64,3 +64,9 @@ variable "kernel_args" {
    "root=/dev/sda1",
  ]
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with this suffix are answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
--- a/digital-ocean/container-linux/kubernetes/README.md
+++ b/digital-ocean/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
--- a/digital-ocean/container-linux/kubernetes/bootkube.tf
+++ b/digital-ocean/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=b83e321b350ac549c45ed6a05ffd8683336fb9f4"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
-  etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = 1440
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
+  etcd_servers          = "${digitalocean_record.etcds.*.fqdn}"
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = 1440
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/digital-ocean/container-linux/kubernetes/cl/controller.yaml.tmpl
+++ b/digital-ocean/container-linux/kubernetes/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.2.13"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -50,10 +50,11 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Requires=coreos-metadata.service
        After=coreos-metadata.service
        Wants=rpc-statd.service
@ -83,7 +84,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
@ -119,7 +120,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -138,11 +139,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.9.1}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
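
The manifest-move guard changed in this hunk can be tried in isolation. The sketch below assumes a scratch directory from `mktemp` stands in for `/opt/bootkube/assets`; the directory and file names (`manifests-networking`, `manifests-extra`, `calico.yaml`, `extra.yaml`) are hypothetical. It shows one plausible reason for the change: a `[ -d ... ]` test on a glob that expands to more than one directory is a usage error and returns non-zero, while the `ls`-based test only asks whether the glob matched anything.

```sh
#!/bin/sh
set -e

# Hypothetical scratch directory standing in for /opt/bootkube/assets.
assets="$(mktemp -d)"
mkdir -p "$assets/manifests" "$assets/manifests-networking" "$assets/manifests-extra"
touch "$assets/manifests-networking/calico.yaml" "$assets/manifests-extra/extra.yaml"

# Old-style guard: the glob expands to two directories, so `[ -d a b ]`
# is malformed and returns a non-zero status; the move would be skipped.
if [ -d "$assets"/manifests-* ] 2>/dev/null; then old_guard=pass; else old_guard=fail; fi

# New-style guard: proceed only when the glob matches at least one file.
[ -n "$(ls "$assets"/manifests-*/* 2>/dev/null)" ] \
  && mv "$assets"/manifests-*/* "$assets/manifests" \
  && rm -rf "$assets"/manifests-*

echo "old guard: $old_guard"
ls "$assets/manifests"
```

In the template itself, shell variables are written as `$${...}` so that Terraform's template rendering leaves them for the shell; the sketch uses plain `$` because it is not rendered through a template.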
--- a/digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
+++ b/digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
@ -30,7 +30,7 @@ systemd:
    - name: kubelet.service
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Requires=coreos-metadata.service
        After=coreos-metadata.service
        Wants=rpc-statd.service
@ -60,7 +60,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
@ -94,7 +94,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -112,7 +112,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.8.3 \
+            docker://gcr.io/google_containers/hyperkube:v1.9.1 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--- a/digital-ocean/container-linux/kubernetes/controllers.tf
+++ b/digital-ocean/container-linux/kubernetes/controllers.tf
@ -69,8 +69,9 @@ data "template_file" "controller_config" {
    etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"

    # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
-    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
-    k8s_dns_service_ip   = "${cidrhost(var.service_cidr, 10)}"
+    etcd_initial_cluster  = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
 }

--- a/digital-ocean/container-linux/kubernetes/variables.tf
+++ b/digital-ocean/container-linux/kubernetes/variables.tf
@ -76,3 +76,10 @@ EOD
  type    = "string"
  default = "10.3.0.0/16"
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local)"
+  type        = "string"
+  default     = "cluster.local"
+}
+
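
To illustrate what `cluster_domain_suffix` changes from a workload's point of view, the sketch below composes a service's in-cluster DNS name from the suffix. The `service_fqdn` helper is hypothetical, and `k8s.example.com` is only the example value used in the docs tables; the `<service>.<namespace>.svc.<suffix>` shape follows the `foo.default.svc.cluster.local` example in the variable description.

```sh
#!/bin/sh
# Example value only; the variable's default is cluster.local.
cluster_domain_suffix="k8s.example.com"

# Hypothetical helper: <service>.<namespace>.svc.<suffix>
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "$cluster_domain_suffix"
}

fqdn="$(service_fqdn prometheus monitoring)"
echo "$fqdn"
```

Any documentation or configuration that hard-codes `*.svc.cluster.local` names, such as a Grafana data-source URL, would need the same substitution on clusters that override the suffix.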
--- a/digital-ocean/container-linux/kubernetes/workers.tf
+++ b/digital-ocean/container-linux/kubernetes/workers.tf
@ -43,8 +43,9 @@ data "template_file" "worker_config" {
  template = "${file("${path.module}/cl/worker.yaml.tmpl")}"

  vars = {
-    k8s_dns_service_ip  = "${cidrhost(var.service_cidr, 10)}"
-    k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
+    k8s_dns_service_ip    = "${cidrhost(var.service_cidr, 10)}"
+    k8s_etcd_service_ip   = "${cidrhost(var.service_cidr, 15)}"
+    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
 }
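
The `cidrhost(var.service_cidr, 10)` and `cidrhost(var.service_cidr, 15)` expressions above pick fixed host addresses inside the service CIDR. As a rough illustration of that arithmetic, here is a simplified shell stand-in, not Terraform's implementation: it assumes an IPv4 CIDR whose address part is already the network address and does no range checking. The function name mirrors the Terraform interpolation function only for readability.

```sh
#!/bin/sh
# Simplified stand-in for Terraform's cidrhost(): IPv4 only, assumes the
# address before the slash is the network address, no bounds checks.
cidrhost() {
  base="${1%/*}"
  hostnum="$2"
  IFS=. read -r a b c d <<EOF
$base
EOF
  n=$(( (a << 24) + (b << 16) + (c << 8) + d + hostnum ))
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# With the default service_cidr of 10.3.0.0/16:
dns_ip="$(cidrhost 10.3.0.0/16 10)"
etcd_ip="$(cidrhost 10.3.0.0/16 15)"
echo "k8s_dns_service_ip=$dns_ip"
echo "k8s_etcd_service_ip=$etcd_ip"
```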

--- a/docs/addons/cluo.md
+++ b/docs/addons/cluo.md
@ -12,13 +12,13 @@ kubectl apply -f addons/cluo -R

 ## Usage

-`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indiates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.
+`update-agent` runs as a DaemonSet and annotates a node when `update-engine.service` indicates an update has been installed and a reboot is needed. It also adds additional labels and annotations to nodes.

 ```
 $ kubectl get nodes --show-labels
 ...
 container-linux-update.v1.coreos.com/group=stable
-container-linux-update.v1.coreos.com/version=1465.6.0
+container-linux-update.v1.coreos.com/version=1576.5.0
 ```

 `update-operator` ensures one node reboots at a time and that pods are drained prior to reboot.
--- a/docs/addons/dashboard.md
+++ b/docs/addons/dashboard.md
@ -1,5 +1,8 @@
 # Kubernetes Dashboard

+!!! warning
+    The Kubernetes Dashboard takes [unusual approaches](https://github.com/kubernetes/dashboard/wiki/Access-control#authorization-header) to security and is often a point of security escalations. We recommend you don't deploy it and get familiar with `kubectl` instead, if possible.
+
 The Kubernetes [Dashboard](https://github.com/kubernetes/dashboard) provides a web UI to manage a Kubernetes cluster for those who prefer an alternative to `kubectl`.

 ## Create
--- a/docs/addons/heapster.md
+++ b/docs/addons/heapster.md
@ -1,6 +1,6 @@
 # Heapster

-[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashbard graphs.
+[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashboard graphs.

 ## Create

--- a/docs/addons/prometheus.md
+++ b/docs/addons/prometheus.md
@ -61,7 +61,7 @@ Use `kubectl` to authenticate to the apiserver and create a local port-forward t
 kubectl port-forward grafana-POD-ID 8080 -n monitoring
 ```

-Visit [127.0.0.1:8080](http://127.0.0.1:8080), add the prometheus data-source (http://prometheus.monitoring.svc.cluster.local), and import your desired dashboard (e.g. 315).
+Visit [127.0.0.1:8080](http://127.0.0.1:8080), add the prometheus data-source (http://prometheus.monitoring.svc.cluster.local), and import your desired dashboard (e.g. [Grafana Dashboard 315](https://grafana.com/dashboards/315)).

 ![Grafana Dashboard](/img/grafana-dashboard.png)

--- a/docs/aws.md
+++ b/docs/aws.md
@ -1,6 +1,6 @@
 # AWS

-In this tutorial, we'll create a Kubernetes v1.8.3 cluster on AWS.
+In this tutorial, we'll create a Kubernetes v1.9.1 cluster on AWS.

 We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a VPC, gateway, subnets, auto-scaling groups of controllers and workers, network load balancers for controllers and workers, and security groups will be created.

@ -10,11 +10,11 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube

 * AWS Account and IAM credentials
 * AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)
-* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
+* Terraform v0.10.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally

 ## Terraform Setup

-Install [Terraform](https://www.terraform.io/downloads.html) v0.10.1 on your system.
+Install [Terraform](https://www.terraform.io/downloads.html) v0.10.x on your system.

 ```sh
 $ terraform version
@ -103,7 +103,7 @@ ssh-add -L
 ```

 !!! warning
-    `terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
+    `terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.

 ## Apply

@ -119,7 +119,7 @@ Get or update Terraform modules.
 $ terraform get            # downloads missing modules
 $ terraform get --update   # updates all modules
 Get: git::https://github.com/poseidon/typhoon (update)
-Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
+Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.9.1 (update)
 ```

 Plan the resources to be created.
@ -148,12 +148,12 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
 [Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.

 ```
-$ KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/tempest/auth/kubeconfig
 $ kubectl get nodes
 NAME             STATUS    AGE       VERSION        
-ip-10-0-12-221   Ready     34m       v1.8.3
-ip-10-0-19-112   Ready     34m       v1.8.3
-ip-10-0-4-22     Ready     34m       v1.8.3
+ip-10-0-12-221   Ready     34m       v1.9.1
+ip-10-0-19-112   Ready     34m       v1.9.1
+ip-10-0-4-22     Ready     34m       v1.9.1
 ```

 List the pods.
@ -179,7 +179,7 @@ kube-system   pod-checkpointer-4kxtl-ip-10-0-12-221     1/1    Running   0

 ## Going Further

-Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
+Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).

 !!! note
    On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
@ -201,7 +201,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]

 Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a network load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `tempest.aws.example.com`.

-You'll need a registered domain name or subdomain registered in a AWS Route53 DNS zone. You can set this up once and create many clusters with unqiue names.
+You'll need a registered domain name or subdomain registered in an AWS Route53 DNS zone. You can set this up once and create many clusters with unique names.

 ```tf
 resource "aws_route53_zone" "zone-for-clusters" {
@ -227,7 +227,8 @@ Reference the DNS zone id with `"${aws_route53_zone.zone-for-clusters.zone_id}"`
 | network_mtu | CNI interface MTU (calico only) | 1480 | 8981 |
 | host_cidr | CIDR range to assign to EC2 instances | "10.0.0.0/16" | "10.1.0.0/16" |
 | pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
-| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |

 Check the list of valid [instance types](https://aws.amazon.com/ec2/instance-types/).
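
The tutorial change from `KUBECONFIG=...` to `export KUBECONFIG=...` matters because a plain assignment sets a shell variable that child processes such as `kubectl` do not inherit. The sketch below demonstrates the difference with a child `sh` in place of `kubectl`; the path is a hypothetical placeholder.

```sh
#!/bin/sh
# Start from a clean state so an already-exported value does not mask the effect.
unset KUBECONFIG

# Plain assignment: visible in this shell only (hypothetical path).
KUBECONFIG=/tmp/example/kubeconfig
before="$(sh -c 'printf %s "${KUBECONFIG:-unset}"')"

# After export, child processes inherit the variable.
export KUBECONFIG
after="$(sh -c 'printf %s "${KUBECONFIG:-unset}"')"

echo "without export: $before"
echo "with export:    $after"
```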

--- a/docs/bare-metal.md
+++ b/docs/bare-metal.md
@ -1,6 +1,6 @@
 # Bare-Metal

-In this tutorial, we'll network boot and provison a Kubernetes v1.8.3 cluster on bare-metal.
+In this tutorial, we'll network boot and provision a Kubernetes v1.9.1 cluster on bare-metal.

 First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers.

@ -12,7 +12,7 @@ Controllers are provisioned as etcd peers and run `etcd-member` (etcd3) and `kub
 * PXE-enabled [network boot](https://coreos.com/matchbox/docs/latest/network-setup.html) environment
 * Matchbox v0.6+ deployment with API enabled
 * Matchbox credentials `client.crt`, `client.key`, `ca.crt`
-* Terraform v0.10.4+ and [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) installed locally
+* Terraform v0.10.x and [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) installed locally

 ## Machines

@ -31,7 +31,7 @@ Configure each machine to boot from the disk [^1] through IPMI or the BIOS menu.
 ipmitool -H node1 -U USER -P PASS chassis bootdev disk options=persistent
 ```
 
-During provisioning, you'll explicitly set the boot device to `pxe` for the next boot only. Machines will install (overwrite) the operting system to disk on PXE boot and reboot into the disk install.
+During provisioning, you'll explicitly set the boot device to `pxe` for the next boot only. Machines will install (overwrite) the operating system to disk on PXE boot and reboot into the disk install.

 !!! tip ""
    Ask your hardware vendor to provide MACs and preconfigure IPMI, if possible. With it, you can rack new servers, `terraform apply` with new info, and power on machines that network boot and provision into clusters.
@ -105,11 +105,11 @@ Read about the [many ways](https://coreos.com/matchbox/docs/latest/network-setup
 * Place Matchbox behind a menu entry (timeout and default to Matchbox)

 !!! note ""
-    TFTP chainloding to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.
+    TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.

 ## Terraform Setup

-Install [Terraform](https://www.terraform.io/downloads.html) v0.9.2+ on your system.
+Install [Terraform](https://www.terraform.io/downloads.html) v0.10.x on your system.

 ```sh
 $ terraform version
@ -162,7 +162,7 @@ module "bare-metal-mercury" {
  # install
  matchbox_http_endpoint  = "http://matchbox.example.com"
  container_linux_channel = "stable"
-  container_linux_version = "1520.6.0"
+  container_linux_version = "1576.5.0"
  ssh_authorized_key      = "ssh-rsa AAAAB3Nz..."

  # cluster
@ -203,7 +203,7 @@ ssh-add -L
 ```

 !!! warning
-    `terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
+    `terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.

 ## Apply

@ -219,7 +219,7 @@ Get or update Terraform modules.
 $ terraform get            # downloads missing modules
 $ terraform get --update   # updates all modules
 Get: git::https://github.com/poseidon/typhoon (update)
-Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
+Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.9.1 (update)
 ```

 Plan the resources to be created.
@ -287,12 +287,12 @@ bootkube[5]: Tearing down temporary bootstrap control plane...
 [Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.

 ```
-$ KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/mercury/auth/kubeconfig
 $ kubectl get nodes
 NAME                STATUS    AGE       VERSION
-node1.example.com   Ready     11m       v1.8.3
-node2.example.com   Ready     11m       v1.8.3
-node3.example.com   Ready     11m       v1.8.3
+node1.example.com   Ready     11m       v1.9.1
+node2.example.com   Ready     11m       v1.9.1
+node3.example.com   Ready     11m       v1.9.1
 ```

 List the pods.
@ -319,7 +319,7 @@ kube-system   pod-checkpointer-wf65d-node1.example.com   1/1       Running   0

 ## Going Further

-Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
+Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).

 !!! note
    On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
@ -332,7 +332,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
 |:-----|:------------|:--------|
 | matchbox_http_endpoint | Matchbox HTTP read-only endpoint | http://matchbox.example.com:8080 |
 | container_linux_channel | Container Linux channel | stable, beta, alpha |
-| container_linux_version | Container Linux version of the kernel/initrd to PXE and the image to install | 1465.6.0 |
+| container_linux_version | Container Linux version of the kernel/initrd to PXE and the image to install | 1576.5.0 |
 | cluster_name | Cluster name | mercury |
 | k8s_domain_name | FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint | "myk8s.example.com" |
 | ssh_authorized_key | SSH public key for ~/.ssh/authorized_keys | "ssh-rsa AAAAB3Nz..." |
@ -354,6 +354,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
 | networking | Choice of networking provider | "calico" | "calico" or "flannel" |
 | network_mtu | CNI interface MTU (calico-only) | 1480 | - | 
 | pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
-| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |
 | kernel_args | Additional kernel args to provide at PXE boot | [] | "kvm-intel.nested=1" |

--- a/docs/digital-ocean.md
+++ b/docs/digital-ocean.md
@ -1,6 +1,6 @@
 # Digital Ocean

-In this tutorial, we'll create a Kubernetes v1.8.3 cluster on Digital Ocean.
+In this tutorial, we'll create a Kubernetes v1.9.1 cluster on Digital Ocean.

 We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, firewall rules, DNS records, tags, and droplets for Kubernetes controllers and workers will be created.

@ -10,11 +10,11 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube

 * Digital Ocean Account and Token
 * Digital Ocean Domain (registered Domain Name or delegated subdomain)
-* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
+* Terraform v0.10.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally

 ## Terraform Setup

-Install [Terraform](https://www.terraform.io/downloads.html) v0.10.1+ on your system.
+Install [Terraform](https://www.terraform.io/downloads.html) v0.10.x on your system.

 ```sh
 $ terraform version
@ -98,7 +98,7 @@ ssh-add -L
 ```

 !!! warning
-    `terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
+    `terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.

 ## Apply

@ -114,7 +114,7 @@ Get or update Terraform modules.
 $ terraform get            # downloads missing modules
 $ terraform get --update   # updates all modules
 Get: git::https://github.com/poseidon/typhoon (update)
-Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
+Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.9.1 (update)
 ```

 Plan the resources to be created.
@ -144,12 +144,12 @@ In 3-6 minutes, the Kubernetes cluster will be ready.
 [Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.

 ```
-$ KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/nemo/auth/kubeconfig
 $ kubectl get nodes
 NAME             STATUS    AGE       VERSION
-10.132.110.130   Ready     10m       v1.8.3
-10.132.115.81    Ready     10m       v1.8.3
-10.132.124.107   Ready     10m       v1.8.3
+10.132.110.130   Ready     10m       v1.9.1
+10.132.115.81    Ready     10m       v1.9.1
+10.132.124.107   Ready     10m       v1.9.1
 ```

 List the pods.
@ -174,7 +174,7 @@ kube-system   pod-checkpointer-pr1lq-10.132.115.81       1/1       Running   0

 ## Going Further

-Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
+Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).

 !!! note
    On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
@ -195,7 +195,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]

 Clusters create DNS A records `${cluster_name}.${dns_zone}` to resolve to controller droplets (round robin). This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `nemo.do.example.com`.

-You'll need a registered domain name or subdomain registered in Digital Ocean Domains (i.e. DNS zones). You can set this up once and create many clusters with unqiue names.
+You'll need a registered domain name or subdomain registered in Digital Ocean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.

 ```tf
 resource "digitalocean_domain" "zone-for-clusters" {
@ -237,7 +237,8 @@ If you uploaded an SSH key to DigitalOcean (not required), find the fingerprint
 | worker_type | Digital Ocean droplet size | 512mb | 512mb, 1gb, 2gb, 4gb |
 | networking | Choice of networking provider | "flannel" | "flannel" |
 | pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
-| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |

 !!! warning
    Do not choose a `controller_type` smaller than `2gb`. The `1gb` droplet is not sufficient for running a controller and bootstrapping will fail.
--- a/docs/google-cloud.md
+++ b/docs/google-cloud.md
@ -1,6 +1,6 @@
 # Google Cloud

-In this tutorial, we'll create a Kubernetes v1.8.3 cluster on Google Compute Engine (not GKE).
+In this tutorial, we'll create a Kubernetes v1.9.1 cluster on Google Compute Engine (not GKE).

 We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a network, firewall rules, managed instance groups of Kubernetes controllers and workers, network load balancers for controllers and workers, and health checks will be created.

@ -10,11 +10,11 @@ Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube

 * Google Cloud Account and Service Account
 * Google Cloud DNS Zone (registered Domain Name or delegated subdomain)
-* Terraform v0.10.4+ and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally
+* Terraform v0.10.x and [terraform-provider-ct](https://github.com/coreos/terraform-provider-ct) installed locally

 ## Terraform Setup

-Install [Terraform](https://www.terraform.io/downloads.html) v0.9.2+ on your system.
+Install [Terraform](https://www.terraform.io/downloads.html) v0.10.x on your system.

 ```sh
 $ terraform version
@ -80,7 +80,7 @@ module "google-cloud-yavin" {
  region        = "us-central1"
  dns_zone      = "example.com"
  dns_zone_name = "example-zone"
-  os_image      = "coreos-stable-1520-6-0-v20171012"
+  os_image      = "coreos-stable-1576-5-0-v20180105"

  cluster_name       = "yavin"
  controller_count   = 1
@ -104,7 +104,7 @@ ssh-add -L
 ```

 !!! warning
-    `terrafrom apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.
+    `terraform apply` will hang connecting to a controller if `ssh-agent` does not contain the SSH key.

 ## Apply

@ -120,7 +120,7 @@ Get or update Terraform modules.
 $ terraform get            # downloads missing modules
 $ terraform get --update   # updates all modules
 Get: git::https://github.com/poseidon/typhoon (update)
-Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.8.2 (update)
+Get: git::https://github.com/poseidon/bootkube-terraform.git?ref=v0.9.1 (update)
 ```

 Plan the resources to be created.
@ -151,12 +151,12 @@ In 4-8 minutes, the Kubernetes cluster will be ready.
 [Install kubectl](https://coreos.com/kubernetes/docs/latest/configure-kubectl.html) on your system. Use the generated `kubeconfig` credentials to access the Kubernetes cluster and list nodes.

 ```
-$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-0.c.example-com.internal     Ready    6m     v1.8.3
-yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.8.3
-yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.8.3
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.9.1
+yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.9.1
+yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.9.1
 ```

 List the pods.
@ -181,7 +181,7 @@ kube-system   pod-checkpointer-l6lrt                    1/1    Running   0

 ## Going Further

-Learn about [version pinning](concepts.md#versioning), maintenance, and [addons](addons/overview.md).
+Learn about [version pinning](concepts.md#versioning), [maintenance](topics/maintenance.md), and [addons](addons/overview.md).

 !!! note
    On Container Linux clusters, install the `container-linux-update-operator` addon to coordinate reboots and drains when nodes auto-update. Otherwise, updates may not be applied until the next reboot.
@ -197,7 +197,7 @@ Learn about [version pinning](concepts.md#versioning), maintenance, and [addons]
 | dns_zone | Google Cloud DNS zone | "google-cloud.example.com" |
 | dns_zone_name | Google Cloud DNS zone name | "example-zone" |
 | ssh_authorized_key | SSH public key for ~/.ssh_authorized_keys | "ssh-rsa AAAAB3NZ..." |
-| os_image | OS image for compute instances | "coreos-stable-1465-6-0-v20170817" |
+| os_image | OS image for compute instances | "coreos-stable-1576-5-0-v20180105" |
 | asset_dir | Path to a directory where generated assets should be placed (contains secrets) | "/home/user/.secrets/clusters/yavin" |

 Check the list of valid [regions](https://cloud.google.com/compute/docs/regions-zones/regions-zones) and list Container Linux [images](https://cloud.google.com/compute/docs/images) with `gcloud compute images list | grep coreos`.
@ -206,7 +206,7 @@ Check the list of valid [regions](https://cloud.google.com/compute/docs/regions-

 Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a network load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver. In this example, the cluster's apiserver would be accessible at `yavin.google-cloud.example.com`.

-You'll need a registered domain name or subdomain registered in a Google Cloud DNS zone. You can set this up once and create many clusters with unqiue names.
+You'll need a registered domain name or subdomain registered in a Google Cloud DNS zone. You can set this up once and create many clusters with unique names.

 ```tf
 resource "google_dns_managed_zone" "zone-for-clusters" {
@ -229,7 +229,8 @@ resource "google_dns_managed_zone" "zone-for-clusters" {
 | worker_preemptible | If enabled, Compute Engine will terminate controllers randomly within 24 hours | false | true |
 | networking | Choice of networking provider | "calico" | "calico" or "flannel" |
 | pod_cidr | CIDR range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
-| service_cidr | CIDR range to assgin to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| service_cidr | CIDR range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
+| cluster_domain_suffix | FQDN suffix for Kubernetes services answered by kube-dns. | "cluster.local" | "k8s.example.com" |

 Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).
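
As a minimal sketch of what `cluster_domain_suffix` changes, the shell function below builds the name kube-dns would answer for a service, following the `foo.default.svc.cluster.local` pattern from the variable description. The helper name `service_fqdn` and its argument order are illustrative assumptions, not part of Typhoon.

```shell
# Hypothetical helper (not part of Typhoon): build the in-cluster DNS name
# for a service as <service>.<namespace>.svc.<cluster_domain_suffix>.
service_fqdn() {
  # $1 = service name, $2 = namespace, $3 = suffix (defaults to cluster.local)
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Default suffix, then the custom suffix from the example column.
service_fqdn foo default
service_fqdn foo default k8s.example.com
```

With the example value `k8s.example.com`, a service that would otherwise resolve under `cluster.local` is answered under the custom suffix instead.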

--- a/docs/img/typhoon-logo.png
+++ b/docs/img/typhoon-logo.png
--- a/docs/img/typhoon.png
+++ b/docs/img/typhoon.png
--- a/docs/index.md
+++ b/docs/index.md
@ -1,4 +1,4 @@
-# Typhoon <img align="right" src="https://storage.googleapis.com/dghubble/spin.png">
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Dashboards, Metrics and other optional [addons](addons/overview.md)
@ -49,7 +49,7 @@ module "google-cloud-yavin" {
  region        = "us-central1"
  dns_zone      = "example.com"
  dns_zone_name = "example-zone"
-  os_image      = "coreos-stable-1465-6-0-v20170817"
+  os_image      = "coreos-stable-1576-5-0-v20180105"

  cluster_name       = "yavin"
  controller_count   = 1
@ -74,12 +74,12 @@ Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
 In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.

 ```
-$ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
+$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-0.c.example-com.internal     Ready    6m     v1.8.3
-yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.8.3
-yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.8.3
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.9.1
+yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.9.1
+yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.9.1
 ```
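
The readiness check above can be scripted. The sketch below reads `kubectl get nodes` output on stdin and succeeds only if every node is `Ready` at the expected version. The helper name `nodes_ok` is hypothetical, and the sample text simply repeats the listing shown above.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and fail if
# any node is not Ready or does not report the expected kubelet version.
nodes_ok() {
  # $1 = expected version, e.g. v1.9.1
  awk -v want="$1" '
    NR == 1 {
      # Locate columns by header name, since column layout varies by kubectl version.
      for (i = 1; i <= NF; i++) {
        if ($i == "STATUS") s = i
        if ($i == "VERSION") v = i
      }
      next
    }
    NF > 0 { seen++; if ($s != "Ready" || $v != want) bad++ }
    # Succeed only if at least one node was listed and none were bad.
    END { exit !(seen > 0 && bad == 0) }
  '
}

# Sample input copied from the listing above.
sample='NAME                                          STATUS   AGE    VERSION
yavin-controller-0.c.example-com.internal     Ready    6m     v1.9.1
yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.9.1
yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.9.1'

printf '%s\n' "$sample" | nodes_ok v1.9.1 && echo "all nodes Ready at v1.9.1"
```

Against a live cluster, the same function could be used as `kubectl get nodes | nodes_ok v1.9.1`. Any status other than exactly `Ready` (for example a cordoned node) is treated as a failure.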

 List the pods.
--- a/docs/topics/hardware.md
+++ b/docs/topics/hardware.md
@ -2,9 +2,9 @@

 While bare-metal Kubernetes clusters have no special hardware requirements (beyond the [min reqs](/bare-metal.md#requirements)), Typhoon does ensure certain router and server hardware integrates well with Kubernetes.

-## Ubiquitiy
+## Ubiquiti

-Ubiquity EdgeRouters work well with bare-metal Kubernetes clusters. Knowledge about how to setup an EdgeRouter and use the CLI is required.
+Ubiquiti EdgeRouters work well with bare-metal Kubernetes clusters. Knowledge about how to set up an EdgeRouter and use the CLI is required.

 ### PXE

@ -50,7 +50,7 @@ Add `dnsmasq` command line options to enable the TFTP file server.

 ```
 configure
-show servce dns forwarding
+show service dns forwarding
 set service dns forwarding options enable-tftp
 set service dns forwarding options tftp-root=/var/lib/tftpboot
 commit-confirm
@ -143,7 +143,7 @@ commit-confirm

 ### BGP

-Add the EdgeRouter as a global BGP peer for nodes in a Kubernetes cluster (requires Calico). Neighbors will exchange `podCIDR` routes and individual pods will become routeable on the LAN.
+Add the EdgeRouter as a global BGP peer for nodes in a Kubernetes cluster (requires Calico). Neighbors will exchange `podCIDR` routes and individual pods will become routable on the LAN.

 Configure node(s) as BGP neighbors.

--- a/docs/topics/maintenance.md
+++ b/docs/topics/maintenance.md
@ -0,0 +1,129 @@
+# Maintenance
+
+## Best Practices
+
+* Run multiple Kubernetes clusters. Run across platforms. Plan for regional and cloud outages.
+* Require applications be platform agnostic. Moving an application between a Kubernetes AWS cluster and a Kubernetes bare-metal cluster should be normal.
+* Strive to make single-cluster outages tolerable. Practice performing failovers.
+* Strive to make single-cluster outages a non-event. Load balance applications between multiple clusters, automate failover behaviors, and adjust alerting behaviors.
+
+## Versioning
+
+Typhoon provides tagged releases to allow clusters to be versioned using ordinary Terraform configs.
+
+```
+module "google-cloud-yavin" {
+  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.8.6"
+  ...
+}
+
+module "bare-metal-mercury" {
+  source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.9.1"
+  ...
+}
+```
+
+Master is updated regularly, so it is recommended to [pin](https://www.terraform.io/docs/modules/sources.html) modules to a [release tag](https://github.com/poseidon/typhoon/releases) or [commit](https://github.com/poseidon/typhoon/commits/master) hash. Pinning ensures `terraform get --update` only fetches the desired version.
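
One way to check that a module source is pinned is to look for the `?ref=` query on the source string. The sketch below is an assumption-laden illustration: the helper name `module_ref` is hypothetical, and it only inspects the string, without validating that the tag or commit exists.

```shell
# Hypothetical helper: print the ?ref= value of a Terraform module source,
# or return non-zero if the source is not pinned to a tag or commit.
module_ref() {
  case "$1" in
    *'?ref='?*)
      # Strip everything up to and including the last "?ref=".
      ref=${1##*\?ref=}
      printf '%s\n' "$ref"
      ;;
    *)
      return 1
      ;;
  esac
}

module_ref 'git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.8.6'
module_ref 'git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes' \
  || echo "not pinned: tracks master"
```

A check like this could run before `terraform get --update` to flag modules that would otherwise follow master.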
+
+## Upgrades
+
+Typhoon recommends upgrading clusters using a blue-green replacement strategy and migrating workloads.
+
+1. Launch new (candidate) clusters from tagged releases
+2. Apply workloads from existing cluster(s)
+3. Evaluate application health and performance
+4. Migrate application traffic to the new cluster
+5. Compare metrics and delete old cluster when ready
+
+Blue-green replacement reduces risk for clusters running critical applications. Candidate clusters allow baseline properties of clusters to be assessed (e.g. pod-to-pod bandwidth). Applying application workloads allows health to be assessed before being subjected to traffic (e.g. detect any changes in Kubernetes behavior between versions). Migration to the new cluster can be controlled according to requirements. Migration may mean updating DNS records to resolve the new cluster's ingress or may involve a load balancer gradually shifting traffic to the new cluster "backend". Retain the old cluster for a time to compare metrics or for fallback if issues arise.
+
+Blue-green replacement provides some subtler benefits as well:
+
+* Encourages investment in tooling for traffic migration and failovers. When a cluster incident arises, shifting applications to a healthy cluster will be second nature.
+* Discourages reliance on in-place opaque state. Retain confidence in your ability to create infrastructure from scratch.
+* Allows Typhoon to make architecture changes between releases and eases the burden on Typhoon maintainers. By contrast, distros promising in-place upgrades get stuck with their mistakes or require complex and error-prone migrations.
+
+### Bare-Metal
+
+Typhoon bare-metal clusters are provisioned by a PXE-enabled network boot environment and a [Matchbox](https://github.com/coreos/matchbox) service. To upgrade, re-provision machines into a new cluster.
+
+Fail over application workloads to another cluster (varies).
+
+```
+kubectl config use-context other-context
+kubectl apply -f mercury -R
+# DNS or load balancer changes
+```
+
+Power off bare-metal machines and set their next boot device to PXE.
+
+```
+ipmitool -H node1.example.com -U USER -P PASS power off
+ipmitool -H node1.example.com -U USER -P PASS chassis bootdev pxe
+```
+
+Delete or comment out the Terraform config for the cluster.
+
+```
+- module "bare-metal-mercury" {
+-   source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes"
+-   ...
+-}
+```
+
+Apply to delete old provisioning configs from Matchbox.
+
+```
+$ terraform apply
+Apply complete! Resources: 0 added, 0 changed, 55 destroyed.
+```
+
+Re-provision a new cluster by following the bare-metal [tutorial](../bare-metal.md#cluster).
+
+### Cloud
+
+Create a new cluster following the tutorials. Fail over application workloads to the new cluster (varies).
+
+```
+kubectl config use-context other-context
+kubectl apply -f mercury -R
+# DNS or load balancer changes
+```
+
+Once you're confident in the new cluster, delete the Terraform config for the old cluster.
+
+```
+- module "google-cloud-yavin" {
+-   source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
+-   ...
+-}
+```
+
+Apply to delete the cluster.
+
+```
+$ terraform apply
+Apply complete! Resources: 0 added, 0 changed, 55 destroyed.
+```
+
+### Alternatives
+
+#### In-place Edits
+
+Typhoon uses a self-hosted Kubernetes control plane which allows certain manifest upgrades to be performed in-place. Components like `apiserver`, `controller-manager`, `scheduler`, `flannel`/`calico`, `kube-dns`, and `kube-proxy` are run on Kubernetes itself and can be edited via `kubectl`. If you're interested, see the bootkube [upgrade docs](https://github.com/kubernetes-incubator/bootkube/blob/master/Documentation/upgrading.md).
+
+In certain scenarios, in-place edits can be useful for quickly rolling out security patches (e.g. bumping `kube-dns`) or prioritizing speed over the safety of a proper cluster re-provision and transition.
+
+!!! note
+    Rarely, we may test certain in-place edits for security fixes and mention them as an option in release notes.
+
+!!! warning
+    Typhoon does not support or document in-place edits as an upgrade strategy. They involve inherent risks and we choose not to make recommendations or guarantees about the safety of different in-place upgrades. It's explicitly a non-goal.
+
+#### Node Replacement
+
+Typhoon supports multi-controller clusters, so it is possible to upgrade a cluster by deleting and replacing nodes one by one.
+
+!!! warning
+    Typhoon does not support or document node replacement as an upgrade strategy. It limits Typhoon's ability to make infrastructure and architectural changes between tagged releases.
+
--- a/docs/topics/performance.md
+++ b/docs/topics/performance.md
@ -19,7 +19,7 @@ Notes:

 ## Network Performance

-Network performance varies based on the platform and CNI plugin. `iperf` was used to measture the bandwidth between different hosts and different pods. Host-to-host shows typical bandwidth between host machines. Pod-to-pod shows the bandwidth between two `iperf` containers.
+Network performance varies based on the platform and CNI plugin. `iperf` was used to measure the bandwidth between different hosts and different pods. Host-to-host shows typical bandwidth between host machines. Pod-to-pod shows the bandwidth between two `iperf` containers.

 | Platform / Plugin          | Theory | Host to Host | Pod to Pod   |
 |----------------------------|-------:|-------------:|-------------:|
@ -36,7 +36,7 @@ Network performance varies based on the platform and CNI plugin. `iperf` was use

 Notes:

-* Calico and Flannel have comparable performance. Platform and configuration differenes dominate.
+* Calico and Flannel have comparable performance. Platform and configuration differences dominate.
 * Neither CNI provider seems to be able to leverage bonded NICs (bare-metal)
 * AWS and Digital Ocean network bandwidth fluctuates more than on other platforms.
 * Only [certain AWS EC2 instance types](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#jumbo_frame_instances) allow jumbo frames. This is why the default MTU on AWS must be 1480.
--- a/google-cloud/container-linux/kubernetes/README.md
+++ b/google-cloud/container-linux/kubernetes/README.md
@ -1,4 +1,4 @@
-# Typhoon
+# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">

 Typhoon is a minimal and free Kubernetes distribution.

@ -9,9 +9,9 @@ Typhoon is a minimal and free Kubernetes distribution.

 Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.

-## Features
+## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.8.3 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.9.1 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
 * Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
--- a/google-cloud/container-linux/kubernetes/bootkube.tf
+++ b/google-cloud/container-linux/kubernetes/bootkube.tf
@ -1,13 +1,14 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.8.2"
+  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=b83e321b350ac549c45ed6a05ffd8683336fb9f4"

-  cluster_name = "${var.cluster_name}"
-  api_servers  = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
-  etcd_servers = "${module.controllers.etcd_fqdns}"
-  asset_dir    = "${var.asset_dir}"
-  networking   = "${var.networking}"
-  network_mtu  = 1440
-  pod_cidr     = "${var.pod_cidr}"
-  service_cidr = "${var.service_cidr}"
+  cluster_name          = "${var.cluster_name}"
+  api_servers           = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
+  etcd_servers          = "${module.controllers.etcd_fqdns}"
+  asset_dir             = "${var.asset_dir}"
+  networking            = "${var.networking}"
+  network_mtu           = 1440
+  pod_cidr              = "${var.pod_cidr}"
+  service_cidr          = "${var.service_cidr}"
+  cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
--- a/google-cloud/container-linux/kubernetes/cluster.tf
+++ b/google-cloud/container-linux/kubernetes/cluster.tf
@ -15,6 +15,7 @@ module "controllers" {
  # configuration
  networking              = "${var.networking}"
  service_cidr            = "${var.service_cidr}"
+  cluster_domain_suffix   = "${var.cluster_domain_suffix}"
  kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
  kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
  kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
@ -36,6 +37,7 @@ module "workers" {

  # configuration
  service_cidr            = "${var.service_cidr}"
+  cluster_domain_suffix   = "${var.cluster_domain_suffix}"
  kubeconfig_ca_cert      = "${module.bootkube.ca_cert}"
  kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
  kubeconfig_kubelet_key  = "${module.bootkube.kubelet_key}"
--- a/google-cloud/container-linux/kubernetes/controllers/cl/controller.yaml.tmpl
+++ b/google-cloud/container-linux/kubernetes/controllers/cl/controller.yaml.tmpl
@ -7,7 +7,7 @@ systemd:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
-            Environment="ETCD_IMAGE_TAG=v3.2.0"
+            Environment="ETCD_IMAGE_TAG=v3.2.13"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -41,11 +41,12 @@ systemd:
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
+        RequiredBy=etcd-member.service
    - name: kubelet.service
      enable: true
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -73,7 +74,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --kubeconfig=/etc/kubernetes/kubeconfig \
@ -129,7 +130,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -148,11 +149,9 @@ storage:
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
-          [ -d /opt/bootkube/assets/manifests-* ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-          [ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
-          [ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
+          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.8.2}"
+          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.9.1}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
--- a/google-cloud/container-linux/kubernetes/controllers/controllers.tf
+++ b/google-cloud/container-linux/kubernetes/controllers/controllers.tf
@ -66,6 +66,7 @@ data "template_file" "controller_config" {
    etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"

    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
+    cluster_domain_suffix   = "${var.cluster_domain_suffix}"
    ssh_authorized_key      = "${var.ssh_authorized_key}"
    kubeconfig_ca_cert      = "${var.kubeconfig_ca_cert}"
    kubeconfig_kubelet_cert = "${var.kubeconfig_kubelet_cert}"
--- a/google-cloud/container-linux/kubernetes/controllers/variables.tf
+++ b/google-cloud/container-linux/kubernetes/controllers/variables.tf
@ -69,6 +69,12 @@ EOD
  default = "10.3.0.0/16"
 }

+variable "cluster_domain_suffix" {
+  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
+  type        = "string"
+  default     = "cluster.local"
+}
+
 // kubeconfig

 variable "kubeconfig_ca_cert" {
--- a/google-cloud/container-linux/kubernetes/variables.tf
+++ b/google-cloud/container-linux/kubernetes/variables.tf
@ -80,3 +80,9 @@ EOD
  type    = "string"
  default = "10.3.0.0/16"
 }
+
+variable "cluster_domain_suffix" {
+  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
+  type        = "string"
+  default     = "cluster.local"
+}
--- a/google-cloud/container-linux/kubernetes/workers/cl/worker.yaml.tmpl
+++ b/google-cloud/container-linux/kubernetes/workers/cl/worker.yaml.tmpl
@ -22,7 +22,7 @@ systemd:
      enable: true
      contents: |
        [Unit]
-        Description=Kubelet via Hyperkube ACI
+        Description=Kubelet via Hyperkube
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
@ -50,7 +50,7 @@ systemd:
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
-          --cluster_domain=cluster.local \
+          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --kubeconfig=/etc/kubernetes/kubeconfig \
@ -104,7 +104,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
-          KUBELET_IMAGE_TAG=v1.8.3
+          KUBELET_IMAGE_TAG=v1.9.1
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -122,7 +122,7 @@ storage:
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
-            docker://gcr.io/google_containers/hyperkube:v1.8.3 \
+            docker://gcr.io/google_containers/hyperkube:v1.9.1 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
--- a/google-cloud/container-linux/kubernetes/workers/variables.tf
+++ b/google-cloud/container-linux/kubernetes/workers/variables.tf
@ -59,6 +59,12 @@ EOD
  default = "10.3.0.0/16"
 }

+variable "cluster_domain_suffix" {
+  description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
+  type        = "string"
+  default     = "cluster.local"
+}
+
 # kubeconfig

 variable "kubeconfig_ca_cert" {
--- a/google-cloud/container-linux/kubernetes/workers/workers.tf
+++ b/google-cloud/container-linux/kubernetes/workers/workers.tf
@ -24,6 +24,7 @@ data "template_file" "worker_config" {
  vars = {
    k8s_dns_service_ip      = "${cidrhost(var.service_cidr, 10)}"
    k8s_etcd_service_ip     = "${cidrhost(var.service_cidr, 15)}"
+    cluster_domain_suffix   = "${var.cluster_domain_suffix}"
    ssh_authorized_key      = "${var.ssh_authorized_key}"
    kubeconfig_ca_cert      = "${var.kubeconfig_ca_cert}"
    kubeconfig_kubelet_cert = "${var.kubeconfig_kubelet_cert}"
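
The `cidrhost(var.service_cidr, N)` calls above pick fixed addresses inside the service range. As a rough IPv4-only sketch of that arithmetic, the shell function below mirrors the idea; it is not Terraform's implementation and, unlike Terraform, does no range or input validation.

```shell
# Hypothetical IPv4-only approximation of Terraform's cidrhost(prefix, hostnum):
# print the hostnum-th address within the given CIDR range. No range checks.
cidrhost() {
  cidr=$1
  hostnum=$2
  addr=${cidr%/*}   # e.g. 10.3.0.0
  bits=${cidr#*/}   # e.g. 16
  # Split the dotted quad into octets.
  o1=${addr%%.*}; rest=${addr#*.}
  o2=${rest%%.*}; rest=${rest#*.}
  o3=${rest%%.*}; o4=${rest#*.}
  ip=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
  # Network mask with the top $bits bits set.
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  host=$(( (ip & mask) + hostnum ))
  printf '%d.%d.%d.%d\n' \
    $(( (host >> 24) & 255 )) $(( (host >> 16) & 255 )) \
    $(( (host >> 8) & 255 )) $(( host & 255 ))
}

cidrhost 10.3.0.0/16 10   # address used for k8s_dns_service_ip with the default service_cidr
cidrhost 10.3.0.0/16 15   # address used for k8s_etcd_service_ip with the default service_cidr
```

With the default `service_cidr` of `10.3.0.0/16`, the two calls yield `10.3.0.10` and `10.3.0.15`, which is why changing `service_cidr` also moves the kube-dns service IP passed to the kubelet's `--cluster_dns` flag.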
--- a/mkdocs.yml
+++ b/mkdocs.yml
@ -1,13 +1,11 @@
 site_name: Typhoon
-theme: material
-site_favicon: 'img/favicon.ico'
-repo_name: 'poseidon/typhoon'
-repo_url: 'https://github.com/poseidon/typhoon'
-extra:
+theme:
+  name: 'material'
  palette:
    primary: 'blue'
    accent: 'light blue'
  logo: 'img/spin.png'
+  favicon: 'img/favicon.ico'
  font:
    text: 'Roboto Slab'
    code: 'Roboto Mono'
@ -16,6 +14,8 @@ extra:
      link: 'https://github.com/poseidon'
    - type: 'twitter'
      link: 'https://twitter.com/typhoon8s'
+repo_name: 'poseidon/typhoon'
+repo_url: 'https://github.com/poseidon/typhoon'
 google_analytics:
  - 'UA-38995133-6'
  - 'auto'
@ -52,6 +52,7 @@ pages:
    - 'Prometheus': 'addons/prometheus.md'
    - 'Dashboard': 'addons/dashboard.md'
  - 'Topics':
+    - 'Maintenance': 'topics/maintenance.md'
    - 'Hardware': 'topics/hardware.md'
    - 'Security': 'topics/security.md'
    - 'Performance': 'topics/performance.md'
--- a/requirements.txt
+++ b/requirements.txt
@ -1,4 +1,5 @@
-mkdocs==0.16.3
-mkdocs-material==1.8.0
+mkdocs==0.17.2
+mkdocs-material==2.2.6
 pygments==2.2.0
 pymdown-extensions==3.5
+six==1.10.0
Author	SHA1	Message	Date
Dalton Hubble	527b5ca602	Update CHANGELOG.md for v1.9.1	2018-01-09 07:03:04 -08:00
Dalton Hubble	ecd6a9443b	Add maintenance docs with upgrade policies * Add best practices for maintenance * Describe blue-green replacement strategy * Mention unsupported in-place edit and node replacement strategies	2018-01-09 06:54:44 -08:00
Dalton Hubble	2523d64f95	Fix docs to show exporting KUBECONFIG	2018-01-06 16:55:06 -08:00
Dalton Hubble	fc455c8624	Remove old mention of ACIs in bootkube.service description	2018-01-06 16:20:34 -08:00
Dalton Hubble	7a0a60708e	Bump Container Linux version shown in docs * Be sure docs and examples list Container Linux versions that have been patched for Meltdown just in case someone copy-pastes or sees them as recent versions	2018-01-06 14:58:38 -08:00
Dalton Hubble	51a5f64024	Enable portmap plugin alongside Calico to fix hostPort * https://github.com/poseidon/terraform-render-bootkube/pull/36	2018-01-06 14:01:18 -08:00
Dalton Hubble	e1f2125f02	Update etcd from 3.2.0 to 3.2.13 * https://github.com/coreos/etcd/releases/tag/v3.2.13	2018-01-06 14:01:18 -08:00
Dalton Hubble	9329b775f6	Update Kubernetes from v1.8.6 to v1.9.1	2018-01-06 14:01:16 -08:00
Dalton Hubble	e04cce1201	Update mkdocs and material docs theme	2018-01-06 10:59:56 -08:00
Dalton Hubble	201a38bd90	Update CHANGELOG.md for v1.8.6	2017-12-22 13:00:18 -08:00
Dalton Hubble	fbdd946601	Update Kubernetes from v1.8.5 to v1.8.6	2017-12-21 11:20:37 -08:00
Barak Michener	19102636a9	Add link to dashboard 315	2017-12-15 18:52:40 -08:00
Dalton Hubble	21e540159b	addons: Update grafana from v4.6.2 to v4.6.3 * https://github.com/grafana/grafana/releases/tag/v4.6.3	2017-12-15 16:09:14 -08:00
Dalton Hubble	43e65a4d13	Update CHANGELOG.md for v1.8.5	2017-12-15 02:04:13 -08:00
Barak Michener	e79088baa0	Add optional cluster_domain_suffix variable * Allow kube-dns to respond to DNS queries with a custom suffix, instead of the default 'cluster.local' * Useful when multiple clusters exist on the same local network and wish to query services on one another	2017-12-15 01:45:52 -08:00
Dalton Hubble	495e33e213	Update bootkube and terraform-render-bootkube to v0.9.1	2017-12-15 01:45:02 -08:00
Dalton Hubble	63f5a26a72	Eliminate steps to move self-hosted etcd assets * bootkube/assets/experimental/* assets corresponded to self-hosted etcd manifests, which are no longer an option in Typhoon	2017-12-13 01:06:56 -08:00
Lars Fenneberg	eea79e895d	Fix manifest consolidation in bootkube start wrapper * Fix manifest existence test in /opt/bootkube/bootkube-start to also work with more than one directory	2017-12-12 23:08:22 -08:00
Dalton Hubble	99c07661c6	Fix old Container Linux versions mentioned in docs	2017-12-11 23:36:16 -08:00
Dalton Hubble	521a1f0fee	addons: Update heapster from v1.4.3 to v1.5.0 * Rollback addon-resizer to 1.7 to address issues in large clusters https://github.com/kubernetes/kubernetes/pull/52536	2017-12-11 23:34:25 -08:00
Dalton Hubble	7345cb6419	addons: Update nginx-ingress to 0.9.0	2017-12-11 00:48:15 -08:00
Dalton Hubble	a481d71d7d	addons: Update nginx-ingress to 0.9.0-beta.19 * Undo rollback `f00ecde854` * Port binding regression only occurs with --enable-ssl-passthrough, which isn't used in these examples. See https://github.com/kubernetes/ingress-nginx/issues/1788	2017-12-11 00:44:32 -08:00
Dalton Hubble	831a5c976c	Add Kubernetes Dashboard warning and improve changelog	2017-12-09 22:38:27 -08:00
Dalton Hubble	85e6783503	Recommend Container Linux images with Docker 17.09 * Container Linux stable and beta now provide Docker 17.09 (instead of 1.12). Recommend images which provide 17.09. * Older clusters (with CLUO addon) auto-update node's Container Linux version and will begin using Docker 17.09.	2017-12-09 22:14:13 -08:00
Dalton Hubble	165396d6aa	Update Kubernetes from v1.8.4 to v1.8.5	2017-12-09 21:28:31 -08:00
Vincent Palmer	ce49a93d5d	Fix issue with etcd-member failing to resolve peers * When restarting masters, `etcd-member.service` may fail to lookup peers if /etc/resolv.conf hasn't been populated yet. Require the wait-for-dns.service.	2017-12-09 20:12:49 -08:00
Khris Richardson	e623439eec	Fix typos in docs and CONTRIBUTING.md	2017-12-09 19:58:09 -08:00
Dalton Hubble	9548572d98	Add kubelet --volume-plugin-dir flag on bare-metal * Kubelet will search path for flexvolume plugins	2017-12-05 13:12:53 -08:00
Dalton Hubble	f00ecde854	Rollback nginx-ingress on GCE to 0.9.0-beta.17 * https://github.com/kubernetes/ingress-nginx/issues/1788	2017-12-02 14:06:22 -08:00
Dalton Hubble	d85300f947	Clarify only Terraform v0.10.x should be used * It is not safe to update to Terraform v0.11.x yet * https://github.com/hashicorp/terraform/issues/16824	2017-12-02 01:31:39 -08:00
Dalton Hubble	65f006e6cc	addons: Sync prometheus alerts to upstream * https://github.com/coreos/prometheus-operator/pull/774	2017-12-01 23:24:08 -08:00
Dalton Hubble	8d3817e0ae	addons: Update nginx-ingress to 0.9.0-beta.19 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.9.0-beta.19	2017-12-01 22:32:33 -08:00
Dalton Hubble	5f5eec1175	Update bootkube and terraform-render-bootkube to v0.9.0	2017-12-01 22:27:48 -08:00
Dalton Hubble	5308fde3d3	Add Kubernetes certification badge	2017-11-29 19:26:49 -08:00
Dalton Hubble	9ab61d7bf5	Add Typhoon images with and without text * Serve images from GCS poseidon, rather than dghubble	2017-11-29 01:01:01 -08:00
Dalton Hubble	6483f613c5	Update Kubernetes from v1.8.3 to v1.8.4	2017-11-28 21:52:11 -08:00
Dalton Hubble	56c6bf431a	Update terraform-render-bootkube for Kubernetes v1.8.4 * Update hyperkube from v1.8.3 to v1.8.4 * Remove flock from bootstrap-apiserver and kube-apiserver * Remove unused critical-pod annotations in manifests * Use service accounts for kube-proxy and pod-checkpointer * Update Calico from v2.6.1 to v2.6.3 * Update flannel from v0.9.0 to v0.9.1 * Remove Calico termination grace period to prevent calico from getting stuck for extended periods * https://github.com/poseidon/terraform-render-bootkube/pull/29	2017-11-28 21:42:26 -08:00
Dalton Hubble	63ab117205	addons: Add prometheus rules for DaemonSets * https://github.com/coreos/prometheus-operator/pull/755	2017-11-16 23:51:21 -08:00
Dalton Hubble	1cd262e712	addons: Fix prometheus K8SApiServerLatency alert rule * https://github.com/coreos/prometheus-operator/issues/751	2017-11-16 23:37:15 -08:00
Dalton Hubble	32bdda1b6c	addons: Update Grafana from v4.6.1 to v4.6.2 * https://github.com/grafana/grafana/releases/tag/v4.6.2	2017-11-16 23:34:36 -08:00
Dalton Hubble	07d257aa7b	Add initrd kernel argument needed by UEFI clients * https://github.com/coreos/bugs/issues/1239	2017-11-16 23:19:51 -08:00