Run etcd cluster on-host, across controllers on Google Cloud

* Change controllers from a managed group to individual instances
* Create discrete DNS records to each controller's private IP for etcd
* Change etcd to run on-host, across controllers (etcd-member.service)
* Reduce time to bootstrap a cluster
* Deprecate self-hosted-etcd on the Google Cloud platform
2025-09-14 23:39:44 +02:00 · 2017-11-05 11:01:50 -08:00
parent ae07a21e3d
commit 7b38271212
17 changed files with 212 additions and 93 deletions
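
The commit message pairs individual controller instances with discrete DNS records for etcd. One likely reason is that on-host etcd peers need stable, predictable addresses, which randomly suffixed managed-group hostnames (such as `yavin-controller-1682`) do not provide. The sketch below is illustrative only and not taken from this commit: the function name, the `<cluster>-etcd<i>.<zone>` record scheme, and the `example.com` zone are assumptions. It shows how a static etcd `--initial-cluster` value could be assembled from such per-controller names.

```python
def etcd_initial_cluster(cluster_name: str, dns_zone: str, count: int) -> str:
    """Build a static etcd --initial-cluster value from per-controller DNS names.

    Assumption: each controller i has a DNS record named
    <cluster_name>-etcd<i>.<dns_zone> that resolves to its private IP.
    This naming scheme is hypothetical and used only for illustration.
    """
    if count < 1:
        raise ValueError("an etcd cluster needs at least one member")

    members = []
    for i in range(count):
        host = f"{cluster_name}-etcd{i}.{dns_zone}"
        # 2380 is etcd's default peer (server-to-server) port.
        members.append(f"etcd{i}=https://{host}:2380")
    return ",".join(members)


if __name__ == "__main__":
    # Three controllers for a cluster named "yavin" in an example zone.
    print(etcd_initial_cluster("yavin", "example.com", 3))
```

Because every member name and peer URL is known before any instance boots, each controller can be given the same member list at provision time, which is what static bootstrapping of etcd requires.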
--- a/docs/aws.md
+++ b/docs/aws.md
@@ -4,7 +4,7 @@ In this tutorial, we'll create a Kubernetes v1.8.2 cluster on AWS.

 We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a VPC, gateway, subnets, auto-scaling groups of controllers and workers, network load balancers for controllers and workers, and security groups will be created.

-Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules `etcd`, `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `flannel` or `calico` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.
+Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules `etcd`, `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `calico` or `flannel` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.

 !!! warning "Alpha"
    Typhoon Kubernetes clusters on AWS are marked as "alpha".
--- a/docs/bare-metal.md
+++ b/docs/bare-metal.md
@@ -4,7 +4,7 @@ In this tutorial, we'll network boot and provision a Kubernetes v1.8.2 cluster on

 First, we'll deploy a [Matchbox](https://github.com/coreos/matchbox) service and setup a network boot environment. Then, we'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers.

-Controllers are provisioned as etcd peers and run `etcd-member` (etcd3) and `kubelet`. Workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules an `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `flannel` or `calico` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.
+Controllers are provisioned as etcd peers and run `etcd-member` (etcd3) and `kubelet`. Workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules an `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `calico` or `flannel` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.

 ## Requirements

--- a/docs/faq.md
+++ b/docs/faq.md
@@ -8,7 +8,7 @@ Formats rise and evolve. Typhoon may choose to adapt the format over time (with

 ## Self-hosted etcd

-AWS and Google Cloud clusters run etcd as "self-hosted" pods, managed by the [etcd-operator](https://github.com/coreos/etcd-operator). By contrast, Typhoon bare-metal and Digital Ocean run an etcd peer as a systemd `etcd-member.service` on each controller (i.e. on-host).
+AWS clusters run etcd as "self-hosted" pods, managed by the [etcd-operator](https://github.com/coreos/etcd-operator). By contrast, Typhoon bare-metal, Digital Ocean, and Google Cloud run an etcd peer as a systemd `etcd-member.service` on each controller (i.e. on-host).

 In practice, self-hosted etcd has proven to be *ok*, but not ideal. Running the apiserver's etcd atop Kubernetes itself is inherently complex, but works in most cases. It can be opaque to debug if complex edge cases with upstream Kubernetes bugs arise.

--- a/docs/google-cloud.md
+++ b/docs/google-cloud.md
@@ -4,7 +4,7 @@ In this tutorial, we'll create a Kubernetes v1.8.2 cluster on Google Compute Eng

 We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform module. On apply, a network, firewall rules, managed instance groups of Kubernetes controllers and workers, network load balancers for controllers and workers, and health checks will be created.

-Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules `etcd`, `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `flannel` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.
+Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules an `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `calico` or `flannel` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.

 ## Requirements

@@ -155,7 +155,7 @@ In 5-10 minutes, the Kubernetes cluster will be ready.
 $ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-1682.c.example-com.internal  Ready    6m     v1.8.2
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.8.2
 yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.8.2
 yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.8.2
 ```
@@ -168,13 +168,10 @@ NAMESPACE     NAME                                      READY  STATUS    RESTART
 kube-system   calico-node-1cs8z                         2/2    Running   0         6m
 kube-system   calico-node-d1l5b                         2/2    Running   0         6m
 kube-system   calico-node-sp9ps                         2/2    Running   0         6m
-kube-system   etcd-operator-3329263108-f443m            1/1    Running   1         6m
 kube-system   kube-apiserver-zppls                      1/1    Running   0         6m
 kube-system   kube-controller-manager-3271970485-gh9kt  1/1    Running   0         6m
 kube-system   kube-controller-manager-3271970485-h90v8  1/1    Running   1         6m
 kube-system   kube-dns-1187388186-zj5dl                 3/3    Running   0         6m
-kube-system   kube-etcd-0000                            1/1    Running   0         5m
-kube-system   kube-etcd-network-checkpointer-crznb      1/1    Running   0         6m
 kube-system   kube-proxy-117v6                          1/1    Running   0         6m
 kube-system   kube-proxy-9886n                          1/1    Running   0         6m
 kube-system   kube-proxy-njn47                          1/1    Running   0         6m
--- a/docs/index.md
+++ b/docs/index.md
@@ -78,7 +78,7 @@ In 5-10 minutes (varies by platform), the cluster will be ready. This Google Clo
 $ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
 NAME                                          STATUS   AGE    VERSION
-yavin-controller-1682.c.example-com.internal  Ready    6m     v1.8.2
+yavin-controller-0.c.example-com.internal     Ready    6m     v1.8.2
 yavin-worker-jrbf.c.example-com.internal      Ready    5m     v1.8.2
 yavin-worker-mzdm.c.example-com.internal      Ready    5m     v1.8.2
 ```
@@ -91,13 +91,10 @@ NAMESPACE     NAME                                      READY  STATUS    RESTART
 kube-system   calico-node-1cs8z                         2/2    Running   0         6m
 kube-system   calico-node-d1l5b                         2/2    Running   0         6m
 kube-system   calico-node-sp9ps                         2/2    Running   0         6m
-kube-system   etcd-operator-3329263108-f443m            1/1    Running   1         6m
 kube-system   kube-apiserver-zppls                      1/1    Running   0         6m
 kube-system   kube-controller-manager-3271970485-gh9kt  1/1    Running   0         6m
 kube-system   kube-controller-manager-3271970485-h90v8  1/1    Running   1         6m
 kube-system   kube-dns-1187388186-zj5dl                 3/3    Running   0         6m
-kube-system   kube-etcd-0000                            1/1    Running   0         5m
-kube-system   kube-etcd-network-checkpointer-crznb      1/1    Running   0         6m
 kube-system   kube-proxy-117v6                          1/1    Running   0         6m
 kube-system   kube-proxy-9886n                          1/1    Running   0         6m
 kube-system   kube-proxy-njn47                          1/1    Running   0         6m
--- a/docs/topics/performance.md
+++ b/docs/topics/performance.md
@@ -9,7 +9,7 @@ Provisioning times vary based on the platform. Sampling the time to create (appl
 | AWS           | 20 min | 8 min 10 sec |
 | Bare-Metal    | 10-14 min | NA  |
 | Digital Ocean | 3 min 30 sec | 20 sec |
-| Google Cloud  | 6 min 10 sec | 4 min 30 sec |
+| Google Cloud  | 4 min | 4 min 30 sec |

 Notes: