diff --git a/CHANGES.md b/CHANGES.md
index 18fb1ee2..e4b3d3a5 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -4,6 +4,9 @@ Notable changes between versions.
 
 ## Latest
 
+* All platforms run etcd on-host, across controllers
+* AWS platform promoted to beta
+
 #### Google Cloud
 
 * Add required variable `region` (e.g. "us-central1")
@@ -17,8 +20,10 @@ Notable changes between versions.
 
 #### AWS
 
+* Promote AWS platform to beta
 * Reduce time to bootstrap a cluster
 * Change etcd to run on-host, across controllers (etcd-member.service)
+* Fix firewall rules for multi-controller kubelet scraping and node-exporter
 * Remove support for self-hosted etcd
 
 ## v1.8.2
diff --git a/README.md b/README.md
index c9924724..5fa97704 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ Typhoon provides a Terraform Module for each supported operating system and plat
 
 | Platform | Operating System | Terraform Module | Status |
 |---------------|------------------|------------------|--------|
-| AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | alpha |
+| AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | beta |
 | Bare-Metal | Container Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
 | Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
 | Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | beta |
@@ -72,7 +72,7 @@ $ terraform apply
 Apply complete! Resources: 37 added, 0 changed, 0 destroyed.
 ```
 
-In 5-10 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
+In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
 
 ```sh
 $ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
diff --git a/docs/aws.md b/docs/aws.md
index 833d959a..9a0cd09e 100644
--- a/docs/aws.md
+++ b/docs/aws.md
@@ -6,12 +6,6 @@ We'll declare a Kubernetes cluster in Terraform using the Typhoon Terraform modu
 
 Controllers and workers are provisioned to run a `kubelet`. A one-time [bootkube](https://github.com/kubernetes-incubator/bootkube) bootstrap schedules an `apiserver`, `scheduler`, `controller-manager`, and `kube-dns` on controllers and runs `kube-proxy` and `calico` or `flannel` on each node. A generated `kubeconfig` provides `kubectl` access to the cluster.
 
-!!! warning "Alpha"
-    Typhoon Kubernetes clusters on AWS are marked as "alpha".
-
-!!! warning "Disabled"
-    Clusters do not use EC2 instances with elevated IAM roles. Kubernetes AWS integrations are not enabled.
-
 ## Requirements
 
 * AWS Account and IAM credentials
@@ -87,7 +81,7 @@ module "aws-tempest" {
   dns_zone = "aws.example.com"
   dns_zone_id = "Z3PAABBCFAKEC0"
   controller_count = 1
-  controller_type = "t2.small"
+  controller_type = "t2.medium"
   worker_count = 2
   worker_type = "t2.small"
   ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
@@ -147,7 +141,7 @@ module.aws-tempest.null_resource.bootkube-start: Creation complete after 11m8s (
 Apply complete! Resources: 98 added, 0 changed, 0 destroyed.
 ```
 
-In 5-10 minutes, the Kubernetes cluster will be ready.
+In 4-8 minutes, the Kubernetes cluster will be ready.
 
 ## Verify
 
diff --git a/docs/google-cloud.md b/docs/google-cloud.md
index 7ff67612..14de17e2 100644
--- a/docs/google-cloud.md
+++ b/docs/google-cloud.md
@@ -137,14 +137,14 @@ $ terraform apply
 
 module.google-cloud-yavin.null_resource.bootkube-start: Still creating... (10s elapsed)
 ...
-module.google-cloud-yavin.null_resource.bootkube-start: Still creating... (8m30s elapsed)
-module.google-cloud-yavin.null_resource.bootkube-start: Still creating... (8m40s elapsed)
+module.google-cloud-yavin.null_resource.bootkube-start: Still creating... (5m30s elapsed)
+module.google-cloud-yavin.null_resource.bootkube-start: Still creating... (5m40s elapsed)
 module.google-cloud-yavin.null_resource.bootkube-start: Creation complete (ID: 5768638456220583358)
 
 Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
 ```
 
-In 5-10 minutes, the Kubernetes cluster will be ready.
+In 4-8 minutes, the Kubernetes cluster will be ready.
 
 ## Verify
 
diff --git a/docs/index.md b/docs/index.md
index fd2a7ec0..ebed4f8c 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -23,7 +23,7 @@ Typhoon provides a Terraform Module for each supported operating system and plat
 
 | Platform | Operating System | Terraform Module | Status |
 |---------------|------------------|------------------|--------|
-| AWS | Container Linux | [aws/container-linux/kubernetes](aws.md) | alpha |
+| AWS | Container Linux | [aws/container-linux/kubernetes](aws.md) | beta |
 | Bare-Metal | Container Linux | [bare-metal/container-linux/kubernetes](bare-metal.md) | stable |
 | Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean.md) | beta |
 | Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud.md) | beta |
@@ -71,7 +71,7 @@ $ terraform apply
 Apply complete! Resources: 64 added, 0 changed, 0 destroyed.
 ```
 
-In 5-10 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
+In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
 
 ```
 $ KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
diff --git a/docs/topics/performance.md b/docs/topics/performance.md
index 65b0199d..4b55eaf8 100644
--- a/docs/topics/performance.md
+++ b/docs/topics/performance.md
@@ -6,21 +6,20 @@ Provisioning times vary based on the platform. Sampling the time to create (appl
 
 | Platform | Apply | Destroy |
 |---------------|-------|---------|
-| AWS | 5 min | 5 min |
+| AWS | 6 min | 5 min |
 | Bare-Metal | 10-14 min | NA |
 | Digital Ocean | 3 min 30 sec | 20 sec |
 | Google Cloud | 4 min | 4 min 30 sec |
 
 Notes:
 
-* AWS is alpha
-* DNS propagation times have a large impact on provision time
+* SOA TTL and NXDOMAIN caching can have a large impact on provision time
 * Platforms with auto-scaling take more time to provision (AWS, Google)
-* Bare-metal provision times vary depending on the time for machines to POST and network bandwidth to download images.
+* Bare-metal POST times and network bandwidth will affect provision times
 
 ## Network Performance
 
-Network performance varies based on the platform and CNI plugin. `iperf` was used to measture the bandwidth between different hosts and different pods. Host-to-host indicates the typical bandwidth offered by the provider. Pod-to-pod shows the bandwidth between two `iperf` containers. The difference provides some idea about the overhead.
+Network performance varies based on the platform and CNI plugin. `iperf` was used to measture the bandwidth between different hosts and different pods. Host-to-host shows typical bandwidth between host machines. Pod-to-pod shows the bandwidth between two `iperf` containers.
 
 | Platform / Plugin | Theory | Host to Host | Pod to Pod |
 |----------------------------|-------:|-------------:|-------------:|
@@ -37,9 +36,7 @@ Network performance varies based on the platform and CNI plugin. `iperf` was use
 
 Notes:
 
-* AWS is alpha
-* Network bandwidth fluctuates on AWS and Digital Ocean.
+* Calico and Flannel have comparable performance. Platform and configuration differenes dominate.
+* Neither CNI provider seems to be able to leverage bonded NICs (bare-metal)
+* AWS and Digital Ocean network bandwidth fluctuates more than on other platforms.
 * Only [certain AWS EC2 instance types](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#jumbo_frame_instances) allow jumbo frames. This is why the default MTU on AWS must be 1480.
-* Between Flannel and Calico, performance differences are usually minimal. Platform and configuration differenes dominate.
-* Pods do not seem to be able to leverage the hosts' bonded NIC setup. Possibly a testing artifact.
-* Observing the same bonded NIC pod-to-pod limit suggests the bottleneck lies below flannel and calico.