From c13d060b38f0397882fc1d762b1db66e923fe37c Mon Sep 17 00:00:00 2001
From: Dalton Hubble
Date: Thu, 18 Aug 2022 09:02:38 -0700
Subject: [PATCH] Add docs for GCP MIG update and AWS instance refresh

* Document that worker instances are rolling replaced when changes to
  their configuration are applied
---
 CHANGES.md                         |   6 +-
 docs/fedora-coreos/google-cloud.md |   2 +-
 docs/topics/maintenance.md         | 146 +++++++++++++++++++++++------
 3 files changed, 121 insertions(+), 33 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 9756ff28..e6c7d1c4 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -35,7 +35,7 @@ version: 1.0.0
 
 ### AWS
 
-* [Refresh](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html) instances in autoscaling group when launch configuration changes ([#1208](https://github.com/poseidon/typhoon/pull/1208)) (**important**)
+* [Refresh](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html) instances in autoscaling group when launch configuration changes ([#1208](https://github.com/poseidon/typhoon/pull/1208)) ([docs](https://typhoon.psdn.io/topics/maintenance/#node-configuration-updates), **important**)
   * Worker launch configuration changes start an autoscaling group instance refresh to replace instances
   * Instance refresh creates surge instances, waits for a warm-up period, then deletes old instances
   * Changing `worker_type`, `disk_*`, `worker_price`, `worker_target_groups`, or Butane `worker_snippets` on existing worker nodes will replace instances
@@ -46,11 +46,11 @@
 
 ### Google
 
-* [Roll](https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups) instance template changes to worker managed instance groups ([#1207](https://github.com/poseidon/typhoon/pull/1207)) (**important**)
+* [Roll](https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups) instance template changes to worker managed instance groups ([#1207](https://github.com/poseidon/typhoon/pull/1207)) ([docs](https://typhoon.psdn.io/topics/maintenance/#node-configuration-updates), **important**)
   * Worker instance template changes roll out by gradually replacing instances
   * Automatic rollouts create surge instances, wait for Kubelet health checks, then delete old instances (0 unavailable instances)
   * Changing `worker_type`, `disk_size`, `worker_preemptible`, or Butane `worker_snippets` on existing worker nodes will replace instances
-  * New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or Flatcar Linux to keep themselves updated
+  * New compute images or changing `os_stream` will be ignored, to allow Fedora CoreOS or Flatcar Linux to keep themselves updated
   * Previously, new instance templates were made in the same way, but not applied to instances unless manually replaced
"autohealing") ([#1207](https://github.com/poseidon/typhoon/pull/1207)) * Use health checks to probe kube-proxy every 30s diff --git a/docs/fedora-coreos/google-cloud.md b/docs/fedora-coreos/google-cloud.md index 1e524065..df947c96 100644 --- a/docs/fedora-coreos/google-cloud.md +++ b/docs/fedora-coreos/google-cloud.md @@ -73,7 +73,7 @@ Define a Kubernetes cluster using the module `google-cloud/fedora-coreos/kuberne ```tf module "yavin" { - source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=development-sha" + source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.24.4" # Google Cloud cluster_name = "yavin" diff --git a/docs/topics/maintenance.md b/docs/topics/maintenance.md index 32ceb11b..1b09d87e 100644 --- a/docs/topics/maintenance.md +++ b/docs/topics/maintenance.md @@ -23,9 +23,27 @@ module "mercury" { } ``` -Master is updated regularly, so it is recommended to [pin](https://www.terraform.io/docs/modules/sources.html) modules to a [release tag](https://github.com/poseidon/typhoon/releases) or [commit](https://github.com/poseidon/typhoon/commits/master) hash. Pinning ensures `terraform get --update` only fetches the desired version. +Main is updated regularly, so it is recommended to [pin](https://www.terraform.io/docs/modules/sources.html) modules to a [release tag](https://github.com/poseidon/typhoon/releases) or [commit](https://github.com/poseidon/typhoon/commits/main) hash. Pinning ensures `terraform get --update` only fetches the desired version. -## Upgrades +## Terraform Versions + +Typhoon modules support Terraform v0.13.x and higher. Poseidon publishes [providers](/topics/security/#terraform-providers) to the Terraform Provider Registry for automatic install via `terraform init`. + +| Typhoon Release | Terraform version | +|-------------------|---------------------| +| v1.21.2 - ? | v0.13.x, v0.14.4+, v0.15.x, v1.0.x | +| v1.21.1 - v1.21.1 | v0.13.x, v0.14.4+, v0.15.x | +| v1.20.2 - v1.21.0 | v0.13.x, v0.14.4+ | +| v1.20.0 - v1.20.2 | v0.13.x | +| v1.18.8 - v1.19.4 | v0.12.26+, v0.13.x | +| v1.15.0 - v1.18.8 | v0.12.x | +| v1.10.3 - v1.15.0 | v0.11.x | +| v1.9.2 - v1.10.2 | v0.10.4+ or v0.11.x | +| v1.7.3 - v1.9.1 | v0.10.x | +| v1.6.4 - v1.7.2 | v0.9.x | + + +## Cluster Upgrades Typhoon recommends upgrading clusters using a blue-green replacement strategy and migrating workloads. @@ -127,9 +145,99 @@ Typhoon supports multi-controller clusters, so it is possible to upgrade a clust !!! warning Typhoon does not support or document node replacement as an upgrade strategy. It limits Typhoon's ability to make infrastructure and architectural changes between tagged releases. -### Upgrade terraform-provider-ct +## Node Configuration Updates -The [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin parses, validates, and converts Fedora CoreOS or Flatcar Linux Configs into Ignition user-data for provisioning instances. Since Typhoon v1.12.2+, the plugin can be updated in-place so that on apply, only workers will be replaced. +Typhoon worker instance groups (default workers and [worker pools](../advanced/worker-pools.md)) on AWS and Google Cloud gradually rolling replace worker instances when their configuration is altered. + +### AWS + +On AWS, worker instances belong to an auto-scaling group. 
 
-## Upgrades
+## Terraform Versions
+
+Typhoon modules support Terraform v0.13.x and higher. Poseidon publishes [providers](/topics/security/#terraform-providers) to the Terraform Provider Registry for automatic install via `terraform init`.
+
+| Typhoon Release   | Terraform version   |
+|-------------------|---------------------|
+| v1.21.2 - ?       | v0.13.x, v0.14.4+, v0.15.x, v1.0.x |
+| v1.21.1 - v1.21.1 | v0.13.x, v0.14.4+, v0.15.x |
+| v1.20.2 - v1.21.0 | v0.13.x, v0.14.4+ |
+| v1.20.0 - v1.20.2 | v0.13.x |
+| v1.18.8 - v1.19.4 | v0.12.26+, v0.13.x |
+| v1.15.0 - v1.18.8 | v0.12.x |
+| v1.10.3 - v1.15.0 | v0.11.x |
+| v1.9.2 - v1.10.2  | v0.10.4+ or v0.11.x |
+| v1.7.3 - v1.9.1   | v0.10.x |
+| v1.6.4 - v1.7.2   | v0.9.x |
+
+
+## Cluster Upgrades
 
 Typhoon recommends upgrading clusters using a blue-green replacement strategy and migrating workloads.
 
@@ -127,9 +145,99 @@ Typhoon supports multi-controller clusters, so it is possible to upgrade a clust
 
 !!! warning
     Typhoon does not support or document node replacement as an upgrade strategy. It limits Typhoon's ability to make infrastructure and architectural changes between tagged releases.
 
-### Upgrade terraform-provider-ct
+## Node Configuration Updates
 
-The [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin parses, validates, and converts Fedora CoreOS or Flatcar Linux Configs into Ignition user-data for provisioning instances. Since Typhoon v1.12.2+, the plugin can be updated in-place so that on apply, only workers will be replaced.
+Typhoon worker instance groups (default workers and [worker pools](../advanced/worker-pools.md)) on AWS and Google Cloud gradually replace their worker instances when the configuration is altered.
+
+### AWS
+
+On AWS, worker instances belong to an auto-scaling group. When an auto-scaling group's launch configuration changes, an AWS [Instance Refresh](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html) gradually replaces worker instances.
+
+Instance refresh creates surge instances, waits for a warm-up period, then deletes old instances.
+
+```diff
+module "tempest" {
+  source = "git::https://github.com/poseidon/typhoon//aws/VARIANT/kubernetes?ref=VERSION"
+
+  # AWS
+  cluster_name = "tempest"
+  ...
+
+  # optional
+  worker_count = 2
+- worker_type  = "t3.small"
++ worker_type  = "t3a.small"
+
+  # change from on-demand to spot
++ worker_price = "0.0309"
+
+  # default is 30GB
++ disk_size = 50
+
+  # change worker snippets
++ worker_snippets = [
++   file("butane/feature.yaml"),
++ ]
+}
+```
+
+Applying edits to most worker fields will start an instance refresh:
+
+* `worker_type`
+* `disk_*`
+* `worker_price` (i.e. spot)
+* `worker_target_groups`
+* `worker_snippets`
+
+However, changing `os_stream`/`os_channel` or new AMIs becoming available will NOT change the launch configuration or trigger an Instance Refresh. This allows Fedora CoreOS or Flatcar Linux to auto-update themselves via reboots and avoids unexpected terraform diffs for new AMIs.
+
+!!! note
+    Before Typhoon v1.24.4, worker nodes only used new launch configurations when replaced manually (or due to failure). If you must change node configuration manually, it's still possible. Create a new [worker pool](../advanced/worker-pools.md), then scale down the old worker pool as desired.
+
+### Google Cloud
+
+On Google Cloud, worker instances belong to a [managed instance group](https://cloud.google.com/compute/docs/instance-groups#managed_instance_groups). When a group's instance template changes, a [rolling update](https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups) gradually replaces worker instances.
+
+The rolling update creates surge instances, waits for instances to be healthy, then deletes old instances.
+
+```diff
+module "yavin" {
+  source = "git::https://github.com/poseidon/typhoon//google-cloud/VARIANT/kubernetes?ref=VERSION"
+
+  # Google Cloud
+  cluster_name = "yavin"
+  ...
+
+  # optional
+  worker_count = 2
++ worker_type  = "n2-standard-2"
++ worker_preemptible = true
+
+  # default is 30GB
++ disk_size = 50
+
+  # change worker snippets
++ worker_snippets = [
++   file("butane/feature.yaml"),
++ ]
+}
+```
+
+Applying edits to most worker fields will start a rolling update:
+
+* `worker_type`
+* `disk_*`
+* `worker_preemptible` (i.e. spot)
+* `worker_snippets`
+
+However, changing `os_stream`/`os_channel` or new compute images becoming available will NOT change the instance template or update instances. This allows Fedora CoreOS or Flatcar Linux to auto-update themselves via reboots and avoids unexpected terraform diffs for new images.
+
+!!! note
+    Before Typhoon v1.24.4, worker nodes only used new instance templates when replaced manually (or due to failure). If you must change node configuration manually, it's still possible. Create a new [worker pool](../advanced/worker-pools.md), then scale down the old worker pool as desired.
+
+## Upgrade poseidon/ct
+
+The [poseidon/ct](https://github.com/poseidon/terraform-provider-ct) Terraform provider plugin parses, validates, and converts Butane Configs to Ignition user-data for provisioning instances. Since Typhoon v1.12.2+, the plugin can be updated in-place so that on apply, only workers will be replaced.
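+
+For illustration, a minimal `ct_config` data source (the resource name and the `butane/worker.yaml` path are only examples) renders a Butane Config, plus optional snippets, into Ignition:
+
+```tf
+data "ct_config" "worker" {
+  content  = file("butane/worker.yaml")
+  strict   = true
+  snippets = [
+    file("butane/feature.yaml"),
+  ]
+}
+
+# data.ct_config.worker.rendered contains the Ignition user-data
+```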
 
 Update the version of the `ct` plugin in each Terraform working directory. Typhoon clusters managed in the working directory **must** be v1.12.2 or higher.
@@ -140,8 +248,8 @@ terraform {
   required_providers {
     ct = {
       source  = "poseidon/ct"
--      version = "0.8.0"
-+      version = "0.9.0"
+-      version = "0.10.0"
++      version = "0.11.0"
     }
     ...
   }
@@ -155,11 +263,11 @@ terraform init
 terraform plan
 ```
 
-Apply the change. Worker nodes' user-data will be changed and workers will be replaced. Rollout happens slightly differently on each platform:
+Apply the change. If worker nodes' user-data is changed, workers will be replaced. Rollout happens slightly differently on each platform:
 
 #### AWS
 
-AWS creates a new worker ASG, then removes the old ASG. New workers join the cluster and old workers disappear. `terraform apply` will hang during this process.
+See AWS node [config updates](#aws).
 
 #### Azure
 
@@ -187,24 +295,4 @@ Expect downtime.
 
 #### Google Cloud
 
-Google Cloud creates a new worker template and edits the worker instance group instantly. Manually terminate workers and replacement workers will use the user-data.
-
-## Terraform Versions
-
-Terraform [v0.13](https://www.hashicorp.com/blog/announcing-hashicorp-terraform-0-13) introduced major changes to the provider plugin system. Terraform `init` can automatically install both `hashicorp` and `poseidon` provider plugins, eliminating the need to manually install plugin binaries.
-
-Typhoon modules have been updated for v0.13.x. Poseidon publishes [providers](/topics/security/#terraform-providers) to the Terraform Provider Registry for usage with v0.13+.
-
-| Typhoon Release   | Terraform version   |
-|-------------------|---------------------|
-| v1.21.2 - ?       | v0.13.x, v0.14.4+, v0.15.x, v1.0.x |
-| v1.21.1 - v1.21.1 | v0.13.x, v0.14.4+, v0.15.x |
-| v1.20.2 - v1.21.0 | v0.13.x, v0.14.4+ |
-| v1.20.0 - v1.20.2 | v0.13.x |
-| v1.18.8 - v1.19.4 | v0.12.26+, v0.13.x |
-| v1.15.0 - v1.18.8 | v0.12.x |
-| v1.10.3 - v1.15.0 | v0.11.x |
-| v1.9.2 - v1.10.2  | v0.10.4+ or v0.11.x |
-| v1.7.3 - v1.9.1   | v0.10.x |
-| v1.6.4 - v1.7.2   | v0.9.x |
-
+See Google Cloud node [config updates](#google-cloud).