Commit Graph

62 Commits

Author SHA1 Message Date
Dalton Hubble 1786e34f33 Revert Graceful Node Shutdown feature
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown shutdown allows 30s for critical pods to
shutdown and 15s for regular pods to shutdown before releasing the
inhibitor lock to allow the host to shutdown
* Unfortunately, both pods and the node are shutdown at the same
time at the end of the 45s period without further configuration
options. As a result, regular pods and the node are shutdown at the
same time. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier)
But its very apparent on our real clusters

Rel: https://github.com/kubernetes/kubernetes/issues/110755
2022-09-10 14:58:44 -07:00
Dalton Hubble 393a38deff Configure Graceful Node Shutdown and lengthen max inhibitor delay
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to gracefully shutdown
* Allow up to 15s for regular pods to gracefully shutdown
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shutdown a node via systemctl reboot
or via the cloud console to watch container shutdown

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* https://github.com/kubernetes/kubernetes/issues/107043
* https://github.com/coreos/fedora-coreos-tracker/issues/821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
2022-08-28 10:37:33 -07:00
Dalton Hubble 275fc0f9e8 Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so its not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable

```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```

Rel:

* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
2022-08-27 09:49:35 -07:00
Dalton Hubble 3fb59a3289 Migrate most Kubelet flags to KubeletConfiguration file
* Add a KubeletConfiguration file to replace most Kubelet
flags, to prepare for upcoming changes
* Pass Kubelet the --config flag to specify the location of
the KubeletConfiguration
* Remove flsgs / configuration where it matches the defaults
  * Remove --cgroups-per-qos, defaults to true
  * Remove --container-runtime, defaults to remote
  * Remove enforce-node-allocatable=pods, defaults to pods

Rel:

* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2022-08-27 09:28:15 -07:00
Dalton Hubble a31dbceac6 Update Kubernetes from v1.24.4 to v1.25.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md
2022-08-25 09:18:14 -07:00
Dalton Hubble 760b4cd5ee Update Kubernetes from v1.24.3 to v1.24.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244
2022-08-17 20:09:30 -07:00
Dalton Hubble 52427a4271 Refresh instances in autoscaling group when launch configuration changes
* Changes to worker launch configurations start an autoscaling group instance
refresh to replace instances
* Instance refresh creates surge instances, waits for a warm-up period, then
deletes old instances
* Changing worker_type, disk_*, worker_price, worker_target_groups, or Butane
worker_snippets on existing worker nodes will replace instances
* New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or
Flatcar Linux to keep themselves updated
* Previously, new launch configurations were made in the same way, but not
applied to instances unless manually replaced
2022-08-14 21:43:49 -07:00
Dalton Hubble 87a8278c9d Improve AWS autoscaling group and launch config names
* Rename launch configuration to use a name_prefix named after the
cluster and worker to improve identifiability
* Shorten AWS autoscaling group name to not include the launch config
id. Years ago this used to be needed to update the ASG but the AWS
provider detects changes to the launch configuration just fine
2022-08-08 20:46:08 -07:00
Dalton Hubble 4a469513dd Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0
* Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x

Rel:

* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
2022-08-03 08:32:52 -07:00
Dalton Hubble 256b87812e Remove Terraform template provider dependency
* Use Terraform builtin templatefile functionality
* Remove dependency on deprecated Terraform template provider

Rel:

* https://registry.terraform.io/providers/hashicorp/template/2.2.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/293
2022-08-02 18:15:03 -07:00
Dalton Hubble 0db5f86110 Update Kubernetes from v1.24.2 to v1.24.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243
2022-07-13 20:59:15 -07:00
Dalton Hubble 6d6b48b201 Update Kubernetes from v1.24.1 to v1.24.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242
2022-06-18 18:35:42 -07:00
Dalton Hubble c5573199db Update Kubernetes from v1.24.0 to v1.24.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241
2022-05-28 09:39:14 +01:00
Dalton Hubble b0e0b132e4 Update Kubernetes from v1.23.6 to v1.24.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240
2022-05-04 08:27:14 -07:00
Dalton Hubble 80c6e2e7e6 Update Kubernetes from v1.23.5 to v1.23.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236
2022-04-20 19:39:05 -07:00
Dalton Hubble e61d4b92da Update Kubernetes from v1.23.4 to v1.23.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235
2022-03-16 21:01:41 -07:00
Dalton Hubble 9aa99f1996 Allow upgrading AWS Terraform provider to v4.x
* https://github.com/hashicorp/terraform-provider-aws/releases/tag/v4.0.0
2022-02-17 09:35:15 -08:00
Dalton Hubble fc38ba45b1 Update Kubernetes from v1.23.3 to v1.23.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234
2022-02-17 09:00:31 -08:00
Dalton Hubble a527f73f5a Update Kubernetes from v1.23.2 to v1.23.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1233
2022-01-27 09:23:37 -08:00
Dalton Hubble e274a451ff Update Kubernetes from v1.23.1 to v1.23.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1232
2022-01-19 17:59:49 -08:00
Dalton Hubble 2265ab5375 Remove Kubelet `--network-plugin=cni` flag
* Now that `docker-shim` is no longer used, the Kubelet flag
is no longer needed and will be removed in v1.24
2022-01-14 10:43:07 -08:00
Dalton Hubble beb9f1477a Add experimental Flatcar Linux arm64 support on AWS
* Add `arch` variable to Flatcar Linux AWS `kubernetes` and
`workers` modules. Accept `amd64` (default) or `arm64` to support
native arm64/aarch64 clusters or mixed/hybrid clusters with arm64
workers
* Requires `flannel` or `cilium` CNI

Similar to https://github.com/poseidon/typhoon/pull/875
2022-01-14 10:24:48 -08:00
Dalton Hubble 9e3807798f Update Kubernetes from v1.23.0 to v1.23.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1231
2021-12-20 08:36:19 -08:00
Dalton Hubble ef9c6aa423 Switch Flatcar Linux to using containerd CRI
* Use containerd as the Kubernetes Container Runtime
2021-12-15 08:42:13 -08:00
Dalton Hubble 136107b448 Set Kubelet resolver config to /run/systemd/resolve/resolv.conf
* Both Flatcar Linux and Fedora CoreOS use systemd-resolved,
but they setup /etc/resolv.conf symlinks differently
* Prefer using /run/systemd/resolve/resolv.conf directly, which
also updates to reflect runtime changes (e.g. resolvectl)
2021-12-10 08:22:30 -08:00
Dalton Hubble 861021ee98 Update Kubernetes from v1.22.4 to v1.23.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1230
* With Calico, add missing caliconodestatuses CRD added in v3.21.0
https://github.com/poseidon/terraform-render-bootstrap/pull/289
2021-12-09 09:28:41 -08:00
Dalton Hubble a8fd21d250 Update minimum Terraform provider versions
* Update `null` provider to allow use of v3.1.x releases,
instead of being stuck on v2.1.2
* Update min versions in terraform-render-boostrap
https://github.com/poseidon/terraform-render-bootstrap/pull/287
* Document the recommended versions of Terraform cloud providers
2021-12-07 16:26:34 -08:00
Dalton Hubble 93594292eb Update Kubernetes from v1.22.3 to v1.22.4
* Update flannel from v0.15.0 to v0.15.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1224
2021-11-17 19:53:32 -08:00
Dalton Hubble 4fd43b39ad Fix Flatcar Linux docker driver and add cgroups v2
* Remove `/sys/fs/cgroup/systemd` mount since Flatcar Linux
uses cgroups v2
* Flatcar Linux's `docker` switched from the `cgroupfs` to
`systemd` driver without notice
2021-11-12 21:07:20 -08:00
Dalton Hubble dd4a5a4e7e Update Kubernetes from v1.22.2 to v1.22.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1223
2021-10-28 10:11:06 -07:00
Dalton Hubble bb7f31822e Update Kubernetes from v1.22.1 to v1.22.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1222
2021-09-15 19:56:24 -07:00
Dalton Hubble fcbdb50d93 Update Kubernetes from v1.22.0 to v1.22.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1221
2021-08-19 21:12:02 -07:00
Dalton Hubble 9bac641511 Update Kubernetes from v1.21.3 to v1.22.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1220
2021-08-04 22:09:19 -07:00
Dalton Hubble fdade5b40c Update poseidon/ct provider from v0.8.0 to v0.9.0
* Continue targeting Ignition v3.2.0 for some time
2021-07-18 09:05:02 -07:00
Dalton Hubble 171fd2c998 Update Kubernetes from v1.21.2 to v1.21.3
* https://github.com/kubernetes/kubernetes/releases/tag/v1.21.3
2021-07-17 18:22:24 -07:00
Dalton Hubble 66e7354c8a Change AWS default disk type from gp2 to gp3
* https://aws.amazon.com/about-aws/whats-new/2020/12/introducing-new-amazon-ebs-general-purpose-volumes-gp3/
2021-07-04 10:43:05 -07:00
Dalton Hubble 0b276b6b7e Update Kubernetes from v1.21.1 to v1.21.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.21.2
2021-06-17 16:15:20 -07:00
Dalton Hubble e8513e58bb Add support for Terraform v1.0.0
* https://github.com/hashicorp/terraform/releases/tag/v1.0.0
2021-06-17 13:32:56 -07:00
Dalton Hubble 2076a779a3 Update Kubernetes from v1.21.0 to v1.21.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1211
2021-05-13 11:23:26 -07:00
Dalton Hubble b152b9f973 Reduce the default disk_size from 40GB to 30GB
* We're typically reducing the `disk_size` in real clusters
since the space is under used. The default should be lower.
2021-04-26 11:43:26 -07:00
Dalton Hubble 67047ead08 Update Terraform version to allow v0.15.0
* Require Terraform version v0.13 <= x < v0.16
2021-04-16 09:46:01 -07:00
Dalton Hubble ebd9570ede Update Fedora CoreOS Config version from v1.1.0 to v1.2.0
* Require [poseidon/ct](https://github.com/poseidon/terraform-provider-ct)
Terraform provider v0.8+
* Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts)
customizations to update to v1.2.0

See upgrade [notes](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct)
2021-04-11 15:26:54 -07:00
Dalton Hubble 084e8bea49 Allow custom initial node taints on worker pool nodes
* Add `node_taints` variable to worker modules to set custom
initial node taints on cloud platforms that support auto-scaling
worker pools of heterogeneous nodes (i.e. AWS, Azure, GCP)
* Worker pools could use custom `node_labels` to allowed workloads
to select among differentiated nodes, while custom `node_taints`
allows a worker pool's nodes to be tainted as special to prevent
scheduling, except by workloads that explicitly tolerate the
taint
* Expose `daemonset_tolerations` in AWS, Azure, and GCP kubernetes
cluster modules, to determine whether `kube-system` components
should tolerate the custom taint (advanced use covered in docs)

Rel: #550, #663
Closes #429
2021-04-11 15:00:11 -07:00
Dalton Hubble d73621c838 Update Kubernetes from v1.20.5 to v1.21.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1210
2021-04-08 21:44:31 -07:00
Dalton Hubble 798ec9a92f Change CNI config directory to /etc/cni/net.d
* Change CNI config directory from `/etc/kubernetes/cni/net.d`
to `/etc/cni/net.d` (Kubelet default)
* https://github.com/poseidon/terraform-render-bootstrap/pull/255
2021-04-02 00:03:48 -07:00
Anshul Sirur 507c646e8b Add Kubelet provider-id on AWS
* Set the Kubelet `--provider-id` on AWS based on metadata from
Fedora CoreOS afterburn or Flatcar Linux coreos-metadata
* Based on https://github.com/poseidon/typhoon/pull/951
2021-03-19 12:43:37 -07:00
Dalton Hubble 796149d122 Update Kubernetes from v1.20.4 to v1.20.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1205
2021-03-19 11:27:31 -07:00
Dalton Hubble 6a091e245e Remove Flatcar Linux Edge `os_image` option
* Flatcar Linux has not published an Edge channel image since
April 2020 and recently removed mention of the channel from
their documentation https://github.com/kinvolk/Flatcar/pull/345
* Users of Flatcar Linux Edge should move to the stable, beta, or
alpha channel, barring any alternate advice from upstream Flatcar
Linux
2021-02-20 16:09:54 -08:00
Dalton Hubble e76fe80b45 Update Kubernetes from v1.20.3 to v1.20.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1204
2021-02-19 00:02:07 -08:00
Dalton Hubble 32853aaa7b Update Kubernetes from v1.20.2 to v1.20.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1203
2021-02-17 22:29:33 -08:00