Commit Graph

146 Commits

Author SHA1 Message Date
Dalton Hubble a193762eed Update etcd from v3.5.5 to v3.5.6
* https://github.com/etcd-io/etcd/releases/tag/v3.5.6
2022-11-23 10:59:17 -08:00
Dalton Hubble adf33df99b Update Cilium from v1.12.3 to v1.12.4
* https://github.com/cilium/cilium/releases/tag/v1.12.4
2022-11-23 10:58:27 -08:00
Dalton Hubble 26dbc7e91d Update Kubernetes from v1.25.3 to v1.25.4
* Update Calico from v3.24.3 to v3.24.5
* Update Prometheus and Grafana addons
2022-11-10 09:42:21 -08:00
Dalton Hubble 937acc4b5a Re-enable Graceful Node Shutdown feature
* Kubelet GracefulNodeShutdown works, but only partially handles
gracefully stopping the Kubelet. The most noticeable drawback
is that Completed Pods are left around
* Use a project like poseidon/scuttle or a similar systemd unit
as a snippet to add drain and/or delete behaviors if desired
* This reverts commit 1786e34f33.

Rel:

* https://www.psdn.io/posts/kubelet-graceful-shutdown/
* https://github.com/poseidon/scuttle
2022-11-02 20:49:01 -07:00
Dalton Hubble 9b733d79c7 Update Calico v3.24.2 to v3.24.3
* https://github.com/projectcalico/calico/releases/tag/v3.24.3
* Add patch to allow Kubelet kubeconfig to drain nodes if desired
in addition to just deleting them in shutdown integrations. See
https://github.com/poseidon/terraform-render-bootstrap/pull/330
2022-10-23 22:00:15 -07:00
Dalton Hubble 35a9e22b1f Update Calico from v3.24.1 to v3.24.2
* https://github.com/projectcalico/calico/releases/tag/v3.24.2
2022-10-20 09:28:19 -07:00
Dalton Hubble 0f38a6d405 Remove defunct delete-node.service from worker nodes
* delete-node.service used to be used to remove nodes from the
cluster on shutdown, but its long since it last worked properly
* If there is still a desire for this concept, it can be added
with a custom snippet and with a better systemd unit
2022-10-20 08:43:48 -07:00
Dalton Hubble 3ff2d38fa5 Update Cilium from v1.12.2 to v1.12.3
* https://github.com/cilium/cilium/releases/tag/v1.12.3
2022-10-17 17:25:23 -07:00
Dalton Hubble 651151805d Update Kubernetes v1.25.2 to v1.25.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253
2022-10-13 21:02:39 -07:00
Dalton Hubble 3ee462a24c Update Kubernetes from v1.25.1 to v1.25.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252
2022-09-22 08:15:30 -07:00
Dalton Hubble 74d4d56dbd Remove workaround for v1.25.0 ConfigMap rendering issue
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: https://github.com/kubernetes/kubernetes/issues/112081
2022-09-19 09:10:24 -07:00
Dalton Hubble 5abe84b520 Update etcd from v3.5.4 to v3.5.5
* https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v355
2022-09-15 09:01:45 -07:00
Dalton Hubble 951209d113 Update Cilium from v1.12.1 to v1.12.2
* https://github.com/cilium/cilium/releases/tag/v1.12.2
2022-09-15 08:28:37 -07:00
Dalton Hubble 09751cc0e8 Update Kubernetes from v1.25.0 to v1.25.1
* https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1
2022-09-15 08:23:22 -07:00
Dalton Hubble c14300f0be Update Calico from v3.23.3 to v3.24.1
* https://github.com/projectcalico/calico/releases/tag/v3.24.1
2022-09-14 08:09:38 -07:00
Dalton Hubble 1786e34f33 Revert Graceful Node Shutdown feature
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown shutdown allows 30s for critical pods to
shutdown and 15s for regular pods to shutdown before releasing the
inhibitor lock to allow the host to shutdown
* Unfortunately, both pods and the node are shutdown at the same
time at the end of the 45s period without further configuration
options. As a result, regular pods and the node are shutdown at the
same time. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier)
But its very apparent on our real clusters

Rel: https://github.com/kubernetes/kubernetes/issues/110755
2022-09-10 14:58:44 -07:00
Dalton Hubble 393a38deff Configure Graceful Node Shutdown and lengthen max inhibitor delay
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to gracefully shutdown
* Allow up to 15s for regular pods to gracefully shutdown
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shutdown a node via systemctl reboot
or via the cloud console to watch container shutdown

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* https://github.com/kubernetes/kubernetes/issues/107043
* https://github.com/coreos/fedora-coreos-tracker/issues/821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
2022-08-28 10:37:33 -07:00
Dalton Hubble 275fc0f9e8 Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so its not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable

```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```

Rel:

* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
2022-08-27 09:49:35 -07:00
Dalton Hubble 3fb59a3289 Migrate most Kubelet flags to KubeletConfiguration file
* Add a KubeletConfiguration file to replace most Kubelet
flags, to prepare for upcoming changes
* Pass Kubelet the --config flag to specify the location of
the KubeletConfiguration
* Remove flsgs / configuration where it matches the defaults
  * Remove --cgroups-per-qos, defaults to true
  * Remove --container-runtime, defaults to remote
  * Remove enforce-node-allocatable=pods, defaults to pods

Rel:

* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2022-08-27 09:28:15 -07:00
Dalton Hubble a31dbceac6 Update Kubernetes from v1.24.4 to v1.25.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md
2022-08-25 09:18:14 -07:00
Dalton Hubble 760b4cd5ee Update Kubernetes from v1.24.3 to v1.24.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244
2022-08-17 20:09:30 -07:00
Dalton Hubble fcd8ff2b17 Update Cilium from v1.12.0 to v1.12.1
* https://github.com/cilium/cilium/releases/tag/v1.12.1
2022-08-17 08:53:56 -07:00
Dalton Hubble 52427a4271 Refresh instances in autoscaling group when launch configuration changes
* Changes to worker launch configurations start an autoscaling group instance
refresh to replace instances
* Instance refresh creates surge instances, waits for a warm-up period, then
deletes old instances
* Changing worker_type, disk_*, worker_price, worker_target_groups, or Butane
worker_snippets on existing worker nodes will replace instances
* New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or
Flatcar Linux to keep themselves updated
* Previously, new launch configurations were made in the same way, but not
applied to instances unless manually replaced
2022-08-14 21:43:49 -07:00
Dalton Hubble 6facfca4ed Switch Kubernetes image registry from k8s.gcr.io to registry.k8s.io
* Announce: https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/319
2022-08-13 16:16:21 -07:00
Dalton Hubble ed8c6a5aeb Upgrade CoreDNS from v1.8.5 to v1.9.3
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/318
2022-08-13 15:43:03 -07:00
Dalton Hubble 87a8278c9d Improve AWS autoscaling group and launch config names
* Rename launch configuration to use a name_prefix named after the
cluster and worker to improve identifiability
* Shorten AWS autoscaling group name to not include the launch config
id. Years ago this used to be needed to update the ASG but the AWS
provider detects changes to the launch configuration just fine
2022-08-08 20:46:08 -07:00
Dalton Hubble 62d47ad3f0 Update Cilium from v1.11.7 to v1.12.0
* https://github.com/cilium/cilium/releases/tag/v1.12.0
2022-08-08 19:59:03 -07:00
Dalton Hubble 4a469513dd Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0
* Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x

Rel:

* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
2022-08-03 08:32:52 -07:00
Dalton Hubble 256b87812e Remove Terraform template provider dependency
* Use Terraform builtin templatefile functionality
* Remove dependency on deprecated Terraform template provider

Rel:

* https://registry.terraform.io/providers/hashicorp/template/2.2.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/293
2022-08-02 18:15:03 -07:00
Dalton Hubble c6794f1007 Update Calico from v3.23.1 to v3.23.3
* https://github.com/projectcalico/calico/releases/tag/v3.23.3
2022-07-30 18:15:33 -07:00
Dalton Hubble f42b45451b Update Cilium from v1.11.6 to v1.11.7
* https://github.com/cilium/cilium/releases/tag/v1.11.7
2022-07-19 09:06:15 -07:00
Dalton Hubble 0db5f86110 Update Kubernetes from v1.24.2 to v1.24.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243
2022-07-13 20:59:15 -07:00
Dalton Hubble 8398182956 Update Cilium and Calico CNI providers
* Update Cilium from v1.11.5 to v1.11.6
* Update Calico from v3.22.2 to v3.23.1
2022-06-18 19:29:01 -07:00
Dalton Hubble 6d6b48b201 Update Kubernetes from v1.24.1 to v1.24.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242
2022-06-18 18:35:42 -07:00
Dalton Hubble b8549a1e32 Update Cilium from v1.11.4 to v1.11.5
* https://github.com/poseidon/terraform-render-bootstrap/pull/309
2022-05-31 15:23:07 +01:00
Dalton Hubble c5573199db Update Kubernetes from v1.24.0 to v1.24.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241
2022-05-28 09:39:14 +01:00
Dalton Hubble b0e0b132e4 Update Kubernetes from v1.23.6 to v1.24.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240
2022-05-04 08:27:14 -07:00
Dalton Hubble 91b38bf3fd Update etcd from v3.5.2 to v3.5.4
* https://github.com/etcd-io/etcd/releases/tag/v3.5.4
2022-04-27 20:57:02 -07:00
Dalton Hubble d7f55c4e46 Remove use of deprecated `key_algorithm` field in TLS assets
* Fixes warning about use of deprecated field `key_algorithm` in
the `hashicorp/tls` provider. The key algorithm can now be inferred
directly from the private key so resources don't have to output
and pass around the algorithm
2022-04-20 19:52:03 -07:00
Dalton Hubble 80c6e2e7e6 Update Kubernetes from v1.23.5 to v1.23.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236
2022-04-20 19:39:05 -07:00
Dalton Hubble 2f7d2a92e0 Update Cilium and Calico CNI providers
* Update Cilium from v1.11.3 to v1.11.4
* Update Calico from v3.22.1 to v3.22.2
2022-04-19 08:28:52 -07:00
Dalton Hubble 2df1873b7f Update Cilium from v1.11.2 to v1.11.3
* https://github.com/cilium/cilium/releases/tag/v1.11.3
2022-04-01 16:44:30 -07:00
Dalton Hubble e61d4b92da Update Kubernetes from v1.23.4 to v1.23.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235
2022-03-16 21:01:41 -07:00
Dalton Hubble 69770b4827 Update Calico from v3.21.2 to v3.22.1
* https://github.com/projectcalico/calico/releases/tag/v3.22.1
* Fix https://github.com/projectcalico/calico/issues/5011
2022-03-11 11:22:29 -08:00
Dalton Hubble f797f97675 Update Cilium from v1.11.1 to v1.11.2
* https://github.com/cilium/cilium/releases/tag/v1.11.2
2022-03-11 10:08:24 -08:00
Dalton Hubble 9aa99f1996 Allow upgrading AWS Terraform provider to v4.x
* https://github.com/hashicorp/terraform-provider-aws/releases/tag/v4.0.0
2022-02-17 09:35:15 -08:00
Dalton Hubble fc38ba45b1 Update Kubernetes from v1.23.3 to v1.23.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234
2022-02-17 09:00:31 -08:00
Dalton Hubble 6c70d06937 Update etcd from v3.5.1 to v3.5.2
* https://github.com/etcd-io/etcd/releases/tag/v3.5.2
2022-02-07 08:10:17 -08:00
Dalton Hubble cf4beeba34 Change default CNI provider from Calico to Cilium
* Cilium (v1.8) was added to Typhoon in v1.18.5 in June 2020
and its become more impressive since then. Its currently the
leading CNI provider choice.
* Calico has grown complex, has lots of CRDs, masks its
management complexity with an operator (which we won't use),
doesn't provide multi-arch images, and hasn't been compatible
with Kubernetes v1.23 (with ipvs) for several releases.
* Both have CNCF conformance quirks (flannel used for conformance),
but that's not the main factor in choosing the default
2022-02-07 08:07:00 -08:00
Dalton Hubble a527f73f5a Update Kubernetes from v1.23.2 to v1.23.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1233
2022-01-27 09:23:37 -08:00