Commit Graph

135 Commits

Author SHA1 Message Date
Dalton Hubble a193762eed Update etcd from v3.5.5 to v3.5.6
* https://github.com/etcd-io/etcd/releases/tag/v3.5.6
2022-11-23 10:59:17 -08:00
Dalton Hubble adf33df99b Update Cilium from v1.12.3 to v1.12.4
* https://github.com/cilium/cilium/releases/tag/v1.12.4
2022-11-23 10:58:27 -08:00
Dalton Hubble 26dbc7e91d Update Kubernetes from v1.25.3 to v1.25.4
* Update Calico from v3.24.3 to v3.24.5
* Update Prometheus and Grafana addons
2022-11-10 09:42:21 -08:00
Dalton Hubble 937acc4b5a Re-enable Graceful Node Shutdown feature
* Kubelet GracefulNodeShutdown works, but only partially handles
gracefully stopping the Kubelet. The most noticeable drawback
is that Completed Pods are left around
* Use a project like poseidon/scuttle or a similar systemd unit
as a snippet to add drain and/or delete behaviors if desired
* This reverts commit 1786e34f33.

Rel:

* https://www.psdn.io/posts/kubelet-graceful-shutdown/
* https://github.com/poseidon/scuttle
2022-11-02 20:49:01 -07:00
Dalton Hubble 9b733d79c7 Update Calico v3.24.2 to v3.24.3
* https://github.com/projectcalico/calico/releases/tag/v3.24.3
* Add patch to allow Kubelet kubeconfig to drain nodes if desired
in addition to just deleting them in shutdown integrations. See
https://github.com/poseidon/terraform-render-bootstrap/pull/330
2022-10-23 22:00:15 -07:00
Dalton Hubble 35a9e22b1f Update Calico from v3.24.1 to v3.24.2
* https://github.com/projectcalico/calico/releases/tag/v3.24.2
2022-10-20 09:28:19 -07:00
Dalton Hubble 0f38a6d405 Remove defunct delete-node.service from worker nodes
* delete-node.service used to be used to remove nodes from the
cluster on shutdown, but its long since it last worked properly
* If there is still a desire for this concept, it can be added
with a custom snippet and with a better systemd unit
2022-10-20 08:43:48 -07:00
Dalton Hubble 3ff2d38fa5 Update Cilium from v1.12.2 to v1.12.3
* https://github.com/cilium/cilium/releases/tag/v1.12.3
2022-10-17 17:25:23 -07:00
Dalton Hubble 651151805d Update Kubernetes v1.25.2 to v1.25.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253
2022-10-13 21:02:39 -07:00
Dalton Hubble 3ee462a24c Update Kubernetes from v1.25.1 to v1.25.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252
2022-09-22 08:15:30 -07:00
Dalton Hubble 74d4d56dbd Remove workaround for v1.25.0 ConfigMap rendering issue
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: https://github.com/kubernetes/kubernetes/issues/112081
2022-09-19 09:10:24 -07:00
Dalton Hubble 5abe84b520 Update etcd from v3.5.4 to v3.5.5
* https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v355
2022-09-15 09:01:45 -07:00
Dalton Hubble 951209d113 Update Cilium from v1.12.1 to v1.12.2
* https://github.com/cilium/cilium/releases/tag/v1.12.2
2022-09-15 08:28:37 -07:00
Dalton Hubble 09751cc0e8 Update Kubernetes from v1.25.0 to v1.25.1
* https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1
2022-09-15 08:23:22 -07:00
Dalton Hubble c14300f0be Update Calico from v3.23.3 to v3.24.1
* https://github.com/projectcalico/calico/releases/tag/v3.24.1
2022-09-14 08:09:38 -07:00
Dalton Hubble 1786e34f33 Revert Graceful Node Shutdown feature
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown shutdown allows 30s for critical pods to
shutdown and 15s for regular pods to shutdown before releasing the
inhibitor lock to allow the host to shutdown
* Unfortunately, both pods and the node are shutdown at the same
time at the end of the 45s period without further configuration
options. As a result, regular pods and the node are shutdown at the
same time. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier)
But its very apparent on our real clusters

Rel: https://github.com/kubernetes/kubernetes/issues/110755
2022-09-10 14:58:44 -07:00
Dalton Hubble 393a38deff Configure Graceful Node Shutdown and lengthen max inhibitor delay
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to gracefully shutdown
* Allow up to 15s for regular pods to gracefully shutdown
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shutdown a node via systemctl reboot
or via the cloud console to watch container shutdown

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* https://github.com/kubernetes/kubernetes/issues/107043
* https://github.com/coreos/fedora-coreos-tracker/issues/821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
2022-08-28 10:37:33 -07:00
Dalton Hubble 275fc0f9e8 Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so its not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable

```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```

Rel:

* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
2022-08-27 09:49:35 -07:00
Dalton Hubble 3fb59a3289 Migrate most Kubelet flags to KubeletConfiguration file
* Add a KubeletConfiguration file to replace most Kubelet
flags, to prepare for upcoming changes
* Pass Kubelet the --config flag to specify the location of
the KubeletConfiguration
* Remove flsgs / configuration where it matches the defaults
  * Remove --cgroups-per-qos, defaults to true
  * Remove --container-runtime, defaults to remote
  * Remove enforce-node-allocatable=pods, defaults to pods

Rel:

* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2022-08-27 09:28:15 -07:00
Dalton Hubble a31dbceac6 Update Kubernetes from v1.24.4 to v1.25.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md
2022-08-25 09:18:14 -07:00
Dalton Hubble 760b4cd5ee Update Kubernetes from v1.24.3 to v1.24.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244
2022-08-17 20:09:30 -07:00
Dalton Hubble fcd8ff2b17 Update Cilium from v1.12.0 to v1.12.1
* https://github.com/cilium/cilium/releases/tag/v1.12.1
2022-08-17 08:53:56 -07:00
Dalton Hubble 6facfca4ed Switch Kubernetes image registry from k8s.gcr.io to registry.k8s.io
* Announce: https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/319
2022-08-13 16:16:21 -07:00
Dalton Hubble ed8c6a5aeb Upgrade CoreDNS from v1.8.5 to v1.9.3
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/318
2022-08-13 15:43:03 -07:00
Dalton Hubble 62d47ad3f0 Update Cilium from v1.11.7 to v1.12.0
* https://github.com/cilium/cilium/releases/tag/v1.12.0
2022-08-08 19:59:03 -07:00
Dalton Hubble 4a469513dd Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0
* Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x

Rel:

* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
2022-08-03 08:32:52 -07:00
Dalton Hubble 256b87812e Remove Terraform template provider dependency
* Use Terraform builtin templatefile functionality
* Remove dependency on deprecated Terraform template provider

Rel:

* https://registry.terraform.io/providers/hashicorp/template/2.2.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/293
2022-08-02 18:15:03 -07:00
Dalton Hubble c6794f1007 Update Calico from v3.23.1 to v3.23.3
* https://github.com/projectcalico/calico/releases/tag/v3.23.3
2022-07-30 18:15:33 -07:00
Dalton Hubble f42b45451b Update Cilium from v1.11.6 to v1.11.7
* https://github.com/cilium/cilium/releases/tag/v1.11.7
2022-07-19 09:06:15 -07:00
Dalton Hubble 0db5f86110 Update Kubernetes from v1.24.2 to v1.24.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243
2022-07-13 20:59:15 -07:00
Dalton Hubble 8398182956 Update Cilium and Calico CNI providers
* Update Cilium from v1.11.5 to v1.11.6
* Update Calico from v3.22.2 to v3.23.1
2022-06-18 19:29:01 -07:00
Dalton Hubble 6d6b48b201 Update Kubernetes from v1.24.1 to v1.24.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242
2022-06-18 18:35:42 -07:00
Dalton Hubble b8549a1e32 Update Cilium from v1.11.4 to v1.11.5
* https://github.com/poseidon/terraform-render-bootstrap/pull/309
2022-05-31 15:23:07 +01:00
Dalton Hubble c5573199db Update Kubernetes from v1.24.0 to v1.24.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241
2022-05-28 09:39:14 +01:00
Dalton Hubble b0e0b132e4 Update Kubernetes from v1.23.6 to v1.24.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240
2022-05-04 08:27:14 -07:00
Dalton Hubble 91b38bf3fd Update etcd from v3.5.2 to v3.5.4
* https://github.com/etcd-io/etcd/releases/tag/v3.5.4
2022-04-27 20:57:02 -07:00
Dalton Hubble d7f55c4e46 Remove use of deprecated `key_algorithm` field in TLS assets
* Fixes warning about use of deprecated field `key_algorithm` in
the `hashicorp/tls` provider. The key algorithm can now be inferred
directly from the private key so resources don't have to output
and pass around the algorithm
2022-04-20 19:52:03 -07:00
Dalton Hubble 80c6e2e7e6 Update Kubernetes from v1.23.5 to v1.23.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236
2022-04-20 19:39:05 -07:00
Dalton Hubble 2f7d2a92e0 Update Cilium and Calico CNI providers
* Update Cilium from v1.11.3 to v1.11.4
* Update Calico from v3.22.1 to v3.22.2
2022-04-19 08:28:52 -07:00
Dalton Hubble 2df1873b7f Update Cilium from v1.11.2 to v1.11.3
* https://github.com/cilium/cilium/releases/tag/v1.11.3
2022-04-01 16:44:30 -07:00
Dalton Hubble e61d4b92da Update Kubernetes from v1.23.4 to v1.23.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235
2022-03-16 21:01:41 -07:00
Dalton Hubble 69770b4827 Update Calico from v3.21.2 to v3.22.1
* https://github.com/projectcalico/calico/releases/tag/v3.22.1
* Fix https://github.com/projectcalico/calico/issues/5011
2022-03-11 11:22:29 -08:00
Dalton Hubble f797f97675 Update Cilium from v1.11.1 to v1.11.2
* https://github.com/cilium/cilium/releases/tag/v1.11.2
2022-03-11 10:08:24 -08:00
Dalton Hubble fc38ba45b1 Update Kubernetes from v1.23.3 to v1.23.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234
2022-02-17 09:00:31 -08:00
Dalton Hubble 6c70d06937 Update etcd from v3.5.1 to v3.5.2
* https://github.com/etcd-io/etcd/releases/tag/v3.5.2
2022-02-07 08:10:17 -08:00
Dalton Hubble cf4beeba34 Change default CNI provider from Calico to Cilium
* Cilium (v1.8) was added to Typhoon in v1.18.5 in June 2020
and its become more impressive since then. Its currently the
leading CNI provider choice.
* Calico has grown complex, has lots of CRDs, masks its
management complexity with an operator (which we won't use),
doesn't provide multi-arch images, and hasn't been compatible
with Kubernetes v1.23 (with ipvs) for several releases.
* Both have CNCF conformance quirks (flannel used for conformance),
but that's not the main factor in choosing the default
2022-02-07 08:07:00 -08:00
Dalton Hubble a527f73f5a Update Kubernetes from v1.23.2 to v1.23.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1233
2022-01-27 09:23:37 -08:00
Dalton Hubble dedd17d085 Upgrade to DigitalOcean Terraform provider v2.x
* Remove deprecated `private_networking` parameter
2022-01-19 18:32:17 -08:00
Dalton Hubble e274a451ff Update Kubernetes from v1.23.1 to v1.23.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1232
2022-01-19 17:59:49 -08:00
Dalton Hubble 2265ab5375 Remove Kubelet `--network-plugin=cni` flag
* Now that `docker-shim` is no longer used, the Kubelet flag
is no longer needed and will be removed in v1.24
2022-01-14 10:43:07 -08:00