typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-04 03:14:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	1786e34f33	Revert Graceful Node Shutdown feature * Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222) * Graceful node shutdown allows 30s for critical pods to shut down and 15s for regular pods to shut down before releasing the inhibitor lock to allow the host to shut down * Unfortunately, both pods and the node are shut down at the same time at the end of the 45s period without further configuration options. As a result, regular pods and the node are shut down at the same time. In practice, enabling this feature leaves Error or Completed pods in kube-apiserver state until manually cleaned up. This feature is not ready for general use * Fix issue where Error/Completed pods are accumulating whenever any node restarts (or auto-updates), visible in kubectl get pods * This issue wasn't apparent in initial testing and seems to only affect non-critical pods (due to critical pods being killed earlier), but it's very apparent on our real clusters Rel: https://github.com/kubernetes/kubernetes/issues/110755	2022-09-10 14:58:44 -07:00
Dalton Hubble	393a38deff	Configure Graceful Node Shutdown and lengthen max inhibitor delay * Configure Kubelet Graceful Node Shutdown to detect system shutdown events and stop running containers gracefully when possible * Allow up to 30s for critical pods to shut down gracefully * Allow up to 15s for regular pods to shut down gracefully * Node will be marked as NotReady promptly, instead of having to wait for health checks * Kubelet uses systemd inhibitor locks to delay shutdown for a limited number of seconds * Raise the default max inhibitor time from 5s to 45s Verify systemd inhibitor locks are present: ``` sudo systemd-inhibit --list WHO UID USER PID COMM WHAT WHY MODE kubelet 0 root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay ``` Tail journal logs and then shut down a node via systemctl reboot or via the cloud console to watch container shutdown Rel: * https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/ * https://github.com/kubernetes/kubernetes/issues/107043 * https://github.com/coreos/fedora-coreos-tracker/issues/821 * https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html * https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go * https://github.com/godbus/dbus/blob/master/conn.go	2022-08-28 10:37:33 -07:00
Dalton Hubble	275fc0f9e8	Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature * Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in Pods, as shown by conformance tests * Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back to alpha so it's not enabled by default, but that will require a release * Disable the feature gate directly as a workaround for now to make Kubernetes v1.25.0 usable ``` FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478 ``` Rel: * https://github.com/kubernetes/kubernetes/pull/112076 * https://github.com/kubernetes/kubernetes/pull/107329	2022-08-27 09:49:35 -07:00
Dalton Hubble	3fb59a3289	Migrate most Kubelet flags to KubeletConfiguration file * Add a KubeletConfiguration file to replace most Kubelet flags, to prepare for upcoming changes * Pass Kubelet the --config flag to specify the location of the KubeletConfiguration * Remove flags / configuration where it matches the defaults * Remove --cgroups-per-qos, defaults to true * Remove --container-runtime, defaults to remote * Remove --enforce-node-allocatable=pods, defaults to pods Rel: * https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/	2022-08-27 09:28:15 -07:00
Dalton Hubble	a31dbceac6	Update Kubernetes from v1.24.4 to v1.25.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md	2022-08-25 09:18:14 -07:00
Dalton Hubble	e87d5aabc3	Adjust Google Cloud worker health checks to use kube-proxy healthz * Change the workers managed instance group to health check nodes via HTTP probe of the kube-proxy port 10256 /healthz endpoints * Advantages: kube-proxy is a lower value target (in case there were bugs in firewalls) than Kubelet, it's more representative than health checking Kubelet (Kubelet must run AND kube-proxy Daemonset must be healthy), and it's already used by kube-proxy liveness probes (better discoverability via kubectl or alerts on pods crashlooping) * Another motivator is that GKE clusters also use kube-proxy port 10256 checks to assess node health	2022-08-17 20:50:52 -07:00
Dalton Hubble	760b4cd5ee	Update Kubernetes from v1.24.3 to v1.24.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244	2022-08-17 20:09:30 -07:00
Dalton Hubble	20b76d6e00	Roll instance template changes to worker managed instance groups * When a worker managed instance group's (MIG) instance template changes (including machine type, disk size, or Butane snippets but excluding new AMIs), use Google Cloud's rolling update features to ensure instances match declared state * Ignore new AMIs since Fedora CoreOS and Flatcar Linux nodes already auto-update and reboot themselves * Rolling updates will create surge instances, wait for health checks, then delete old instances (0 unavailable instances) * Instances are replaced to ensure new Ignition/Butane snippets are respected * Add managed instance group autohealing (i.e. health checks) to ensure new instances' Kubelet is running Renames * Name apiserver and kubelet health checks consistently * Rename MIG from `${var.name}-worker-group` to `${var.name}-worker` Rel: https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups	2022-08-14 13:06:53 -07:00
Dalton Hubble	4a469513dd	Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0 * Requires poseidon/ct v0.11+ and Flatcar Linux 3185.0.0+ (action required) * Previously, Flatcar Linux configs have been parsed as Container Linux Configs to Ignition v2.2.0 specs by poseidon/ct * Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs (which are rendered from Butane Configs, like Fedora CoreOS) * poseidon/ct v0.11.0 adds support for the flatcar Butane Config variant so that Flatcar Linux can use Ignition v3.x Rel: * [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3) * [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)	2022-08-03 08:32:52 -07:00
Dalton Hubble	256b87812e	Remove Terraform template provider dependency * Use Terraform builtin templatefile functionality * Remove dependency on deprecated Terraform template provider Rel: * https://registry.terraform.io/providers/hashicorp/template/2.2.0 * https://github.com/poseidon/terraform-render-bootstrap/pull/293	2022-08-02 18:15:03 -07:00
Dalton Hubble	0db5f86110	Update Kubernetes from v1.24.2 to v1.24.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243	2022-07-13 20:59:15 -07:00
Dalton Hubble	6d6b48b201	Update Kubernetes from v1.24.1 to v1.24.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242	2022-06-18 18:35:42 -07:00
Dalton Hubble	c5573199db	Update Kubernetes from v1.24.0 to v1.24.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241	2022-05-28 09:39:14 +01:00
Dalton Hubble	b0e0b132e4	Update Kubernetes from v1.23.6 to v1.24.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240	2022-05-04 08:27:14 -07:00
Dalton Hubble	80c6e2e7e6	Update Kubernetes from v1.23.5 to v1.23.6 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236	2022-04-20 19:39:05 -07:00
Dalton Hubble	e61d4b92da	Update Kubernetes from v1.23.4 to v1.23.5 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235	2022-03-16 21:01:41 -07:00
Dalton Hubble	fc38ba45b1	Update Kubernetes from v1.23.3 to v1.23.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234	2022-02-17 09:00:31 -08:00
Dalton Hubble	e06ee042ee	Switch to using Flatcar Linux images on Google Cloud * Use the official Kinvolk Flatcar Linux image on Google Cloud * Change `os_image` from a custom image name to `flatcar-stable` (default), `flatcar-beta`, or `flatcar-alpha` (action required) * Change `os_image` from a required to an optional variable * Promote Typhoon on Flatcar Linux / Google Cloud to stable * Remove docs about needing to upload a Flatcar Linux image manually on Google Cloud and drop support for custom images	2022-01-28 21:04:10 -08:00
Dalton Hubble	a527f73f5a	Update Kubernetes from v1.23.2 to v1.23.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1233	2022-01-27 09:23:37 -08:00
Dalton Hubble	e274a451ff	Update Kubernetes from v1.23.1 to v1.23.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1232	2022-01-19 17:59:49 -08:00
Dalton Hubble	2265ab5375	Remove Kubelet `--network-plugin=cni` flag * Now that `docker-shim` is no longer used, the Kubelet flag is no longer needed and will be removed in v1.24	2022-01-14 10:43:07 -08:00
Dalton Hubble	9e3807798f	Update Kubernetes from v1.23.0 to v1.23.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1231	2021-12-20 08:36:19 -08:00
Dalton Hubble	ef9c6aa423	Switch Flatcar Linux to using containerd CRI * Use containerd as the Kubernetes Container Runtime	2021-12-15 08:42:13 -08:00
Dalton Hubble	136107b448	Set Kubelet resolver config to /run/systemd/resolve/resolv.conf * Both Flatcar Linux and Fedora CoreOS use systemd-resolved, but they set up /etc/resolv.conf symlinks differently * Prefer using /run/systemd/resolve/resolv.conf directly, which also updates to reflect runtime changes (e.g. resolvectl)	2021-12-10 08:22:30 -08:00
Dalton Hubble	861021ee98	Update Kubernetes from v1.22.4 to v1.23.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1230 * With Calico, add missing caliconodestatuses CRD added in v3.21.0 https://github.com/poseidon/terraform-render-bootstrap/pull/289	2021-12-09 09:28:41 -08:00
Dalton Hubble	a8fd21d250	Update minimum Terraform provider versions * Update `null` provider to allow use of v3.1.x releases, instead of being stuck on v2.1.2 * Update min versions in terraform-render-bootstrap https://github.com/poseidon/terraform-render-bootstrap/pull/287 * Document the recommended versions of Terraform cloud providers	2021-12-07 16:26:34 -08:00
Dalton Hubble	93594292eb	Update Kubernetes from v1.22.3 to v1.22.4 * Update flannel from v0.15.0 to v0.15.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1224	2021-11-17 19:53:32 -08:00
Dalton Hubble	4fd43b39ad	Fix Flatcar Linux docker driver and add cgroups v2 * Remove `/sys/fs/cgroup/systemd` mount since Flatcar Linux uses cgroups v2 * Flatcar Linux's `docker` switched from the `cgroupfs` to `systemd` driver without notice	2021-11-12 21:07:20 -08:00
Dalton Hubble	07db4c1143	Allow use of google Terraform provider v4.0+ * https://github.com/hashicorp/terraform-provider-google/releases/tag/v4.0.0	2021-11-11 10:17:58 -08:00
Dalton Hubble	dd4a5a4e7e	Update Kubernetes from v1.22.2 to v1.22.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1223	2021-10-28 10:11:06 -07:00
Dalton Hubble	bb7f31822e	Update Kubernetes from v1.22.1 to v1.22.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1222	2021-09-15 19:56:24 -07:00
Dalton Hubble	fcbdb50d93	Update Kubernetes from v1.22.0 to v1.22.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1221	2021-08-19 21:12:02 -07:00
Dalton Hubble	9bac641511	Update Kubernetes from v1.21.3 to v1.22.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1220	2021-08-04 22:09:19 -07:00
Dalton Hubble	fdade5b40c	Update poseidon/ct provider from v0.8.0 to v0.9.0 * Continue targeting Ignition v3.2.0 for some time	2021-07-18 09:05:02 -07:00
Dalton Hubble	171fd2c998	Update Kubernetes from v1.21.2 to v1.21.3 * https://github.com/kubernetes/kubernetes/releases/tag/v1.21.3	2021-07-17 18:22:24 -07:00
Dalton Hubble	0b276b6b7e	Update Kubernetes from v1.21.1 to v1.21.2 * https://github.com/kubernetes/kubernetes/releases/tag/v1.21.2	2021-06-17 16:15:20 -07:00
Dalton Hubble	e8513e58bb	Add support for Terraform v1.0.0 * https://github.com/hashicorp/terraform/releases/tag/v1.0.0	2021-06-17 13:32:56 -07:00
Dalton Hubble	2076a779a3	Update Kubernetes from v1.21.0 to v1.21.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1211	2021-05-13 11:23:26 -07:00
Dalton Hubble	b152b9f973	Reduce the default disk_size from 40GB to 30GB * We're typically reducing the `disk_size` in real clusters since the space is underused. The default should be lower.	2021-04-26 11:43:26 -07:00
Dalton Hubble	67047ead08	Update Terraform version to allow v0.15.0 * Require Terraform version v0.13 <= x < v0.16	2021-04-16 09:46:01 -07:00
Dalton Hubble	ebd9570ede	Update Fedora CoreOS Config version from v1.1.0 to v1.2.0 * Require [poseidon/ct](https://github.com/poseidon/terraform-provider-ct) Terraform provider v0.8+ * Require any [snippets](https://typhoon.psdn.io/advanced/customization/#hosts) customizations to update to v1.2.0 See upgrade [notes](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct)	2021-04-11 15:26:54 -07:00
Dalton Hubble	084e8bea49	Allow custom initial node taints on worker pool nodes * Add `node_taints` variable to worker modules to set custom initial node taints on cloud platforms that support auto-scaling worker pools of heterogeneous nodes (i.e. AWS, Azure, GCP) * Worker pools could use custom `node_labels` to allow workloads to select among differentiated nodes, while custom `node_taints` allows a worker pool's nodes to be tainted as special to prevent scheduling, except by workloads that explicitly tolerate the taint * Expose `daemonset_tolerations` in AWS, Azure, and GCP kubernetes cluster modules, to determine whether `kube-system` components should tolerate the custom taint (advanced use covered in docs) Rel: #550, #663 Closes #429	2021-04-11 15:00:11 -07:00
Dalton Hubble	d73621c838	Update Kubernetes from v1.20.5 to v1.21.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1210	2021-04-08 21:44:31 -07:00
Dalton Hubble	798ec9a92f	Change CNI config directory to /etc/cni/net.d * Change CNI config directory from `/etc/kubernetes/cni/net.d` to `/etc/cni/net.d` (Kubelet default) * https://github.com/poseidon/terraform-render-bootstrap/pull/255	2021-04-02 00:03:48 -07:00
Dalton Hubble	796149d122	Update Kubernetes from v1.20.4 to v1.20.5 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1205	2021-03-19 11:27:31 -07:00
Dalton Hubble	e76fe80b45	Update Kubernetes from v1.20.3 to v1.20.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1204	2021-02-19 00:02:07 -08:00
Dalton Hubble	32853aaa7b	Update Kubernetes from v1.20.2 to v1.20.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1203	2021-02-17 22:29:33 -08:00
Dalton Hubble	05f7df9e80	Update Kubernetes from v1.20.1 to v1.20.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1202	2021-01-13 17:46:51 -08:00
Dalton Hubble	4220b9ce18	Add support for Terraform v0.14.4+ * Support Terraform v0.13.x and v0.14.4+	2021-01-12 21:43:12 -08:00
Dalton Hubble	646bdd78e4	Update Kubernetes from v1.20.0 to v1.20.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1201	2020-12-19 12:56:28 -08:00
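
For reference, the Kubelet settings described in 393a38deff and 275fc0f9e8 could be expressed in a KubeletConfiguration file roughly as follows. This is a minimal sketch, not the file Typhoon ships: the field names come from the upstream kubelet-config v1beta1 API, and the values are inferred from the commit messages above (30s for critical pods plus 15s for regular pods gives a 45s total). Note that 1786e34f33 later reverted the graceful shutdown settings on worker nodes.

```yaml
# Hypothetical fragment; values inferred from the commit messages,
# not copied from the repository.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node delays shutdown for pod termination
# (critical and regular pods combined).
shutdownGracePeriod: 45s
# Portion of the total reserved for critical pods;
# regular pods get the remaining 15s.
shutdownGracePeriodCriticalPods: 30s
featureGates:
  # Workaround described in 275fc0f9e8 for Kubernetes v1.25.0.
  LocalStorageCapacityIsolationFSQuotaMonitoring: false
```

The 45s maximum inhibitor delay mentioned in 393a38deff is a systemd-logind setting (`InhibitDelayMaxSec`) and would be configured outside this file.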

58 Commits