typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 20:14:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	937acc4b5a	Re-enable Graceful Node Shutdown feature * Kubelet GracefulNodeShutdown works, but only partially handles gracefully stopping the Kubelet. The most noticeable drawback is that Completed Pods are left around * Use a project like poseidon/scuttle or a similar systemd unit as a snippet to add drain and/or delete behaviors if desired * This reverts commit `1786e34f33`. Rel: * https://www.psdn.io/posts/kubelet-graceful-shutdown/ * https://github.com/poseidon/scuttle	2022-11-02 20:49:01 -07:00
Dalton Hubble	9b733d79c7	Update Calico v3.24.2 to v3.24.3 * https://github.com/projectcalico/calico/releases/tag/v3.24.3 * Add patch to allow Kubelet kubeconfig to drain nodes if desired in addition to just deleting them in shutdown integrations. See https://github.com/poseidon/terraform-render-bootstrap/pull/330	2022-10-23 22:00:15 -07:00
Dalton Hubble	35a9e22b1f	Update Calico from v3.24.1 to v3.24.2 * https://github.com/projectcalico/calico/releases/tag/v3.24.2	2022-10-20 09:28:19 -07:00
Dalton Hubble	0f38a6d405	Remove defunct delete-node.service from worker nodes * delete-node.service used to remove nodes from the cluster on shutdown, but it has been a long time since it last worked properly * If there is still a desire for this concept, it can be added with a custom snippet and with a better systemd unit	2022-10-20 08:43:48 -07:00
Dalton Hubble	a535581ef2	Remove unused Wants=network.target from etcd-member * network.target is a passive unit that's not actually pulled in by units requiring or wanting it; it's only used for shutdown ordering > "Services using the network should ... avoid any Wants=network.target or even Requires=network.target" Rel: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/	2022-10-20 08:32:55 -07:00
Dalton Hubble	3ff2d38fa5	Update Cilium from v1.12.2 to v1.12.3 * https://github.com/cilium/cilium/releases/tag/v1.12.3	2022-10-17 17:25:23 -07:00
Dalton Hubble	b68f8bb2a9	Switch Azure Fedora CoreOS default worker type * Change default Azure worker_type from Standard_DS1_v2 to Standard_D2as_v5 * Get 2 VCPU, 7 GiB, 12500Mbps (vs 1 VCPU, 3.5GiB, 750 Mbps) * Small increase in pay-as-you-go price ($53.29 -> $62.78) * Small increase in spot price ($5.64/mo -> $7.37/mo) * Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`) Rel: * https://github.com/poseidon/typhoon/pull/1248 * https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series * https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series	2022-10-13 21:23:57 -07:00
Dalton Hubble	651151805d	Update Kubernetes v1.25.2 to v1.25.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253	2022-10-13 21:02:39 -07:00
Dalton Hubble	3ee462a24c	Update Kubernetes from v1.25.1 to v1.25.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252	2022-09-22 08:15:30 -07:00
Dalton Hubble	74d4d56dbd	Remove workaround for v1.25.0 ConfigMap rendering issue * LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to alpha in v1.25.1, so we don't need to explicitly disable it anymore Rel: https://github.com/kubernetes/kubernetes/issues/112081	2022-09-19 09:10:24 -07:00
Dalton Hubble	5abe84b520	Update etcd from v3.5.4 to v3.5.5 * https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v355	2022-09-15 09:01:45 -07:00
Dalton Hubble	951209d113	Update Cilium from v1.12.1 to v1.12.2 * https://github.com/cilium/cilium/releases/tag/v1.12.2	2022-09-15 08:28:37 -07:00
Dalton Hubble	09751cc0e8	Update Kubernetes from v1.25.0 to v1.25.1 * https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1	2022-09-15 08:23:22 -07:00
Dalton Hubble	c14300f0be	Update Calico from v3.23.3 to v3.24.1 * https://github.com/projectcalico/calico/releases/tag/v3.24.1	2022-09-14 08:09:38 -07:00
Dalton Hubble	1786e34f33	Revert Graceful Node Shutdown feature * Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222) * Graceful node shutdown allows 30s for critical pods to shut down and 15s for regular pods to shut down before releasing the inhibitor lock to allow the host to shut down * Unfortunately, without further configuration options, regular pods and the node are shut down at the same time at the end of the 45s period. In practice, enabling this feature leaves Error or Completed pods in kube-apiserver state until manually cleaned up. This feature is not ready for general use * Fix issue where Error/Completed pods accumulate whenever any node restarts (or auto-updates), visible in kubectl get pods * This issue wasn't apparent in initial testing and seems to only affect non-critical pods (due to critical pods being killed earlier), but it's very apparent on our real clusters Rel: https://github.com/kubernetes/kubernetes/issues/110755	2022-09-10 14:58:44 -07:00
Dalton Hubble	393a38deff	Configure Graceful Node Shutdown and lengthen max inhibitor delay * Configure Kubelet Graceful Node Shutdown to detect system shutdown events and stop running containers gracefully when possible * Allow up to 30s for critical pods to gracefully shutdown * Allow up to 15s for regular pods to gracefully shutdown * Node will be marked as NotReady promptly, instead of having to wait for health checks * Kubelet uses systemd inhibitor locks to delay shutdown for a limited number of seconds * Raise the default max inhibitor time from 5s to 45s Verify systemd inhibitor locks are present: ``` sudo systemd-inhibit --list WHO UID USER PID COMM WHAT WHY MODE kubelet 0 root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay ``` Tail journal logs and then shutdown a node via systemctl reboot or via the cloud console to watch container shutdown Rel: * https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/ * https://github.com/kubernetes/kubernetes/issues/107043 * https://github.com/coreos/fedora-coreos-tracker/issues/821 * https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html * https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go * https://github.com/godbus/dbus/blob/master/conn.go	2022-08-28 10:37:33 -07:00
Dalton Hubble	76d92e9c2d	Change podman log-driver from journald to k8s-file * When podman runs the Kubelet container, logging to journald means log lines are duplicated in the journal. journalctl -u kubelet shows Kubelet's logs and the same log messages from podman. Using the k8s-file driver alleviates this problem * Fix Kubelet and etcd-member logs to be more readable and reduce unnecessary Kubelet log volume	2022-08-27 17:15:22 -07:00
Dalton Hubble	275fc0f9e8	Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature * Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in Pods, as shown by conformance tests * Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back to alpha so it's not enabled by default, but that will require a release * Disable the feature gate directly as a workaround for now to make Kubernetes v1.25.0 usable ``` FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478 ``` Rel: * https://github.com/kubernetes/kubernetes/pull/112076 * https://github.com/kubernetes/kubernetes/pull/107329	2022-08-27 09:49:35 -07:00
Dalton Hubble	3fb59a3289	Migrate most Kubelet flags to KubeletConfiguration file * Add a KubeletConfiguration file to replace most Kubelet flags, to prepare for upcoming changes * Pass Kubelet the --config flag to specify the location of the KubeletConfiguration * Remove flags / configuration where it matches the defaults * Remove --cgroups-per-qos, defaults to true * Remove --container-runtime, defaults to remote * Remove --enforce-node-allocatable=pods, defaults to pods Rel: * https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/	2022-08-27 09:28:15 -07:00
Dalton Hubble	a31dbceac6	Update Kubernetes from v1.24.4 to v1.25.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md	2022-08-25 09:18:14 -07:00
Dalton Hubble	760b4cd5ee	Update Kubernetes from v1.24.3 to v1.24.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244	2022-08-17 20:09:30 -07:00
Dalton Hubble	fcd8ff2b17	Update Cilium from v1.12.0 to v1.12.1 * https://github.com/cilium/cilium/releases/tag/v1.12.1	2022-08-17 08:53:56 -07:00
Dalton Hubble	6facfca4ed	Switch Kubernetes image registry from k8s.gcr.io to registry.k8s.io * Announce: https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/319	2022-08-13 16:16:21 -07:00
Dalton Hubble	ed8c6a5aeb	Upgrade CoreDNS from v1.8.5 to v1.9.3 Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/318	2022-08-13 15:43:03 -07:00
Dalton Hubble	e5d0e2d48b	Rename Fedora CoreOS fcc directory to butane * Align both Fedora CoreOS and Flatcar Linux keeping Butane Configs in a directory called butane	2022-08-10 09:10:18 -07:00
Dalton Hubble	93b7f2554e	Remove ineffective iptables-legacy.stamp * Typhoon Fedora CoreOS is already using iptables nf_tables since F36. The file to pin to legacy iptables was renamed to /etc/coreos/iptables-legacy.stamp	2022-08-08 20:27:21 -07:00
Dalton Hubble	62d47ad3f0	Update Cilium from v1.11.7 to v1.12.0 * https://github.com/cilium/cilium/releases/tag/v1.12.0	2022-08-08 19:59:03 -07:00
Dalton Hubble	256b87812e	Remove Terraform template provider dependency * Use Terraform builtin templatefile functionality * Remove dependency on deprecated Terraform template provider Rel: * https://registry.terraform.io/providers/hashicorp/template/2.2.0 * https://github.com/poseidon/terraform-render-bootstrap/pull/293	2022-08-02 18:15:03 -07:00
Dalton Hubble	c6794f1007	Update Calico from v3.23.1 to v3.23.3 * https://github.com/projectcalico/calico/releases/tag/v3.23.3	2022-07-30 18:15:33 -07:00
Dalton Hubble	f42b45451b	Update Cilium from v1.11.6 to v1.11.7 * https://github.com/cilium/cilium/releases/tag/v1.11.7	2022-07-19 09:06:15 -07:00
Dalton Hubble	0db5f86110	Update Kubernetes from v1.24.2 to v1.24.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243	2022-07-13 20:59:15 -07:00
Dalton Hubble	8398182956	Update Cilium and Calico CNI providers * Update Cilium from v1.11.5 to v1.11.6 * Update Calico from v3.22.2 to v3.23.1	2022-06-18 19:29:01 -07:00
Dalton Hubble	6d6b48b201	Update Kubernetes from v1.24.1 to v1.24.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242	2022-06-18 18:35:42 -07:00
Dalton Hubble	b8549a1e32	Update Cilium from v1.11.4 to v1.11.5 * https://github.com/poseidon/terraform-render-bootstrap/pull/309	2022-05-31 15:23:07 +01:00
Dalton Hubble	c5573199db	Update Kubernetes from v1.24.0 to v1.24.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241	2022-05-28 09:39:14 +01:00
Dalton Hubble	b0e0b132e4	Update Kubernetes from v1.23.6 to v1.24.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1240	2022-05-04 08:27:14 -07:00
Dalton Hubble	91b38bf3fd	Update etcd from v3.5.2 to v3.5.4 * https://github.com/etcd-io/etcd/releases/tag/v3.5.4	2022-04-27 20:57:02 -07:00
James Harmison	9a4887d028	Add bind mounts for selinux to fcos kubelets fixes #1123 Enables the use of CSI drivers with a StorageClass that lacks an explicit context mount option. In cases where the kubelet lacks mounts for `/etc/selinux` and `/sys/fs/selinux`, it is unable to set the `:Z` option for the CRI volume definition automatically. See [KEP 1710](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1710-selinux-relabeling/README.md#volume-mounting) for more information on how SELinux is passed to the CRI by Kubelet. Prior to this change, a not-explicitly-labelled mount would have an `unlabeled_t` SELinux type on the host. Following this change, the Kubelet and CRI work together to dynamically relabel mounts that lack an explicit context specification every time it is rebound to a pod with SELinux type `container_file_t` and appropriate context labels to match the specifics for the pod it is bound to. This enables applications running in containers to consume dynamically provisioned storage on SELinux enforcing systems without explicitly setting the context on the StorageClass or PersistentVolume.	2022-04-26 21:33:26 -07:00
Dalton Hubble	d7f55c4e46	Remove use of deprecated `key_algorithm` field in TLS assets * Fixes warning about use of deprecated field `key_algorithm` in the `hashicorp/tls` provider. The key algorithm can now be inferred directly from the private key so resources don't have to output and pass around the algorithm	2022-04-20 19:52:03 -07:00
Dalton Hubble	80c6e2e7e6	Update Kubernetes from v1.23.5 to v1.23.6 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1236	2022-04-20 19:39:05 -07:00
Dalton Hubble	2f7d2a92e0	Update Cilium and Calico CNI providers * Update Cilium from v1.11.3 to v1.11.4 * Update Calico from v3.22.1 to v3.22.2	2022-04-19 08:28:52 -07:00
Dalton Hubble	2df1873b7f	Update Cilium from v1.11.2 to v1.11.3 * https://github.com/cilium/cilium/releases/tag/v1.11.3	2022-04-01 16:44:30 -07:00
Dalton Hubble	93ebfc7dd0	Allow upgrading Azure Terraform Provider to v3.x * Change subnet references to source and destinations prefixes (plural) * Remove references to a resource group in some load balancing components, which no longer require it (inferred) * Rename `worker_address_prefix` output to `worker_address_prefixes`	2022-04-01 16:36:53 -07:00
Dalton Hubble	5365ce8204	Mount /etc/machine-id from host into Kubelet * Kubelet node's System UUID can be detected from the sysfs filesystem without a host mount, but the host mount is needed if you want to distinguish between the host's machine-id and SystemUUID * On cloud platforms, MachineID and SystemUUID are identical, but on bare-metal the two differ	2022-04-01 16:32:06 -07:00
Dalton Hubble	e61d4b92da	Update Kubernetes from v1.23.4 to v1.23.5 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1235	2022-03-16 21:01:41 -07:00
Dalton Hubble	69770b4827	Update Calico from v3.21.2 to v3.22.1 * https://github.com/projectcalico/calico/releases/tag/v3.22.1 * Fix https://github.com/projectcalico/calico/issues/5011	2022-03-11 11:22:29 -08:00
Dalton Hubble	f797f97675	Update Cilium from v1.11.1 to v1.11.2 * https://github.com/cilium/cilium/releases/tag/v1.11.2	2022-03-11 10:08:24 -08:00
Dalton Hubble	fc38ba45b1	Update Kubernetes from v1.23.3 to v1.23.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.23.md#v1234	2022-02-17 09:00:31 -08:00
Dalton Hubble	6c70d06937	Update etcd from v3.5.1 to v3.5.2 * https://github.com/etcd-io/etcd/releases/tag/v3.5.2	2022-02-07 08:10:17 -08:00
Dalton Hubble	cf4beeba34	Change default CNI provider from Calico to Cilium * Cilium (v1.8) was added to Typhoon in v1.18.5 in June 2020 and it's become more impressive since then. It's currently the leading CNI provider choice. * Calico has grown complex, has lots of CRDs, masks its management complexity with an operator (which we won't use), doesn't provide multi-arch images, and hasn't been compatible with Kubernetes v1.23 (with ipvs) for several releases. * Both have CNCF conformance quirks (flannel used for conformance), but that's not the main factor in choosing the default	2022-02-07 08:07:00 -08:00
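
Note on the Graceful Node Shutdown timings in commits 393a38deff and 1786e34f33: the 30s / 15s / 45s figures can be checked with a small sketch. It assumes the timings map onto the KubeletConfiguration fields `shutdownGracePeriod` (total delay) and `shutdownGracePeriodCriticalPods` (the final portion reserved for critical pods), and that the "max inhibitor time" is a logind-style maximum inhibit delay. The commit messages do not show the actual configuration, so the function names and the 45/30 inputs below are illustrative assumptions, not Typhoon's rendered config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownPlan:
    """How a total shutdown grace period is split between pod classes."""

    regular_pods_seconds: int
    critical_pods_seconds: int

    @property
    def total_seconds(self) -> int:
        return self.regular_pods_seconds + self.critical_pods_seconds


def plan_shutdown(shutdown_grace_period: int,
                  shutdown_grace_period_critical_pods: int) -> ShutdownPlan:
    """Split the total grace period: regular pods are stopped first and get
    whatever is left after reserving the final window for critical pods."""
    if shutdown_grace_period_critical_pods > shutdown_grace_period:
        raise ValueError("critical pods window cannot exceed the total grace period")
    return ShutdownPlan(
        regular_pods_seconds=shutdown_grace_period - shutdown_grace_period_critical_pods,
        critical_pods_seconds=shutdown_grace_period_critical_pods,
    )


def inhibitor_delay_sufficient(plan: ShutdownPlan, inhibit_delay_max_seconds: int) -> bool:
    """The host must be willing to delay shutdown at least as long as the
    Kubelet intends to hold its inhibitor lock."""
    return inhibit_delay_max_seconds >= plan.total_seconds


# Assumed inputs: 45s total with 30s reserved for critical pods,
# which matches the 15s regular / 30s critical split in the commit message.
plan = plan_shutdown(shutdown_grace_period=45, shutdown_grace_period_critical_pods=30)
print(plan.regular_pods_seconds, plan.critical_pods_seconds, plan.total_seconds)
print(inhibitor_delay_sufficient(plan, 5), inhibitor_delay_sufficient(plan, 45))
```

Under these assumptions, a 5s maximum inhibit delay is too short for a 45s plan, which is consistent with the commit raising it to 45s.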


196 Commits