typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 14:24:37 +02:00

Author	SHA1	Message	Date
Bill ONeill	4ef1908299	Fix: extra kernel_args added to bare-metal workers	2023-04-28 08:07:54 -07:00
Bill ONeill	2272472d59	Omit -o flag to flatcar-install unless oem_type is defined	2023-04-25 19:02:30 -07:00
Dalton Hubble	fc444d25f8	Update poseidon/ct provider and Butane Config version * Update Fedora CoreOS Butane configs from v1.4.0 to v1.5.0 * Require Fedora CoreOS Butane snippets update to v1.1.0 * Require poseidon/ct Terraform provider v0.13 or newer * Use Ignition v3.4.0 spec for all node provisioning	2023-04-21 08:58:20 -07:00
Dalton Hubble	5feb4c63f7	Update Cilium from v1.13.1 to v1.13.2 * https://github.com/cilium/cilium/releases/tag/v1.13.2	2023-04-20 08:44:31 -07:00
Dalton Hubble	501e6d25e0	Update Kubernetes from v1.27.0 to v1.27.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1271	2023-04-15 23:16:51 -07:00
Dalton Hubble	1e76e1a200	Update etcd from v3.5.7 to v3.5.8 * https://github.com/etcd-io/etcd/releases/tag/v3.5.8	2023-04-15 22:54:31 -07:00
Dalton Hubble	4322857bec	Update Kubernetes from v1.26.3 to v1.27.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1270	2023-04-15 22:49:12 -07:00
Lucas Resch	6bd2a1a528	Expose flatcar-install OEM parameter By exposing this parameter it is possible to install OEM specific software during the `flatcar-install` invocation.	2023-04-01 09:38:29 -07:00
Dalton Hubble	5f303212d2	Update Cilium to use an init container to install CNI plugins * https://github.com/poseidon/terraform-render-bootstrap/pull/348	2023-03-29 10:35:21 -07:00
Dalton Hubble	3670ec7ed7	Update Kubernetes from v1.26.2 to v1.26.3 * Update Cilium from v1.13.0 to v1.13.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1263	2023-03-21 18:18:19 -07:00
Dalton Hubble	2b3cd451d2	Update Cilium from v1.12.6 to v1.13.0 * https://github.com/cilium/cilium/releases/tag/v1.13.0	2023-03-14 11:16:14 -07:00
Dalton Hubble	76ebc08fd2	Update Kubernetes from v1.26.1 to v1.26.2 * https://github.com/poseidon/terraform-render-bootstrap/pull/345	2023-03-01 17:13:16 -08:00
Dalton Hubble	86e8484e0a	Change bare-metal workers variable to optional * To accompany the restructure of the bare-metal modules to allow discrete workers to be defined and attached to a cluster (#1295), the `workers` variable (older way, used for defining homogeneous workers inline) should be optional and default to an empty list * Add docs covering inline vs discrete metal workers Fix #1301	2023-03-01 14:37:47 -08:00
Dalton Hubble	f3c327007d	Update flannel from v0.20.2 to v0.21.1 * https://github.com/flannel-io/flannel/releases/tag/v0.21.1	2023-02-09 09:56:25 -08:00
Dalton Hubble	406fb444f0	Update Cilium from v1.12.5 to v1.12.6 * https://github.com/cilium/cilium/releases/tag/v1.12.6	2023-02-09 09:45:40 -08:00
Dalton Hubble	1caea3388c	Restructure bare-metal module to use a worker submodule * Add an internal `worker` module to the bare-metal module, to allow individual bare-metal machines to be defined and joined to an existing bare-metal cluster. This is similar to the "worker pools" modules for adding sets of nodes to cloud (AWS, GCP, Azure) clusters, but on metal, each piece of hardware is potentially unique

New: Using the new `worker` module, a Kubernetes cluster can be defined without any `workers` (i.e. just a control-plane). Use the `worker` module to define each machine that should join the bare-metal cluster and customize it in detail. This style is quite flexible and suited for clusters with hardware that varies quite a bit.

```tf
module "mercury" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.26.2"

  # bare-metal
  cluster_name           = "mercury"
  matchbox_http_endpoint = "http://matchbox.example.com"
  os_channel             = "flatcar-stable"
  os_version             = "2345.3.1"

  # configuration
  k8s_domain_name    = "node1.example.com"
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."

  # machines
  controllers = [{
    name   = "node1"
    mac    = "52:54:00:a1:9c:ae"
    domain = "node1.example.com"
  }]
}
```

```tf
module "mercury-node1" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes/worker?ref=v1.26.2"

  cluster_name = "mercury"

  # bare-metal
  matchbox_http_endpoint = "http://matchbox.example.com"
  os_channel             = "flatcar-stable"
  os_version             = "2345.3.1"

  # configuration
  name               = "node2"
  mac                = "52:54:00:b2:2f:86"
  domain             = "node2.example.com"
  kubeconfig         = module.mercury.kubeconfig
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."

  # optional
  snippets       = []
  node_labels    = []
  node_taints    = []
  install_disk   = "/dev/vda"
  cached_install = false
}
```

For clusters with fairly similar hardware, you may continue to define `workers` directly within the cluster definition. This reduces some repetition, but is not quite as flexible.

```tf
module "mercury" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.26.1"

  # bare-metal
  cluster_name           = "mercury"
  matchbox_http_endpoint = "http://matchbox.example.com"
  os_channel             = "flatcar-stable"
  os_version             = "2345.3.1"

  # configuration
  k8s_domain_name    = "node1.example.com"
  ssh_authorized_key = "ssh-rsa AAAAB3Nz..."

  # machines
  controllers = [{
    name   = "node1"
    mac    = "52:54:00:a1:9c:ae"
    domain = "node1.example.com"
  }]
  workers = [
    {
      name   = "node2"
      mac    = "52:54:00:b2:2f:86"
      domain = "node2.example.com"
    },
    {
      name   = "node3"
      mac    = "52:54:00:c3:61:77"
      domain = "node3.example.com"
    }
  ]
}
```

Optional variables `snippets`, `worker_node_labels`, and `worker_node_taints` are still defined as a map from machine name to a list of snippets, labels, or taints respectively to allow some degree of per-machine customization. However, fields like `install_disk`, `kernel_args`, `cached_install` and future options will not be designed this way. Instead, if your machines vary it is recommended to use the new `worker` module to define each node	2023-02-09 08:29:28 -08:00
Dalton Hubble	a205922d06	Update Calico from v3.24.5 to v3.25.0 * https://github.com/poseidon/terraform-render-bootstrap/pull/342	2023-01-24 08:29:08 -08:00
Dalton Hubble	b5ba65d4c2	Update etcd from v3.5.6 to v3.5.7 * https://github.com/etcd-io/etcd/releases/tag/v3.5.7	2023-01-24 08:29:08 -08:00
Dalton Hubble	f2bf5ac3fb	Update Kubernetes from v1.26.0 to v1.26.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1261	2023-01-19 08:27:56 -08:00
Dalton Hubble	0afe9d65ed	Update Cilium from v1.12.4 to v1.12.5 * https://github.com/cilium/cilium/releases/tag/v1.12.5	2022-12-21 08:13:35 -08:00
Dalton Hubble	d6cbcf9f96	Update Kubernetes from v1.26.0-rc.1 to v1.26.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260	2022-12-08 08:47:24 -08:00
Dalton Hubble	0dc8740c77	Update Kubernetes from v1.26.0-rc.0 to v1.26.0-rc.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260-rc1	2022-12-05 09:31:45 -08:00
Dalton Hubble	a9b12b6bca	Update Kubernetes from v1.25.4 to v1.26.0-rc.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260-rc0	2022-11-30 08:47:40 -08:00
Dalton Hubble	a8990b3045	Fix flannel container image registry location * https://github.com/poseidon/terraform-render-bootstrap/pull/336	2022-11-23 16:18:30 -08:00
Dalton Hubble	b4857c123e	Update flannel from v0.15.1 to v0.20.1 * https://github.com/flannel-io/flannel/releases/tag/v0.20.1	2022-11-23 11:03:29 -08:00
Dalton Hubble	a193762eed	Update etcd from v3.5.5 to v3.5.6 * https://github.com/etcd-io/etcd/releases/tag/v3.5.6	2022-11-23 10:59:17 -08:00
Dalton Hubble	adf33df99b	Update Cilium from v1.12.3 to v1.12.4 * https://github.com/cilium/cilium/releases/tag/v1.12.4	2022-11-23 10:58:27 -08:00
Dalton Hubble	26dbc7e91d	Update Kubernetes from v1.25.3 to v1.25.4 * Update Calico from v3.24.3 to v3.24.5 * Update Prometheus and Grafana addons	2022-11-10 09:42:21 -08:00
Dalton Hubble	937acc4b5a	Re-enable Graceful Node Shutdown feature * Kubelet GracefulNodeShutdown works, but only partially handles gracefully stopping the Kubelet. The most noticeable drawback is that Completed Pods are left around * Use a project like poseidon/scuttle or a similar systemd unit as a snippet to add drain and/or delete behaviors if desired * This reverts commit `1786e34f33`. Rel: * https://www.psdn.io/posts/kubelet-graceful-shutdown/ * https://github.com/poseidon/scuttle	2022-11-02 20:49:01 -07:00
Dalton Hubble	9b733d79c7	Update Calico v3.24.2 to v3.24.3 * https://github.com/projectcalico/calico/releases/tag/v3.24.3 * Add patch to allow Kubelet kubeconfig to drain nodes if desired in addition to just deleting them in shutdown integrations. See https://github.com/poseidon/terraform-render-bootstrap/pull/330	2022-10-23 22:00:15 -07:00
Dalton Hubble	35a9e22b1f	Update Calico from v3.24.1 to v3.24.2 * https://github.com/projectcalico/calico/releases/tag/v3.24.2	2022-10-20 09:28:19 -07:00
Dalton Hubble	a535581ef2	Remove unused Wants=network.target from etcd-member * network.target is a passive unit that's not actually pulled in by units requiring or wanting it, it's only used for shutdown ordering > "Services using the network should ... avoid any Wants=network.target or even Requires=network.target" Rel: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/	2022-10-20 08:32:55 -07:00
Dalton Hubble	3ff2d38fa5	Update Cilium from v1.12.2 to v1.12.3 * https://github.com/cilium/cilium/releases/tag/v1.12.3	2022-10-17 17:25:23 -07:00
Dalton Hubble	651151805d	Update Kubernetes v1.25.2 to v1.25.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253	2022-10-13 21:02:39 -07:00
Dalton Hubble	3ee462a24c	Update Kubernetes from v1.25.1 to v1.25.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252	2022-09-22 08:15:30 -07:00
Dalton Hubble	90782ea820	Remove workaround for preventing search . propagation * Kubelet v1.25.1 has the fix https://github.com/kubernetes/kubernetes/pull/112157	2022-09-19 22:37:02 -07:00
Dalton Hubble	74d4d56dbd	Remove workaround for v1.25.0 ConfigMap rendering issue * LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to alpha in v1.25.1, so we don't need to explicitly disable it anymore Rel: https://github.com/kubernetes/kubernetes/issues/112081	2022-09-19 09:10:24 -07:00
Dalton Hubble	5abe84b520	Update etcd from v3.5.4 to v3.5.5 * https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v355	2022-09-15 09:01:45 -07:00
Dalton Hubble	951209d113	Update Cilium from v1.12.1 to v1.12.2 * https://github.com/cilium/cilium/releases/tag/v1.12.2	2022-09-15 08:28:37 -07:00
Dalton Hubble	09751cc0e8	Update Kubernetes from v1.25.0 to v1.25.1 * https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1	2022-09-15 08:23:22 -07:00
Dalton Hubble	c14300f0be	Update Calico from v3.23.3 to v3.24.1 * https://github.com/projectcalico/calico/releases/tag/v3.24.1	2022-09-14 08:09:38 -07:00
Dalton Hubble	1786e34f33	Revert Graceful Node Shutdown feature * Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222) * Graceful node shutdown allows 30s for critical pods to shut down and 15s for regular pods to shut down before releasing the inhibitor lock to allow the host to shut down * Unfortunately, both pods and the node are shut down at the same time at the end of the 45s period without further configuration options. As a result, regular pods and the node are shut down at the same time. In practice, enabling this feature leaves Error or Completed pods in kube-apiserver state until manually cleaned up. This feature is not ready for general use * Fix issue where Error/Completed pods are accumulating whenever any node restarts (or auto-updates), visible in kubectl get pods * This issue wasn't apparent in initial testing and seems to only affect non-critical pods (due to critical pods being killed earlier). But it's very apparent on our real clusters Rel: https://github.com/kubernetes/kubernetes/issues/110755	2022-09-10 14:58:44 -07:00
Dalton Hubble	4ad473cd3c	Add workaround patch to strip "search ." from resolv.conf * systemd adds "search ." to the host's /run/systemd/resolve/resolv.conf on hosts with an FQDN hostname * Kubelet v1.25 began propagating "search ." from the host node into containers' `/etc/resolv.conf` * musl-based DNS resolvers don't behave correctly when `search .` is used in their `/etc/resolv.conf`. This breaks Alpine images * Adapt the same workaround used by Openshift to strip the "search ." * This only applies to bare-metal Typhoon nodes (where hostnames are set to FQDNs), nodes on cloud platforms aren't affected in the Typhoon configuration Kubernetes tracking issue: https://github.com/kubernetes/kubernetes/issues/112135 Rel: * https://github.com/systemd/systemd/pull/17201 * https://github.com/kubernetes/kubernetes/pull/109441 * https://github.com/coreos/fedora-coreos-tracker/issues/1287 * https://github.com/openshift/okd-machine-os/pull/159	2022-08-31 08:05:45 -07:00
Dalton Hubble	393a38deff	Configure Graceful Node Shutdown and lengthen max inhibitor delay * Configure Kubelet Graceful Node Shutdown to detect system shutdown events and stop running containers gracefully when possible * Allow up to 30s for critical pods to gracefully shut down * Allow up to 15s for regular pods to gracefully shut down * Node will be marked as NotReady promptly, instead of having to wait for health checks * Kubelet uses systemd inhibitor locks to delay shutdown for a limited number of seconds * Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shut down a node via systemctl reboot or via the cloud console to watch container shutdown Rel: * https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/ * https://github.com/kubernetes/kubernetes/issues/107043 * https://github.com/coreos/fedora-coreos-tracker/issues/821 * https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html * https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go * https://github.com/godbus/dbus/blob/master/conn.go	2022-08-28 10:37:33 -07:00
Dalton Hubble	76d92e9c2d	Change podman log-driver from journald to k8s-file * When podman runs the Kubelet container, logging to journald means log lines are duplicated in the journal. journalctl -u kubelet shows Kubelet's logs and the same log messages from podman. Using the k8s-file driver alleviates this problem * Fix Kubelet and etcd-member logs to be more readable and reduce unnecessary Kubelet log volume	2022-08-27 17:15:22 -07:00
Dalton Hubble	275fc0f9e8	Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature * Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in Pods, as shown by conformance tests * Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back to alpha so it's not enabled by default, but that will require a release * Disable the feature gate directly as a workaround for now to make Kubernetes v1.25.0 usable ``` FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478 ``` Rel: * https://github.com/kubernetes/kubernetes/pull/112076 * https://github.com/kubernetes/kubernetes/pull/107329	2022-08-27 09:49:35 -07:00
Dalton Hubble	3fb59a3289	Migrate most Kubelet flags to KubeletConfiguration file * Add a KubeletConfiguration file to replace most Kubelet flags, to prepare for upcoming changes * Pass Kubelet the --config flag to specify the location of the KubeletConfiguration * Remove flags / configuration where it matches the defaults * Remove --cgroups-per-qos, defaults to true * Remove --container-runtime, defaults to remote * Remove enforce-node-allocatable=pods, defaults to pods Rel: * https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/	2022-08-27 09:28:15 -07:00
Dalton Hubble	a31dbceac6	Update Kubernetes from v1.24.4 to v1.25.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md	2022-08-25 09:18:14 -07:00
Dalton Hubble	760b4cd5ee	Update Kubernetes from v1.24.3 to v1.24.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244	2022-08-17 20:09:30 -07:00
Dalton Hubble	fcd8ff2b17	Update Cilium from v1.12.0 to v1.12.1 * https://github.com/cilium/cilium/releases/tag/v1.12.1	2022-08-17 08:53:56 -07:00
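
The `search .` workaround described in commit 4ad473cd3c can be illustrated with a small shell sketch. This is not the patch Typhoon or Openshift ships; the function name `strip_search_dot` and the temporary file paths are hypothetical, and the demo runs on a throwaway file rather than the real `/run/systemd/resolve/resolv.conf`. It only shows the idea of filtering out a bare `search .` line before a resolv.conf is handed to containers.

```shell
#!/bin/sh
# Sketch only: drop a bare "search ." line from a resolv.conf-style file.
# strip_search_dot and the paths below are illustrative assumptions.
strip_search_dot() {
  # $1: source resolv.conf, $2: destination for the filtered copy.
  # -x matches whole lines only, so "search example.com" lines are kept.
  # grep -v exits non-zero when nothing is left to print, hence "|| true".
  grep -v -x -E 'search[[:space:]]+\.' "$1" > "$2" || true
}

# Demo on a temporary file that mimics a host with an FQDN hostname.
tmp=$(mktemp -d)
printf 'nameserver 10.0.0.1\nsearch .\n' > "$tmp/resolv.conf"
strip_search_dot "$tmp/resolv.conf" "$tmp/resolv.filtered.conf"
cat "$tmp/resolv.filtered.conf"
```

The filtered copy keeps the `nameserver` entry and any real search domains, which is the behavior musl-based resolvers need.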

595 Commits