typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-06 07:34:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	393a38deff	Configure Graceful Node Shutdown and lengthen max inhibitor delay * Configure Kubelet Graceful Node Shutdown to detect system shutdown events and stop running containers gracefully when possible * Allow up to 30s for critical pods to shut down gracefully * Allow up to 15s for regular pods to shut down gracefully * Node will be marked as NotReady promptly, instead of having to wait for health checks * Kubelet uses systemd inhibitor locks to delay shutdown for a limited number of seconds * Raise the default max inhibitor time from 5s to 45s Verify systemd inhibitor locks are present: ``` sudo systemd-inhibit --list WHO UID USER PID COMM WHAT WHY MODE kubelet 0 root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay ``` Tail journal logs and then shut down a node via systemctl reboot or via the cloud console to watch container shutdown Rel: * https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/ * https://github.com/kubernetes/kubernetes/issues/107043 * https://github.com/coreos/fedora-coreos-tracker/issues/821 * https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html * https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go * https://github.com/godbus/dbus/blob/master/conn.go	2022-08-28 10:37:33 -07:00
Dalton Hubble	76d92e9c2d	Change podman log-driver from journald to k8s-file * When podman runs the Kubelet container, logging to journald means log lines are duplicated in the journal. journalctl -u kubelet shows Kubelet's logs and the same log messages from podman. Using the k8s-file driver alleviates this problem * Fix Kubelet and etcd-member logs to be more readable and reduce unnecessary Kubelet log volume	2022-08-27 17:15:22 -07:00
Dalton Hubble	275fc0f9e8	Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature * Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in Pods, as shown by conformance tests * Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back to alpha so it's not enabled by default, but that will require a release * Disable the feature gate directly as a workaround for now to make Kubernetes v1.25.0 usable ``` FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478 ``` Rel: * https://github.com/kubernetes/kubernetes/pull/112076 * https://github.com/kubernetes/kubernetes/pull/107329	2022-08-27 09:49:35 -07:00
Dalton Hubble	3fb59a3289	Migrate most Kubelet flags to KubeletConfiguration file * Add a KubeletConfiguration file to replace most Kubelet flags, to prepare for upcoming changes * Pass Kubelet the --config flag to specify the location of the KubeletConfiguration * Remove flags / configuration where it matches the defaults * Remove --cgroups-per-qos, defaults to true * Remove --container-runtime, defaults to remote * Remove enforce-node-allocatable=pods, defaults to pods Rel: * https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ * https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/	2022-08-27 09:28:15 -07:00
Dalton Hubble	a31dbceac6	Update Kubernetes from v1.24.4 to v1.25.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md	2022-08-25 09:18:14 -07:00
dependabot[bot]	1dcf56127b	Bump mkdocs-material from 8.4.0 to 8.4.1 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.0 to 8.4.1. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.0...8.4.1) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-08-23 08:53:12 -07:00
Dalton Hubble	bf06412dfd	Update Prometheus and Grafana addons v1.24.4	2022-08-21 08:56:00 -07:00
Dalton Hubble	505818b7d5	Update docs showing the terraform plan resources count * Although I don't plan to keep these in sync, some users are confused when the docs don't match the actual resource count	2022-08-21 08:52:35 -07:00
Dalton Hubble	0d27811265	Update recommended Terraform provider versions	2022-08-18 09:08:55 -07:00
Dalton Hubble	c13d060b38	Add docs for GCP MIG update and AWS instance refresh * Document that worker instances are rolling replaced when changes to their configuration are applied	2022-08-18 09:02:38 -07:00
Dalton Hubble	e87d5aabc3	Adjust Google Cloud worker health checks to use kube-proxy healthz * Change the workers managed instance group to health check nodes via HTTP probe of the kube-proxy port 10256 /healthz endpoints * Advantages: kube-proxy is a lower value target (in case there were bugs in firewalls) than Kubelet, it's more representative than health checking Kubelet (Kubelet must run AND kube-proxy Daemonset must be healthy), and it's already used by kube-proxy liveness probes (better discoverability via kubectl or alerts on pods crashlooping) * Another motivator is that GKE clusters also use kube-proxy port 10256 checks to assess node health	2022-08-17 20:50:52 -07:00
Dalton Hubble	760b4cd5ee	Update Kubernetes from v1.24.3 to v1.24.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244	2022-08-17 20:09:30 -07:00
Dalton Hubble	fcd8ff2b17	Update Cilium from v1.12.0 to v1.12.1 * https://github.com/cilium/cilium/releases/tag/v1.12.1	2022-08-17 08:53:56 -07:00
dependabot[bot]	ef2d2af0c7	Bump mkdocs-material from 8.3.9 to 8.4.0 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.9 to 8.4.0. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.9...8.4.0) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-08-16 08:29:51 -07:00
dependabot[bot]	8e2027ed2d	Bump pygments from 2.12.0 to 2.13.0 Bumps [pygments](https://github.com/pygments/pygments) from 2.12.0 to 2.13.0. - [Release notes](https://github.com/pygments/pygments/releases) - [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES) - [Commits](https://github.com/pygments/pygments/compare/2.12.0...2.13.0) --- updated-dependencies: - dependency-name: pygments dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-08-16 08:26:45 -07:00
Dalton Hubble	52427a4271	Refresh instances in autoscaling group when launch configuration changes * Changes to worker launch configurations start an autoscaling group instance refresh to replace instances * Instance refresh creates surge instances, waits for a warm-up period, then deletes old instances * Changing worker_type, disk_, worker_price, worker_target_groups, or Butane worker_snippets on existing worker nodes will replace instances * New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or Flatcar Linux to keep themselves updated * Previously, new launch configurations were made in the same way, but not applied to instances unless manually replaced	2022-08-14 21:43:49 -07:00
Dalton Hubble	20b76d6e00	Roll instance template changes to worker managed instance groups * When a worker managed instance group's (MIG) instance template changes (including machine type, disk size, or Butane snippets but excluding new AMIs), use Google Cloud's rolling update features to ensure instances match declared state * Ignore new AMIs since Fedora CoreOS and Flatcar Linux nodes already auto-update and reboot themselves * Rolling updates will create surge instances, wait for health checks, then delete old instances (0 unavailable instances) * Instances are replaced to ensure new Ignition/Butane snippets are respected * Add managed instance group autohealing (i.e. health checks) to ensure new instances' Kubelet is running Renames * Name apiserver and kubelet health checks consistently * Rename MIG from `${var.name}-worker-group` to `${var.name}-worker` Rel: https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups	2022-08-14 13:06:53 -07:00
Dalton Hubble	6facfca4ed	Switch Kubernetes image registry from k8s.gcr.io to registry.k8s.io * Announce: https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/319	2022-08-13 16:16:21 -07:00
Dalton Hubble	ed8c6a5aeb	Upgrade CoreDNS from v1.8.5 to v1.9.3 Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/318	2022-08-13 15:43:03 -07:00
Dalton Hubble	003af72cc8	Rename google-cloud/fedora-coreos/kubernetes/workers fcc to butane * Should have been part of https://github.com/poseidon/typhoon/pull/1203	2022-08-13 15:40:16 -07:00
Dalton Hubble	b321b90a4f	Update Grafana from v9.0.6 to v9.0.7	2022-08-13 15:39:44 -07:00
Dalton Hubble	e5d0e2d48b	Rename Fedora CoreOS fcc directory to butane * Align both Fedora CoreOS and Flatcar Linux keeping Butane Configs in a directory called butane	2022-08-10 09:10:18 -07:00
Dalton Hubble	679f8b878f	Update Grafana from v9.0.5 to v9.0.6	2022-08-10 08:23:04 -07:00
Dalton Hubble	87a8278c9d	Improve AWS autoscaling group and launch config names * Rename launch configuration to use a name_prefix named after the cluster and worker to improve identifiability * Shorten AWS autoscaling group name to not include the launch config id. Years ago this used to be needed to update the ASG but the AWS provider detects changes to the launch configuration just fine	2022-08-08 20:46:08 -07:00
Dalton Hubble	93b7f2554e	Remove ineffective iptables-legacy.stamp * Typhoon Fedora CoreOS is already using iptables nf_tables since F36. The file to pin to legacy iptables was renamed to /etc/coreos/iptables-legacy.stamp	2022-08-08 20:27:21 -07:00
Dalton Hubble	62d47ad3f0	Update Cilium from v1.11.7 to v1.12.0 * https://github.com/cilium/cilium/releases/tag/v1.12.0	2022-08-08 19:59:03 -07:00
Dalton Hubble	6eb7861f96	Update Grafana liveness and readiness probes * Use the liveness and readiness probes that Grafana recommends * Update Grafana from v9.0.3 to v9.0.5	2022-08-08 09:22:44 -07:00
Dalton Hubble	ffbacbccf7	Update node-exporter DaemonSet to fix permission denied * Add toleration to run node-exporter on controller nodes * Add HostToContainer mount propagation and security context group settings from upstream * Fix SELinux denied accessing /host/proc/1/mounts. The mounts file has an SELinux type attribute init_t, but that won't allow running the node-exporter binary so we have to use spc_t. This should be more targeted at just the SELinux issue than making the Pod privileged * Remove excluded mount points and filesystem types, the defaults are https://github.com/prometheus/node_exporter/blob/v1.3.1/collector/filesystem_linux.go#L35 ``` caller=collector.go:169 level=error msg="collector failed" name=filesystem duration_seconds=0.000666766 err="open /host/proc/1/mounts: permission denied" ``` ``` [ 3664.880899] audit: type=1400 audit(1659639161.568:4400): avc: denied { search } for pid=28325 comm="node_exporter" name="1" dev="proc" ino=22542 scontext=system_u:system_r:container_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0 ```	2022-08-08 09:19:46 -07:00
Dalton Hubble	16c2785878	Update docs on using Butane snippets for customization * Typhoon now consistently uses Butane Configs for snippets (variant `fcos` or `flatcar`). Previously snippets were either Butane Configs (on FCOS) or Container Linux Configs (on Flatcar) * Update docs on uploading Flatcar Linux DigitalOcean images * Update docs on uploading Fedora CoreOS Azure images	2022-08-03 20:28:53 -07:00
Dalton Hubble	4a469513dd	Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0 * Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required) * Previously, Flatcar Linux configs have been parsed as Container Linux Configs to Ignition v2.2.0 specs by poseidon/ct * Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs (which are rendered from Butane Configs, like Fedora CoreOS) * poseidon/ct v0.11.0 adds support for the flatcar Butane Config variant so that Flatcar Linux can use Ignition v3.x Rel: * [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3) * [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)	2022-08-03 08:32:52 -07:00
Dalton Hubble	47d8431fe0	Fix bug provisioning multi-controller clusters on Google Cloud * Google Cloud Terraform provider resource google_dns_record_set's name field provides the full domain name with a trailing ".". This isn't a new behavior, Google has behaved this way as long as I can remember * etcd domain names are passed to the bootstrap module to generate TLS certificates. What seems to be new(ish?) is that etcd peers see example.foo and example.foo. as different domains during TLS SANs validation. As a result, clusters with multiple controller nodes fail to run etcd-member, which manifests as cluster provisioning hanging. Single controller/master clusters (default) are unaffected * Fix etcd-member.service error in multi-controller clusters: ``` "error":"x509: certificate is valid for conformance-etcd0.redacted., conform-etcd1.redacted., conform-etcd2.redacted., not conform-etcd1.redacted"} ```	2022-08-02 20:21:02 -07:00
Dalton Hubble	256b87812e	Remove Terraform template provider dependency * Use Terraform builtin templatefile functionality * Remove dependency on deprecated Terraform template provider Rel: * https://registry.terraform.io/providers/hashicorp/template/2.2.0 * https://github.com/poseidon/terraform-render-bootstrap/pull/293	2022-08-02 18:15:03 -07:00
Dalton Hubble	ca6eef365f	Add badges to README	2022-07-31 18:03:09 -07:00
Dalton Hubble	c6794f1007	Update Calico from v3.23.1 to v3.23.3 * https://github.com/projectcalico/calico/releases/tag/v3.23.3	2022-07-30 18:15:33 -07:00
Dalton Hubble	de6f27e119	Update FCOS iPXE initrd and kernel arg settings * Add initrd=main kernel argument for UEFI * Switch to using the coreos.live.rootfs_url kernel argument instead of passing the rootfs as an appended initrd * Remove coreos.inst.image_url kernel argument since coreos-installer now defaults to installing from the embedded live system * Remove rd.neednet=1 and dhcp=ip kernel args that aren't needed * Remove serial console kernel args by default (these can be added via var.kernel_args if needed) Rel: * https://github.com/poseidon/matchbox/pull/972 (thank you @bgilbert) * https://github.com/poseidon/matchbox/pull/978	2022-07-30 16:27:08 -07:00
Dalton Hubble	6a9c32d3a9	Migrate from internal hosting to GitHub pages * Add Twitter card customizations that have been kept in an internal fork * Add CNAME needed for GitHub pages	2022-07-27 21:56:42 -07:00
dependabot[bot]	a7e9e423f5	Bump mkdocs from 1.3.0 to 1.3.1 Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.3.0 to 1.3.1. - [Release notes](https://github.com/mkdocs/mkdocs/releases) - [Commits](https://github.com/mkdocs/mkdocs/compare/1.3.0...1.3.1) --- updated-dependencies: - dependency-name: mkdocs dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-21 09:07:21 -07:00
Dalton Hubble	83236eab57	Add table of details about static Pods * Also remove outdated mentions of rkt-fly	2022-07-21 09:03:27 -07:00
Dalton Hubble	7f445b0dba	Add release note about master to main branch rename * Update Terraform provider versions v1.24.3	2022-07-19 18:12:37 -07:00
Dalton Hubble	f42b45451b	Update Cilium from v1.11.6 to v1.11.7 * https://github.com/cilium/cilium/releases/tag/v1.11.7	2022-07-19 09:06:15 -07:00
Dalton Hubble	767a653baa	Update Prometheus, Grafana, and ingress-nginx addons * Update ingress-nginx RBAC Role to include coordination.k8s.io leases permissions that are required with ingress-nginx v1.3.0	2022-07-15 20:19:12 -07:00
Dalton Hubble	0db5f86110	Update Kubernetes from v1.24.2 to v1.24.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243	2022-07-13 20:59:15 -07:00
dependabot[bot]	4908fdd247	Bump mkdocs-material from 8.3.8 to 8.3.9 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.8 to 8.3.9. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.8...8.3.9) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-05 17:54:48 -07:00
Dalton Hubble	42bf82b325	Update Prometheus and Grafana addons * Bump recommended Terraform provider versions	2022-07-02 11:28:34 -07:00
dependabot[bot]	61cbfc044d	Bump mkdocs-material from 8.3.6 to 8.3.8 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.6 to 8.3.8. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.6...8.3.8) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-06-29 08:11:42 -07:00
Dalton Hubble	07df0c2552	Add warning about Terraform AWS provider version * Sync Terraform provider versions with those used internally v1.24.2	2022-06-23 21:31:20 -07:00
dependabot[bot]	45d6ff2e38	Bump mkdocs-material from 8.3.4 to 8.3.6 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.3.4 to 8.3.6. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.3.4...8.3.6) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-06-20 11:46:24 -07:00
Dalton Hubble	8398182956	Update Cilium and Calico CNI providers * Update Cilium from v1.11.5 to v1.11.6 * Update Calico from v3.22.2 to v3.23.1	2022-06-18 19:29:01 -07:00
Dalton Hubble	6d6b48b201	Update Kubernetes from v1.24.1 to v1.24.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242	2022-06-18 18:35:42 -07:00
Dalton Hubble	2a8915fee9	Update Prometheus, kube-state-metrics, and Grafana addons * Update monitoring addons	2022-06-18 18:32:17 -07:00
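
Commit 47d8431fe0 above describes etcd peers treating `example.foo` and `example.foo.` as different names during TLS SAN validation. The following is a minimal Python sketch of that mismatch and of one way to avoid it, by dropping the trailing dot before comparing. It is not the fix applied in Typhoon (the log does not show it); the helper names and the `example.com` hostnames are hypothetical and only modelled on the error message in that commit.

```python
def strip_trailing_dot(fqdn: str) -> str:
    """Return the DNS name without the trailing root-label dot, if present."""
    return fqdn[:-1] if fqdn.endswith(".") else fqdn


def same_host(a: str, b: str) -> bool:
    """Compare two DNS names, ignoring letter case and a trailing dot."""
    return strip_trailing_dot(a).lower() == strip_trailing_dot(b).lower()


# Hypothetical SAN entries as a DNS record resource might report them
# (fully qualified, with a trailing dot).
sans = ["conform-etcd0.example.com.", "conform-etcd1.example.com."]

# Hypothetical name a peer presents when connecting (no trailing dot).
peer = "conform-etcd1.example.com"

# A plain string comparison treats the two spellings as different names.
strict_match = peer in sans

# Normalizing both sides first makes the comparison succeed.
normalized_match = any(same_host(peer, san) for san in sans)

print(strict_match, normalized_match)
```

Normalizing at the point where names are handed to certificate generation keeps the SAN list and the names peers actually dial in the same form, which is the property the error message shows was missing.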

1441 Commits