* When podman runs the Kubelet container, logging to journald means
log lines are duplicated in the journal. journalctl -u kubelet shows
Kubelet's logs and the same log messages from podman. Using the
k8s-file driver alleviates this problem
* Fix Kubelet and etcd-member logs to be more readable and reduce
unneccessary Kubelet log volume
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so its not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable
```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```
Rel:
* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
* Change the workers managed instance group to health check nodes
via HTTP probe of the kube-proxy port 10256 /healthz endpoints
* Advantages: kube-proxy is a lower value target (in case there
were bugs in firewalls) that Kubelet, its more representative than
health checking Kubelet (Kubelet must run AND kube-proxy Daemonset
must be healthy), and its already used by kube-proxy liveness probes
(better discoverability via kubectl or alerts on pods crashlooping)
* Another motivator is that GKE clusters also use kube-proxy port
10256 checks to assess node health
* Changes to worker launch configurations start an autoscaling group instance
refresh to replace instances
* Instance refresh creates surge instances, waits for a warm-up period, then
deletes old instances
* Changing worker_type, disk_*, worker_price, worker_target_groups, or Butane
worker_snippets on existing worker nodes will replace instances
* New AMIs or changing `os_stream` will be ignored, to allow Fedora CoreOS or
Flatcar Linux to keep themselves updated
* Previously, new launch configurations were made in the same way, but not
applied to instances unless manually replaced
* When a worker managed instance group's (MIG) instance template
changes (including machine type, disk size, or Butane snippets
but excluding new AMIs), use Google Cloud's rolling update features
to ensure instances match declared state
* Ignore new AMIs since Fedora CoreOS and Flatcar Linux nodes
already auto-update and reboot themselves
* Rolling updates will create surge instances, wait for health
checks, then delete old instances (0 unavilable instances)
* Instances are replaced to ensure new Ignition/Butane snippets
are respected
* Add managed instance group autohealing (i.e. health checks) to
ensure new instances' Kubelet is running
Renames
* Name apiserver and kubelet health checks consistently
* Rename MIG from `${var.name}-worker-group` to `${var.name}-worker`
Rel: https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups
* Rename launch configuration to use a name_prefix named after the
cluster and worker to improve identifiability
* Shorten AWS autoscaling group name to not include the launch config
id. Years ago this used to be needed to update the ASG but the AWS
provider detects changes to the launch configuration just fine
* Typhoon Fedora CoreOS is already using iptables nf_tables since
F36. The file to pin to legacy iptables was renamed to
/etc/coreos/iptables-legacy.stamp
* Add toleration to run node-exporter on controller nodes
* Add HostToContainer mount propagation and security context group
settings from upstream
* Fix SELinux denied accessing /host/proc/1/mounts. The mounts file
is has an SELinux type attribute init_t, but that won't allow running
the node-exporter binary so we have to use spc_t. This should be more
targeted at just the SELinux issue than making the Pod privileged
* Remove excluded mount points and filesystem types, the defaults are
https://github.com/prometheus/node_exporter/blob/v1.3.1/collector/filesystem_linux.go#L35
```
caller=collector.go:169 level=error msg="collector failed" name=filesystem duration_seconds=0.000666766 err="open /host/proc/1/mounts: permission denied"
```
```
[ 3664.880899] audit: type=1400 audit(1659639161.568:4400): avc: denied { search } for pid=28325 comm="node_exporter" name="1" dev="proc" ino=22542 scontext=system_u:system_r:container_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```
* Typhoon now consistently uses Butane Configs for snippets
(variant `fcos` or `flatcar`). Previously snippets were either
Butane Configs (on FCOS) or Container Linux Configs (on Flatcar)
* Update docs on uploading Flatcar Linux DigitalOcean images
* Update docs on uploading Fedora CoreOS Azure images
* Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x
Rel:
* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
* Google Cloud Terraform provider resource google_dns_record_set's
name field provides the full domain name with a trailing ".". This
isn't a new behavior, Google has behaved this way as long as I can
remember
* etcd domain names are passed to the bootstrap module to generate
TLS certificates. What seems to be new(ish?) is that etcd peers
see example.foo and example.foo. as different domains during TLS
SANs validation. As a result, clusters with multiple controller
nodes fail to run etcd-member, which manifests as cluster provisioning
hanging. Single controller/master clusters (default) are unaffected
* Fix etcd-member.service error in multi-controller clusters:
```
"error":"x509: certificate is valid for conformance-etcd0.redacted.,
conform-etcd1.redacted., conform-etcd2.redacted., not conform-etcd1.redacted"}
```
* Add initrd=main kernel argument for UEFI
* Switch to using the coreos.live.rootfs_url kernel argument
instead of passing the rootfs as an appended initrd
* Remove coreos.inst.image_url kernel argument since coreos-installer
now defaults to installing from the embedded live system
* Remove rd.neednet=1 and dhcp=ip kernel args that aren't needed
* Remove serial console kernel args by default (these can be
added via var.kernel_args if needed)
Rel:
* https://github.com/poseidon/matchbox/pull/972 (thank you @bgilbert)
* https://github.com/poseidon/matchbox/pull/978