* When a worker managed instance group's (MIG) instance template
changes (including machine type, disk size, or Butane snippets
but excluding new AMIs), use Google Cloud's rolling update features
to ensure instances match declared state
* Ignore new AMIs since Fedora CoreOS and Flatcar Linux nodes
already auto-update and reboot themselves
* Rolling updates will create surge instances, wait for health
checks, then delete old instances (0 unavilable instances)
* Instances are replaced to ensure new Ignition/Butane snippets
are respected
* Add managed instance group autohealing (i.e. health checks) to
ensure new instances' Kubelet is running
Renames
* Name apiserver and kubelet health checks consistently
* Rename MIG from `${var.name}-worker-group` to `${var.name}-worker`
Rel: https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups
* Rename launch configuration to use a name_prefix named after the
cluster and worker to improve identifiability
* Shorten AWS autoscaling group name to not include the launch config
id. Years ago this used to be needed to update the ASG but the AWS
provider detects changes to the launch configuration just fine
* Typhoon Fedora CoreOS is already using iptables nf_tables since
F36. The file to pin to legacy iptables was renamed to
/etc/coreos/iptables-legacy.stamp
* Add toleration to run node-exporter on controller nodes
* Add HostToContainer mount propagation and security context group
settings from upstream
* Fix SELinux denied accessing /host/proc/1/mounts. The mounts file
is has an SELinux type attribute init_t, but that won't allow running
the node-exporter binary so we have to use spc_t. This should be more
targeted at just the SELinux issue than making the Pod privileged
* Remove excluded mount points and filesystem types, the defaults are
https://github.com/prometheus/node_exporter/blob/v1.3.1/collector/filesystem_linux.go#L35
```
caller=collector.go:169 level=error msg="collector failed" name=filesystem duration_seconds=0.000666766 err="open /host/proc/1/mounts: permission denied"
```
```
[ 3664.880899] audit: type=1400 audit(1659639161.568:4400): avc: denied { search } for pid=28325 comm="node_exporter" name="1" dev="proc" ino=22542 scontext=system_u:system_r:container_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```
* Typhoon now consistently uses Butane Configs for snippets
(variant `fcos` or `flatcar`). Previously snippets were either
Butane Configs (on FCOS) or Container Linux Configs (on Flatcar)
* Update docs on uploading Flatcar Linux DigitalOcean images
* Update docs on uploading Fedora CoreOS Azure images
* Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x
Rel:
* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
* Google Cloud Terraform provider resource google_dns_record_set's
name field provides the full domain name with a trailing ".". This
isn't a new behavior, Google has behaved this way as long as I can
remember
* etcd domain names are passed to the bootstrap module to generate
TLS certificates. What seems to be new(ish?) is that etcd peers
see example.foo and example.foo. as different domains during TLS
SANs validation. As a result, clusters with multiple controller
nodes fail to run etcd-member, which manifests as cluster provisioning
hanging. Single controller/master clusters (default) are unaffected
* Fix etcd-member.service error in multi-controller clusters:
```
"error":"x509: certificate is valid for conformance-etcd0.redacted.,
conform-etcd1.redacted., conform-etcd2.redacted., not conform-etcd1.redacted"}
```
* Add initrd=main kernel argument for UEFI
* Switch to using the coreos.live.rootfs_url kernel argument
instead of passing the rootfs as an appended initrd
* Remove coreos.inst.image_url kernel argument since coreos-installer
now defaults to installing from the embedded live system
* Remove rd.neednet=1 and dhcp=ip kernel args that aren't needed
* Remove serial console kernel args by default (these can be
added via var.kernel_args if needed)
Rel:
* https://github.com/poseidon/matchbox/pull/972 (thank you @bgilbert)
* https://github.com/poseidon/matchbox/pull/978
fixes#1123
Enables the use of CSI drivers with a StorageClass that lacks an explicit context mount option. In cases where the kubelet lacks mounts for `/etc/selinux` and `/sys/fs/selinux`, it is unable to set the `:Z` option for the CRI volume definition automatically. See [KEP 1710](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1710-selinux-relabeling/README.md#volume-mounting) for more information on how SELinux is passed to the CRI by Kubelet.
Prior to this change, a not-explicitly-labelled mount would have an `unlabeled_t` SELinux type on the host. Following this change, the Kubelet and CRI work together to dynamically relabel mounts that lack an explicit context specification every time it is rebound to a pod with SELinux type `container_file_t` and appropriate context labels to match the specifics for the pod it is bound to. This enables applications running in containers to consume dynamically provisioned storage on SELinux enforcing systems without explicitly setting the context on the StorageClass or PersistentVolume.