Commit Graph

95 Commits

Author SHA1 Message Date
Dalton Hubble
0669d44026
Update Kubernetes from v1.30.2 to v1.30.3
* Update builtin Cilium manifests from v1.15.6 to v1.15.7
* Update builtin flannel manifests from v0.25.4 to v0.25.5
2024-07-20 11:04:32 -07:00
Dalton Hubble
0d10d180f8
Change worker node pools from uniform to flexible orchestration mode
* Use flexible orchestration mode. Azure has started to recommend this
mode because it allows interacting with VMSS instances like regular VMs
via the CLI or via the Azure Portal
* Add options to allow workers nodes to use ephemeral local disks
  * Add `controller_disk_type` and `controller_disk_size` variables
  * Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables
2024-07-14 11:58:15 -07:00
Dalton Hubble
24b7f31c55
Rename Azure cluster region variable to location
* Rename the region variable to location to align with Azure
platform conventions, where resources are created within an
Azure location, which are themselves part of broader geographical
regions
2024-07-09 07:56:58 -07:00
Dalton Hubble
48d4973957
Add IPv6 support for Typhoon Azure clusters
* Define a dual-stack virtual network with both IPv4 and IPv6 private
address space. Change `host_cidr` variable (string) to a `network_cidr`
variable (object) with "ipv4" and "ipv6" fields that list CIDR strings.
* Define dual-stack controller and worker subnets. Disable Azure
default outbound access (a deprecated fallback mechanism)
* Enable dual-stack load balancing to Kubernetes Ingress by adding
a public IPv6 frontend IP and LB rule to the load balancer.
* Enable worker outbound IPv6 connectivity through load balancer
SNAT by adding an IPv6 frontend IP and outbound rule
* Configure controller nodes with a public IPv6 address to provide
direct outbound IPv6 connectivity
* Add an IPv6 worker backend pool. Azure requires separate IPv4 and
IPv6 backend pools, though the health probe can be shared
* Extend network security group rules for IPv6 source/destinations

Checklist:

Access to controller and worker nodes via IPv6 addresses:

  * SSH access to controller nodes via public IPv6 address
  * SSH access to worker nodes via (private) IPv6 address (via
    controller)

Outbound IPv6 connectivity from controller and worker nodes:

```
nc -6 -zv ipv6.google.com 80
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Connected to [2607:f8b0:4001:c16::66]:80.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
```

Serve Ingress traffic via IPv4 or IPv6 just requires setting
up A and AAAA records and running the ingress controller with
`hostNetwork: true` since, hostPort only forwards IPv4 traffic
2024-07-09 07:55:00 -07:00
Dalton Hubble
931d6d18de
Update Kubernetes from v1.30.1 to v1.30.2
* Update CoreDNS from v1.9.4 to v1.11.1
* Update Cilium from v1.15.5 to v1.15.6
* Update flannel from v0.25.1 to v0.25.4
2024-06-17 08:20:03 -07:00
Dalton Hubble
563feacd29 Update Kubernetes from v1.30.0 to v1.30.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1301
2024-05-15 21:59:00 -07:00
dghubble-renovate[bot]
e8a42ae33e Bump provider ct to v0.13.0 2024-05-04 09:01:19 -07:00
Dalton Hubble
6ac5a0222b
Update Kubernetes from v1.29.3 to v1.30.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1300
2024-04-23 20:51:54 -07:00
Dalton Hubble
8524aa00bc Update Kubernetes from v1.29.2 to v1.29.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1293
2024-03-23 00:47:10 -07:00
Dalton Hubble
f2f625984e Update Kubernetes from v1.29.1 to v1.29.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1292
2024-02-18 18:31:31 -08:00
Dalton Hubble
e247673a20 Update Kubernetes from v1.29.0 to v1.29.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1291
2024-02-04 10:47:42 -08:00
Dalton Hubble
84e4f02917 Update Kubernetes from v1.28.4 to v1.29.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md
2023-12-22 10:27:24 -08:00
Dalton Hubble
8254d8f3db Update Kubernetes from v1.28.3 to v1.28.4
* https://github.com/kubernetes/kubernetes/releases/tag/v1.28.4
2023-11-21 06:16:58 -08:00
Dalton Hubble
005a1119f3 Update Kubernetes from v1.28.2 to v1.28.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1283
2023-10-22 18:43:54 -07:00
Dalton Hubble
0ce8dfbb95 Workaround to allow use of ed25519 keys on Azure
* Allow passing a dummy RSA key to Azure to satisfy its obtuse
requirements (recommend deleting the corresponding private key)
* Then `ssh_authorized_key` can be used to provide Fedora CoreOS
or Flatcar Linux with a modern ed25519 public key to set in the
authorized_keys via Ignition
2023-09-17 23:21:42 +02:00
Dalton Hubble
f5bc1fb1fd Update Kubernetes from v1.28.1 to v1.28.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1282
2023-09-14 13:01:33 -07:00
Dalton Hubble
126973082a Update Kubernetes from v1.28.0 to v1.28.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1281
2023-08-26 13:29:48 -07:00
Dalton Hubble
81eed2e909 Update Kubernetes from v1.27.4 to v1.28.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md#v1280
2023-08-20 15:41:23 -07:00
Dalton Hubble
0a6183f859 Update Kubernetes from v1.27.3 to v1.27.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1274
2023-07-21 08:00:50 -07:00
Dalton Hubble
7255f82d71 Update Kubernetes fromv 1.27.2 to v1.27.3
* Update Cilium v1.13.3 to v1.13.4

Rel: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1273
2023-06-16 08:28:17 -07:00
Dalton Hubble
784f60f624 Enable boot diagnostics for Azure controller and worker VMs
* When invalid Ignition snippets are provided to Typhoon, it
can be useful to view Azure's boot logs for the instance, which
requires boot diagnostics be enabled
2023-06-11 19:24:09 -07:00
Dalton Hubble
8ebf31073c Update Kubernetes from v1.27.1 to v1.27.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1272
2023-05-21 14:02:49 -07:00
Dalton Hubble
501e6d25e0 Update Kubernetes from v1.27.0 to v1.27.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1271
2023-04-15 23:16:51 -07:00
Dalton Hubble
4322857bec Update Kubernetes from v1.26.3 to v1.27.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#v1270
2023-04-15 22:49:12 -07:00
Dalton Hubble
3670ec7ed7 Update Kubernetes from v1.26.2 to v1.26.3
* Update Cilium from v1.13.0 to v1.13.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1263
2023-03-21 18:18:19 -07:00
Dalton Hubble
76ebc08fd2 Update Kubernetes from v1.26.1 to v1.26.2
* https://github.com/poseidon/terraform-render-bootstrap/pull/345
2023-03-01 17:13:16 -08:00
Dalton Hubble
f2bf5ac3fb Update Kubernetes from v1.26.0 to v1.26.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1261
2023-01-19 08:27:56 -08:00
Dalton Hubble
d6cbcf9f96 Update Kubernetes from v1.26.0-rc.1 to v1.26.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260
2022-12-08 08:47:24 -08:00
Dalton Hubble
0dc8740c77 Update Kubernetes from v1.26.0-rc.0 to v1.26.0-rc.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260-rc1
2022-12-05 09:31:45 -08:00
Dalton Hubble
a9b12b6bca Update Kubernetes from v1.25.4 to v1.26.0-rc.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.26.md#v1260-rc0
2022-11-30 08:47:40 -08:00
Dalton Hubble
26dbc7e91d Update Kubernetes from v1.25.3 to v1.25.4
* Update Calico from v3.24.3 to v3.24.5
* Update Prometheus and Grafana addons
2022-11-10 09:42:21 -08:00
Dalton Hubble
937acc4b5a Re-enable Graceful Node Shutdown feature
* Kubelet GracefulNodeShutdown works, but only partially handles
gracefully stopping the Kubelet. The most noticeable drawback
is that Completed Pods are left around
* Use a project like poseidon/scuttle or a similar systemd unit
as a snippet to add drain and/or delete behaviors if desired
* This reverts commit 1786e34f33.

Rel:

* https://www.psdn.io/posts/kubelet-graceful-shutdown/
* https://github.com/poseidon/scuttle
2022-11-02 20:49:01 -07:00
Dalton Hubble
0f38a6d405 Remove defunct delete-node.service from worker nodes
* delete-node.service used to be used to remove nodes from the
cluster on shutdown, but its long since it last worked properly
* If there is still a desire for this concept, it can be added
with a custom snippet and with a better systemd unit
2022-10-20 08:43:48 -07:00
Dalton Hubble
f04e1d25a8 Add Flatcar Linux ARM64 support on Azure
* Kinvolk now publishes Flatcar Linux images for ARM64
* For now, amd64 image must specify a plan while arm64 images
must NOT specify a plan due to how Kinvolk publishes.

Rel: https://github.com/flatcar/Flatcar/issues/872
2022-10-17 08:36:57 -07:00
Dalton Hubble
651151805d Update Kubernetes v1.25.2 to v1.25.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253
2022-10-13 21:02:39 -07:00
Dalton Hubble
8d2c8b8db6 Switch to Flatcar Azure gen2 images and change worker type
* Switch from Azure Hypervisor generation 1 to generation 2
* Change default Azure `worker_type` from Standard_DS1_v2 to Standard_D2as_v5
  * Get 2 VCPU, 7 GiB, 12500Mbps (vs 1 VCPU, 3.5GiB, 750 Mbps)
  * Small increase in pay-as-you-go price ($53.29 -> $62.78)
  * Small increase in spot price ($5.64/mo -> $7.37/mo)
  * Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`)

Notes: Azure makes you accept terms for each plan:

```
az vm image terms accept --publish kinvolk --offer flatcar-container-linux-free --plan stable-gen2
```

Rel:

* https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series
* https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series
2022-10-13 09:57:52 -07:00
Dalton Hubble
3ee462a24c Update Kubernetes from v1.25.1 to v1.25.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252
2022-09-22 08:15:30 -07:00
Dalton Hubble
74d4d56dbd Remove workaround for v1.25.0 ConfigMap rendering issue
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: https://github.com/kubernetes/kubernetes/issues/112081
2022-09-19 09:10:24 -07:00
Dalton Hubble
09751cc0e8 Update Kubernetes from v1.25.0 to v1.25.1
* https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1
2022-09-15 08:23:22 -07:00
Dalton Hubble
1786e34f33 Revert Graceful Node Shutdown feature
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown shutdown allows 30s for critical pods to
shutdown and 15s for regular pods to shutdown before releasing the
inhibitor lock to allow the host to shutdown
* Unfortunately, both pods and the node are shutdown at the same
time at the end of the 45s period without further configuration
options. As a result, regular pods and the node are shutdown at the
same time. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier)
But its very apparent on our real clusters

Rel: https://github.com/kubernetes/kubernetes/issues/110755
2022-09-10 14:58:44 -07:00
Dalton Hubble
393a38deff Configure Graceful Node Shutdown and lengthen max inhibitor delay
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to gracefully shutdown
* Allow up to 15s for regular pods to gracefully shutdown
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shutdown a node via systemctl reboot
or via the cloud console to watch container shutdown

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* https://github.com/kubernetes/kubernetes/issues/107043
* https://github.com/coreos/fedora-coreos-tracker/issues/821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
2022-08-28 10:37:33 -07:00
Dalton Hubble
275fc0f9e8 Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so its not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable

```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```

Rel:

* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
2022-08-27 09:49:35 -07:00
Dalton Hubble
3fb59a3289 Migrate most Kubelet flags to KubeletConfiguration file
* Add a KubeletConfiguration file to replace most Kubelet
flags, to prepare for upcoming changes
* Pass Kubelet the --config flag to specify the location of
the KubeletConfiguration
* Remove flsgs / configuration where it matches the defaults
  * Remove --cgroups-per-qos, defaults to true
  * Remove --container-runtime, defaults to remote
  * Remove enforce-node-allocatable=pods, defaults to pods

Rel:

* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2022-08-27 09:28:15 -07:00
Dalton Hubble
a31dbceac6 Update Kubernetes from v1.24.4 to v1.25.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md
2022-08-25 09:18:14 -07:00
Dalton Hubble
760b4cd5ee Update Kubernetes from v1.24.3 to v1.24.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244
2022-08-17 20:09:30 -07:00
Dalton Hubble
4a469513dd Migrate Flatcar Linux from Ignition spec v2.3.0 to v3.3.0
* Requires poseidon v0.11+ and Flatcar Linux 3185.0.0+ (action required)
* Previously, Flatcar Linux configs have been parsed as Container
Linux Configs to Ignition v2.2.0 specs by poseidon/ct
* Flatcar Linux starting in 3185.0.0 now supports Ignition v3.x specs
(which are rendered from Butane Configs, like Fedora CoreOS)
* poseidon/ct v0.11.0 adds support for the flatcar Butane Config
variant so that Flatcar Linux can use Ignition v3.x

Rel:

* [Flatcar Support](https://flatcar-linux.org/docs/latest/provisioning/ignition/specification/#ignition-v3)
* [poseidon/ct support](https://github.com/poseidon/terraform-provider-ct/pull/131)
2022-08-03 08:32:52 -07:00
Dalton Hubble
256b87812e Remove Terraform template provider dependency
* Use Terraform builtin templatefile functionality
* Remove dependency on deprecated Terraform template provider

Rel:

* https://registry.terraform.io/providers/hashicorp/template/2.2.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/293
2022-08-02 18:15:03 -07:00
Dalton Hubble
0db5f86110 Update Kubernetes from v1.24.2 to v1.24.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1243
2022-07-13 20:59:15 -07:00
Dalton Hubble
6d6b48b201 Update Kubernetes from v1.24.1 to v1.24.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1242
2022-06-18 18:35:42 -07:00
Dalton Hubble
c5573199db Update Kubernetes from v1.24.0 to v1.24.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1241
2022-05-28 09:39:14 +01:00