1478 Commits

Author SHA1 Message Date
Dalton Hubble 937acc4b5a Re-enable Graceful Node Shutdown feature
* Kubelet GracefulNodeShutdown works, but only partially handles
gracefully stopping the Kubelet. The most noticeable drawback
is that Completed Pods are left around
* Use a project like poseidon/scuttle or a similar systemd unit
as a snippet to add drain and/or delete behaviors if desired
* This reverts commit 1786e34f33.

Rel:

* https://www.psdn.io/posts/kubelet-graceful-shutdown/
* https://github.com/poseidon/scuttle
2022-11-02 20:49:01 -07:00
dependabot[bot] b0a6dc8115 Bump mkdocs-material from 8.5.6 to 8.5.7
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.6 to 8.5.7.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.6...8.5.7)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-25 19:27:41 -07:00
dependabot[bot] 420ff6ff04 Bump pymdown-extensions from 9.6 to 9.7
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.6 to 9.7.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.6...9.7)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-25 17:50:48 -07:00
Dalton Hubble 9b733d79c7 Update Calico v3.24.2 to v3.24.3
* https://github.com/projectcalico/calico/releases/tag/v3.24.3
* Add patch to allow Kubelet kubeconfig to drain nodes if desired
in addition to just deleting them in shutdown integrations. See
https://github.com/poseidon/terraform-render-bootstrap/pull/330
2022-10-23 22:00:15 -07:00
Dalton Hubble 35a9e22b1f Update Calico from v3.24.1 to v3.24.2
* https://github.com/projectcalico/calico/releases/tag/v3.24.2
2022-10-20 09:28:19 -07:00
Dalton Hubble 0f38a6d405 Remove defunct delete-node.service from worker nodes
* delete-node.service was once used to remove nodes from the
cluster on shutdown, but it has been a long time since it last worked properly
* If there is still a desire for this concept, it can be added
with a custom snippet and with a better systemd unit
2022-10-20 08:43:48 -07:00
Dalton Hubble a535581ef2 Remove unused Wants=network.target from etcd-member
* network.target is a passive unit that's not actually pulled
in by units requiring or wanting it; it's only used for shutdown
ordering
> "Services using the network should ... avoid any Wants=network.target or even Requires=network.target"

Rel: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
2022-10-20 08:32:55 -07:00
Dalton Hubble 08d13e7215 Improve release notes slightly with links 2022-10-20 08:30:30 -07:00
Dalton Hubble 3ff2d38fa5 Update Cilium from v1.12.2 to v1.12.3
* https://github.com/cilium/cilium/releases/tag/v1.12.3
2022-10-17 17:25:23 -07:00
dependabot[bot] d6d8eb8d79 Bump mkdocs from 1.4.0 to 1.4.1
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.4.0...1.4.1)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-17 16:56:19 -07:00
Dalton Hubble f04e1d25a8 Add Flatcar Linux ARM64 support on Azure
* Kinvolk now publishes Flatcar Linux images for ARM64
* For now, amd64 images must specify a plan while arm64 images
must NOT specify a plan due to how Kinvolk publishes them.

Rel: https://github.com/flatcar/Flatcar/issues/872
2022-10-17 08:36:57 -07:00
Dalton Hubble b68f8bb2a9 Switch Azure Fedora CoreOS default worker type
* Change default Azure worker_type from Standard_DS1_v2 to Standard_D2as_v5
  * Get 2 vCPU, 7 GiB, 12500 Mbps (vs 1 vCPU, 3.5 GiB, 750 Mbps)
  * Small increase in pay-as-you-go price ($53.29 -> $62.78)
  * Small increase in spot price ($5.64/mo -> $7.37/mo)
  * Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`)

Rel:

* https://github.com/poseidon/typhoon/pull/1248
* https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series
* https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series
2022-10-13 21:23:57 -07:00
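To put the "small increase" figures from commit b68f8bb2a9 in relative terms, here is a quick arithmetic sketch. The prices are the ones quoted in the commit message; the helper name `pct_increase` is hypothetical and only serves this illustration.

```python
def pct_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# Prices quoted in the commit message (Standard_DS1_v2 -> Standard_D2as_v5).
pay_as_you_go = pct_increase(53.29, 62.78)
spot = pct_increase(5.64, 7.37)

print(f"pay-as-you-go: +{pay_as_you_go:.1f}%")  # about +17.8%
print(f"spot:          +{spot:.1f}%")           # about +30.7%
```

By this estimate the price rises by roughly 18% (pay-as-you-go) or 31% (spot) in exchange for twice the vCPUs and memory.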
Dalton Hubble 651151805d Update Kubernetes v1.25.2 to v1.25.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1253
2022-10-13 21:02:39 -07:00
Dalton Hubble 8d2c8b8db6 Switch to Flatcar Azure gen2 images and change worker type
* Switch from Azure Hypervisor generation 1 to generation 2
* Change default Azure `worker_type` from Standard_DS1_v2 to Standard_D2as_v5
  * Get 2 vCPU, 7 GiB, 12500 Mbps (vs 1 vCPU, 3.5 GiB, 750 Mbps)
  * Small increase in pay-as-you-go price ($53.29 -> $62.78)
  * Small increase in spot price ($5.64/mo -> $7.37/mo)
  * Change from Intel to AMD EPYC (`D2as_v5` cheaper than `D2s_v5`)

Notes: Azure makes you accept terms for each plan:

```
az vm image terms accept --publisher kinvolk --offer flatcar-container-linux-free --plan stable-gen2
```

Rel:

* https://learn.microsoft.com/en-us/azure/virtual-machines/dasv5-dadsv5-series#dasv5-series
* https://learn.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series
2022-10-13 09:57:52 -07:00
Dalton Hubble 675ac63159 Remove note about not supporting ARM64 with Calico CNI
* Calico v3.22.0 introduced multi-arch container images so Typhoon's
ARM64 support has allowed choosing Calico CNI since Typhoon v1.23.5
2022-10-11 23:21:02 -07:00
Dalton Hubble b4c8b1729c Switch addons images from k8s.gcr.io to registry.k8s.io
* Switch addon manifests to use the new Kubernetes image registry

Rel:

* https://github.com/poseidon/typhoon/pull/1206
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#moved-container-registry-service-from-k8sgcrio-to-registryk8sio
2022-10-09 16:14:28 -07:00
Dalton Hubble e82241169a Update Prometheus from v2.38.0 to v2.39.1
* https://github.com/prometheus/prometheus/releases/tag/v2.39.1
2022-10-09 16:12:35 -07:00
dependabot[bot] ffe4929ff6 Bump mkdocs-material from 8.5.3 to 8.5.6
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.3 to 8.5.6.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.3...8.5.6)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-09 14:44:06 -07:00
dependabot[bot] 88b3925318 Bump pymdown-extensions from 9.5 to 9.6
Bumps [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) from 9.5 to 9.6.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](https://github.com/facelessuser/pymdown-extensions/compare/9.5...9.6)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-03 15:34:37 -07:00
dependabot[bot] 29876dc85a Bump mkdocs from 1.3.1 to 1.4.0
Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.3.1 to 1.4.0.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](https://github.com/mkdocs/mkdocs/compare/1.3.1...1.4.0)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-03 14:49:24 -07:00
dependabot[bot] 7e29e35457 Bump mkdocs-material from 8.5.2 to 8.5.3
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.5.2 to 8.5.3.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.5.2...8.5.3)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-28 08:57:03 -07:00
Dalton Hubble 3ee462a24c Update Kubernetes from v1.25.1 to v1.25.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md#v1252
2022-09-22 08:15:30 -07:00
Dalton Hubble f833b7205d Sync recommended Terraform providers in docs 2022-09-20 08:30:15 -07:00
Dalton Hubble 558e293f78 Update Nginx Ingress and Grafana addons 2022-09-20 08:28:30 -07:00
Dalton Hubble 90782ea820 Remove workaround for preventing search . propagation
* Kubelet v1.25.1 has the fix https://github.com/kubernetes/kubernetes/pull/112157
2022-09-19 22:37:02 -07:00
dependabot[bot] 8dc7cc614c Bump mkdocs-material from 8.4.4 to 8.5.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.4 to 8.5.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.4...8.5.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-19 22:16:32 -07:00
Dalton Hubble 74d4d56dbd Remove workaround for v1.25.0 ConfigMap rendering issue
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: https://github.com/kubernetes/kubernetes/issues/112081
2022-09-19 09:10:24 -07:00
Dalton Hubble 5abe84b520 Update etcd from v3.5.4 to v3.5.5
* https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v355
2022-09-15 09:01:45 -07:00
Dalton Hubble 951209d113 Update Cilium from v1.12.1 to v1.12.2
* https://github.com/cilium/cilium/releases/tag/v1.12.2
2022-09-15 08:28:37 -07:00
Dalton Hubble 09751cc0e8 Update Kubernetes from v1.25.0 to v1.25.1
* https://github.com/kubernetes/kubernetes/releases/tag/v1.25.1
2022-09-15 08:23:22 -07:00
Dalton Hubble c14300f0be Update Calico from v3.23.3 to v3.24.1
* https://github.com/projectcalico/calico/releases/tag/v3.24.1
2022-09-14 08:09:38 -07:00
dependabot[bot] 37de9ca2ae Bump mkdocs-material from 8.4.2 to 8.4.4
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.2 to 8.4.4.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.2...8.4.4)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-14 07:42:59 -07:00
Dalton Hubble 1786e34f33 Revert Graceful Node Shutdown feature
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown allows 30s for critical pods to
shut down and 15s for regular pods to shut down before releasing the
inhibitor lock to allow the host to shut down
* Unfortunately, without further configuration options, regular pods
and the node are shut down at the same time at the end of the 45s
period. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier).
But it's very apparent on our real clusters

Rel: https://github.com/kubernetes/kubernetes/issues/110755
2022-09-10 14:58:44 -07:00
Dalton Hubble 5f612c82e2 Update kube-state-metrics and Grafana addons 2022-09-01 08:58:32 -07:00
Dalton Hubble e60a321185 Sync Terraform providers shown in docs 2022-09-01 08:07:15 -07:00
dependabot[bot] 5ad74883fe Bump mkdocs-material from 8.4.1 to 8.4.2
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.1 to 8.4.2.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.1...8.4.2)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-01 08:06:34 -07:00
Dalton Hubble 4ad473cd3c Add workaround patch to strip "search ." from resolv.conf
* systemd adds "search ." to the host's /run/systemd/resolve/resolv.conf
on hosts with an FQDN hostname
* Kubelet v1.25 began propagating "search ." from the host node
into containers' `/etc/resolv.conf`
* musl-based DNS resolvers don't behave correctly when `search .`
is used in their `/etc/resolv.conf`. This breaks Alpine images
* Adapt the same workaround used by OpenShift to strip the "search ."
* This only applies to bare-metal Typhoon nodes (where hostnames are
set to FQDNs); nodes on cloud platforms aren't affected in the
Typhoon configuration

Kubernetes tracking issue: https://github.com/kubernetes/kubernetes/issues/112135

Rel:

* https://github.com/systemd/systemd/pull/17201
* https://github.com/kubernetes/kubernetes/pull/109441
* https://github.com/coreos/fedora-coreos-tracker/issues/1287
* https://github.com/openshift/okd-machine-os/pull/159
2022-08-31 08:05:45 -07:00
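The workaround in commit 4ad473cd3c amounts to removing the `search .` line before Kubelet propagates it into containers. The commit message does not show the actual patch, so the following is only a minimal sketch of the text transformation; the function name `strip_search_dot` and the sample file contents are hypothetical.

```python
def strip_search_dot(resolv_conf: str) -> str:
    """Drop any line whose search list is exactly "." and keep everything else.

    This is an illustrative filter, not the patch Typhoon or OpenShift ships.
    """
    kept = []
    for line in resolv_conf.splitlines(keepends=True):
        # Split on whitespace so "search ." and "search   ." are treated alike.
        if line.split() == ["search", "."]:
            continue
        kept.append(line)
    return "".join(kept)


# Hypothetical contents of a resolv.conf on a host with an FQDN hostname.
sample = "nameserver 10.0.0.1\nsearch .\noptions edns0\n"
print(strip_search_dot(sample), end="")
```

Lines with a real search domain (for example `search example.com`) pass through unchanged, so only the entry that trips up musl-based resolvers is removed.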
Dalton Hubble 393a38deff Configure Graceful Node Shutdown and lengthen max inhibitor delay
* Configure Kubelet Graceful Node Shutdown to detect system shutdown
events and stop running containers gracefully when possible
* Allow up to 30s for critical pods to shut down gracefully
* Allow up to 15s for regular pods to shut down gracefully
* Node will be marked as NotReady promptly, instead of having to
wait for health checks
* Kubelet uses systemd inhibitor locks to delay shutdown for a limited
number of seconds
* Raise the default max inhibitor time from 5s to 45s

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```

Tail journal logs and then shut down a node via systemctl reboot
or via the cloud console to watch container shutdown

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* https://github.com/kubernetes/kubernetes/issues/107043
* https://github.com/coreos/fedora-coreos-tracker/issues/821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
2022-08-28 10:37:33 -07:00
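The timing in commit 393a38deff can be checked with a little arithmetic. The sketch below assumes the Kubelet is configured with the KubeletConfiguration fields `shutdownGracePeriod` (total budget) and `shutdownGracePeriodCriticalPods` (portion reserved for critical pods), and that their values are 45s and 30s. Those values are inferred from the 30s/15s/45s figures in the commit message, not quoted from the actual configuration. The helper names are hypothetical.

```python
from datetime import timedelta


def split_shutdown_budget(shutdown_grace_period: timedelta,
                          critical_pods_period: timedelta):
    """Return (regular_pods_time, critical_pods_time).

    Regular pods get whatever remains of the total budget after the
    critical-pod portion is reserved.
    """
    if critical_pods_period > shutdown_grace_period:
        raise ValueError("critical pod period must not exceed the total grace period")
    return shutdown_grace_period - critical_pods_period, critical_pods_period


def fits_inhibitor_delay(shutdown_grace_period: timedelta,
                         inhibit_delay_max: timedelta) -> bool:
    """True if systemd will hold the shutdown long enough for the full budget."""
    return shutdown_grace_period <= inhibit_delay_max


total = timedelta(seconds=45)      # assumed shutdownGracePeriod
critical = timedelta(seconds=30)   # assumed shutdownGracePeriodCriticalPods

regular, critical = split_shutdown_budget(total, critical)
print(f"regular pods: {regular.seconds}s, critical pods: {critical.seconds}s")

# The previous 5s maximum inhibitor delay could not cover a 45s budget.
print(fits_inhibitor_delay(total, timedelta(seconds=5)))
print(fits_inhibitor_delay(total, timedelta(seconds=45)))
```

Under these assumptions, raising the maximum inhibitor delay to 45s is what lets the 15s + 30s budget complete before the host powers off.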
Dalton Hubble 76d92e9c2d Change podman log-driver from journald to k8s-file
* When podman runs the Kubelet container, logging to journald means
log lines are duplicated in the journal. journalctl -u kubelet shows
Kubelet's logs and the same log messages from podman. Using the
k8s-file driver alleviates this problem
* Fix Kubelet and etcd-member logs to be more readable and reduce
unnecessary Kubelet log volume
2022-08-27 17:15:22 -07:00
Dalton Hubble 275fc0f9e8 Disable LocalStorageCapacityIsolationFSQuotaMonitoring feature
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so it's not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable

```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```

Rel:

* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329
2022-08-27 09:49:35 -07:00
Dalton Hubble 3fb59a3289 Migrate most Kubelet flags to KubeletConfiguration file
* Add a KubeletConfiguration file to replace most Kubelet
flags, to prepare for upcoming changes
* Pass Kubelet the --config flag to specify the location of
the KubeletConfiguration
* Remove flags / configuration where it matches the defaults
  * Remove --cgroups-per-qos, defaults to true
  * Remove --container-runtime, defaults to remote
  * Remove --enforce-node-allocatable=pods, defaults to pods

Rel:

* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2022-08-27 09:28:15 -07:00
Dalton Hubble a31dbceac6 Update Kubernetes from v1.24.4 to v1.25.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md
2022-08-25 09:18:14 -07:00
dependabot[bot] 1dcf56127b Bump mkdocs-material from 8.4.0 to 8.4.1
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 8.4.0 to 8.4.1.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/8.4.0...8.4.1)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-23 08:53:12 -07:00
Dalton Hubble bf06412dfd Update Prometheus and Grafana addons 2022-08-21 08:56:00 -07:00
Dalton Hubble 505818b7d5 Update docs showing the terraform plan resources count
* Although I don't plan to keep these in sync, some users are
confused when the docs don't match the actual resource count
2022-08-21 08:52:35 -07:00
Dalton Hubble 0d27811265 Update recommended Terraform provider versions 2022-08-18 09:08:55 -07:00
Dalton Hubble c13d060b38 Add docs for GCP MIG update and AWS instance refresh
* Document that worker instances are rolling replaced when
changes to their configuration are applied
2022-08-18 09:02:38 -07:00
Dalton Hubble e87d5aabc3 Adjust Google Cloud worker health checks to use kube-proxy healthz
* Change the workers' managed instance group to health check nodes
via HTTP probe of the kube-proxy port 10256 /healthz endpoint
* Advantages: kube-proxy is a lower value target (in case there
were bugs in firewalls) than Kubelet, it's more representative than
health checking Kubelet (Kubelet must run AND kube-proxy Daemonset
must be healthy), and it's already used by kube-proxy liveness probes
(better discoverability via kubectl or alerts on pods crashlooping)
* Another motivator is that GKE clusters also use kube-proxy port
10256 checks to assess node health
2022-08-17 20:50:52 -07:00
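The health check described in commit e87d5aabc3 is an HTTP probe of kube-proxy's `/healthz` endpoint on port 10256. As a rough sketch of what such a probe does, the snippet below issues the same kind of request from Python. The function names, the timeout, and the rule that only HTTP 200 counts as healthy are assumptions for illustration; the real check is performed by the Google Cloud managed instance group, not by this code.

```python
import urllib.error
import urllib.request

KUBE_PROXY_HEALTHZ_PORT = 10256  # port named in the commit message


def healthz_url(host: str, port: int = KUBE_PROXY_HEALTHZ_PORT) -> str:
    """Build the kube-proxy health check URL for a node."""
    return f"http://{host}:{port}/healthz"


def probe_healthz(host: str, port: int = KUBE_PROXY_HEALTHZ_PORT,
                  timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200.

    Connection errors, timeouts, and non-2xx responses all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(healthz_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, `probe_healthz("10.0.0.5")` (a made-up node address) would return `False` if kube-proxy is not running on that node or the firewall blocks port 10256.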
Dalton Hubble 760b4cd5ee Update Kubernetes from v1.24.3 to v1.24.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#v1244
2022-08-17 20:09:30 -07:00
Dalton Hubble fcd8ff2b17 Update Cilium from v1.12.0 to v1.12.1
* https://github.com/cilium/cilium/releases/tag/v1.12.1
2022-08-17 08:53:56 -07:00