* To accompany the restructure of the bare-metal modules to
allow discrete workers to be defined and attached to a cluster
(#1295), the `workers` variable (older way, used for defining
homogeneous workers inline) should be optional and default
to an empty list
* Add docs covering inline vs discrete metal workers
Fix#1301
* Add an internal `worker` module to the bare-metal module, to
allow individual bare-metal machines to be defined and joined
to an existing bare-metal cluster. This is similar to the "worker
pools" modules for adding sets of nodes to cloud (AWS, GCP, Azure)
clusters, but on metal, each piece of hardware is potentially
unique
New: Using the new `worker` module, a Kubernetes cluster can be defined
without any `workers` (i.e. just a control-plane). Use the `worker`
module to define each piece machine that should join the bare-metal
cluster and customize it in detail. This style is quite flexible and
suited for clusters with hardware that varies quite a bit.
```tf
module "mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.26.2"
# bare-metal
cluster_name = "mercury"
matchbox_http_endpoint = "http://matchbox.example.com"
os_channel = "flatcar-stable"
os_version = "2345.3.1"
# configuration
k8s_domain_name = "node1.example.com"
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# machines
controllers = [{
name = "node1"
mac = "52:54:00:a1:9c:ae"
domain = "node1.example.com"
}]
}
```
```tf
module "mercury-node1" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes/worker?ref=v1.26.2"
cluster_name = "mercury"
# bare-metal
matchbox_http_endpoint = "http://matchbox.example.com"
os_channel = "flatcar-stable"
os_version = "2345.3.1"
# configuration
name = "node2"
mac = "52:54:00:b2:2f:86"
domain = "node2.example.com"
kubeconfig = module.mercury.kubeconfig
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# optional
snippets = []
node_labels = []
node_tains = []
install_disk = "/dev/vda"
cached_install = false
}
```
For clusters with fairly similar hardware, you may continue to
define `workers` directly within the cluster definition. This
reduces some repetition, but is not quite as flexible.
```tf
module "mercury" {
source = "git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.26.1"
# bare-metal
cluster_name = "mercury"
matchbox_http_endpoint = "http://matchbox.example.com"
os_channel = "flatcar-stable"
os_version = "2345.3.1"
# configuration
k8s_domain_name = "node1.example.com"
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# machines
controllers = [{
name = "node1"
mac = "52:54:00:a1:9c:ae"
domain = "node1.example.com"
}]
workers = [
{
name = "node2",
mac = "52:54:00:b2:2f:86"
domain = "node2.example.com"
},
{
name = "node3",
mac = "52:54:00:c3:61:77"
domain = "node3.example.com"
}
]
}
```
Optional variables `snippets`, `worker_node_labels`, and
`worker_node_taints` are still defined as a map from machine name
to a list of snippets, labels, or taints respectively to allow some
degree of per-machine customization. However, fields like
`install_disk`, `kernel_args`, `cached_install` and future options
will not be designed this way. Instead, if your machines vary it
is recommended to use the new `worker` module to define each node
* Kubelet GracefulNodeShutdown works, but only partially handles
gracefully stopping the Kubelet. The most noticeable drawback
is that Completed Pods are left around
* Use a project like poseidon/scuttle or a similar systemd unit
as a snippet to add drain and/or delete behaviors if desired
* This reverts commit 1786e34f33.
Rel:
* https://www.psdn.io/posts/kubelet-graceful-shutdown/
* https://github.com/poseidon/scuttle
* network.target is a passive unit that's not actually pulled
in by units requiring or wanting it, its only used for shutdown
ordering
> "Services using the network should ... avoid any Wants=network.target or even Requires=network.target"
Rel: https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 https://github.com/poseidon/typhoon/pull/1222)
* Graceful node shutdown shutdown allows 30s for critical pods to
shutdown and 15s for regular pods to shutdown before releasing the
inhibitor lock to allow the host to shutdown
* Unfortunately, both pods and the node are shutdown at the same
time at the end of the 45s period without further configuration
options. As a result, regular pods and the node are shutdown at the
same time. In practice, enabling this feature leaves Error or Completed
pods in kube-apiserver state until manually cleaned up. This feature
is not ready for general use
* Fix issue where Error/Completed pods are accumulating whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (due to critical pods being killed earlier)
But its very apparent on our real clusters
Rel: https://github.com/kubernetes/kubernetes/issues/110755
* When podman runs the Kubelet container, logging to journald means
log lines are duplicated in the journal. journalctl -u kubelet shows
Kubelet's logs and the same log messages from podman. Using the
k8s-file driver alleviates this problem
* Fix Kubelet and etcd-member logs to be more readable and reduce
unneccessary Kubelet log volume
* Kubernetes v1.25.0 moved the LocalStorageCapacityIsolationFSQuotaMonitoring
feature from alpha to beta, but it breaks Kubelet updating ConfigMaps in
Pods, as shown by conformance tests
* Kubernetes is rolling LocalStorageCapacityIsolationFSQuotaMonitoring back
to alpha so its not enabled by default, but that will require a release
* Disable the feature gate directly as a workaround for now to make
Kubernetes v1.25.0 usable
```
FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
```
Rel:
* https://github.com/kubernetes/kubernetes/pull/112076
* https://github.com/kubernetes/kubernetes/pull/107329