typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 22:34:38 +02:00

Author	SHA1	Message	Date
Dalton Hubble	37f00a3882	Reduce Calcio MTU on Fedora CoreOS Azure * Change the Calico VXLAN interface for MTU from 1450 to 1410 * VXLAN on Azure should support MTU 1450. However, there is history where performance measures have shown that 1410 is needed to have expected performance. Flatcar Linux has the same MTU 1410 override and note * FCOS 31.20200323.3.2 was known to perform fine with 1450, but now in 31.20200517.3.0 the right value seems to be 1410	2020-06-19 00:24:56 -07:00
Dalton Hubble	90e23f5822	Rename controller node label and NoSchedule taint * Remove node label `node.kubernetes.io/master` from controller nodes * Use `node.kubernetes.io/controller` (present since v1.9.5, [#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers * Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to `node-role.kubernetes.io/controller` * Tolerate the new taint name for workloads that may run on controller nodes and stop tolerating `node-role.kubernetes.io/master` taint	2020-06-19 00:12:13 -07:00
Dalton Hubble	c25c59058c	Update Kubernetes from v1.18.3 to v1.18.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184	2020-06-17 19:53:19 -07:00
Dalton Hubble	413585681b	Remove unused Kubelet lock-file and exit-on-lock-contention * Kubelet `--lock-file` and `--exit-on-lock-contention` date back to usage of bootkube and at one point running Kubelet in a "self-hosted" style whereby an on-host Kubelet (rkt) started pods, but then a Kubelet DaemonSet was scheduled and able to take over (hence self-hosted). `lock-file` and `exit-on-lock-contention` flags supported this pivot. The pattern has been out of favor (in bootkube too) for years because of dueling Kubelet complexity * Typhoon runs Kubelet as a container via an on-host systemd unit using podman (Fedora CoreOS) or rkt (Flatcar Linux). In fact, Typhoon no longer uses bootkube or control plane pivot (let alone Kubelet pivot) and uses static pods since v1.16.0 * https://github.com/poseidon/typhoon/pull/536	2020-06-12 00:06:41 -07:00
Dalton Hubble	96711d7f17	Remove unused Kubelet cert / key Terraform state * Generated Kubelet TLS certificate and key are not longer used or distributed to machines since Kubelet TLS bootstrap is used instead. Remove the certificate and key from state	2020-06-11 21:24:36 -07:00
Dalton Hubble	a287920169	Use strict mode for Container Linux Configs * Enable terraform-provider-ct `strict` mode for parsing Container Linux Configs and snippets * Fix Container Linux Config systemd unit syntax `enable` (old) to `enabled` * Align with Fedora CoreOS which uses strict mode already	2020-06-09 23:00:36 -07:00
Dalton Hubble	20bfd69780	Change Kubelet container image publishing * Build Kubelet container images internally and publish to Quay and Dockerhub (new) as an alternative in case of registry outage or breach * Use our infra to provide single and multi-arch (default) Kublet images for possible future use * Docs: Show how to use alternative Kubelet images via snippets and a systemd dropin (builds on #737) Changes: * Update docs with changes to Kubelet image building * If you prefer to trust images built by Quay/Dockerhub, automated image builds are still available with unique tags (albeit with some limitations): * Quay automated builds are tagged `build-{short_sha}` (limit: only amd64) * Dockerhub automated builts are tagged `build-{tag}` and `build-master` (limit: only amd64, no shas) Links: * Kubelet: https://github.com/poseidon/kubelet * Docs: https://typhoon.psdn.io/topics/security/#container-images * Registries: * quay.io/poseidon/kubelet * docker.io/psdn/kubelet	2020-05-30 23:34:23 -07:00
Dalton Hubble	ba44408b76	Update Calico from v3.14.0 to v3.14.1 * https://docs.projectcalico.org/v3.14/release-notes/	2020-05-30 22:08:37 -07:00
Dalton Hubble	e72f916c8d	Update etcd from v3.4.8 to v3.4.9 * https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v349-2020-05-20	2020-05-22 00:52:20 -07:00
Dalton Hubble	ecae6679ff	Update Kubernetes from v1.18.2 to v1.18.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md	2020-05-20 20:37:39 -07:00
Dalton Hubble	4760543356	Set Kubelet image via kubelet.service KUBELET_IMAGE * Write the systemd kubelet.service to use `KUBELET_IMAGE` as the Kubelet. This provides a nice way to use systemd dropins to temporarily override the image (e.g. during a registry outage) Note: Only Typhoon Kubelet images and registries are supported.	2020-05-19 22:39:53 -07:00
Dalton Hubble	8d024d22ad	Update etcd from v3.4.7 to v3.4.8 * https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v348-2020-05-18	2020-05-18 23:50:46 -07:00
Dalton Hubble	ff4187a1fb	Use new Azure subnet to set address_prefixes list * Update Azure subnet `address_prefix` to `azure_prefixes` list * Fix warning that `address_prefix` is deprecated * Require `terraform-provider-azurerm` v2.8.0+ (action required) Rel: https://github.com/terraform-providers/terraform-provider-azurerm/pull/6493	2020-05-18 23:35:47 -07:00
Dalton Hubble	a18bd0a707	Highlight SELinux enforcing mode in features	2020-05-13 21:57:38 -07:00
Dalton Hubble	a2db4fa8c4	Update Calico from v3.13.3 to v3.14.0 * https://docs.projectcalico.org/v3.14/release-notes/	2020-05-09 16:05:30 -07:00
Dalton Hubble	358854e712	Fix Calico install-cni crash loop on Pod restarts * Set a consistent MCS level/range for Calico install-cni * Note: Rebooting a node was a workaround, because Kubelet relabels /etc/kubernetes(/cni/net.d) Background: * On SELinux enforcing systems, the Calico CNI install-cni container ran with default SELinux context and a random MCS pair. install-cni places CNI configs by first creating a temporary file and then moving them into place, which means the file MCS categories depend on the containers SELinux context. * calico-node Pod restarts creates a new install-cni container with a different MCS pair that cannot access the earlier written file (it places configs every time), causing the init container to error and calico-node to crash loop * https://github.com/projectcalico/cni-plugin/issues/874 ``` mv: inter-device move failed: '/calico.conf.tmp' to '/host/etc/cni/net.d/10-calico.conflist'; unable to remove target: Permission denied Failed to mv files. This may be caused by selinux configuration on the host, or something else. ``` Note, this isn't a host SELinux configuration issue. Related: * https://github.com/poseidon/terraform-render-bootstrap/pull/186	2020-05-09 16:01:44 -07:00
Dalton Hubble	fd044ee117	Enable Kubelet TLS bootstrap and NodeRestriction * Enable bootstrap token authentication on kube-apiserver * Generate the bootstrap.kubernetes.io/token Secret that may be used as a bootstrap token * Generate a bootstrap kubeconfig (with a bootstrap token) to be securely distributed to nodes. Each Kubelet will use the bootstrap kubeconfig to authenticate to kube-apiserver as `system:bootstrappers` and send a node-unique CSR for kube-controller-manager to automatically approve to issue a Kubelet certificate and kubeconfig (expires in 72 hours) * Add ClusterRoleBinding for bootstrap token subjects (`system:bootstrappers`) to have the `system:node-bootstrapper` ClusterRole * Add ClusterRoleBinding for bootstrap token subjects (`system:bootstrappers`) to have the csr nodeclient ClusterRole * Add ClusterRoleBinding for bootstrap token subjects (`system:bootstrappers`) to have the csr selfnodeclient ClusterRole * Enable NodeRestriction admission controller to limit the scope of Node or Pod objects a Kubelet can modify to those of the node itself * Ability for a Kubelet to delete its Node object is retained as preemptible nodes or those in auto-scaling instance groups need to be able to remove themselves on shutdown. This need continues to have precedence over any risk of a node deleting itself maliciously Security notes: 1. Issued Kubelet certificates authenticate as user `system:node:NAME` and group `system:nodes` and are limited in their authorization to perform API operations by Node authorization and NodeRestriction admission. Previously, a Kubelet's authorization was broader. This is the primary security motivation. 2. The bootstrap kubeconfig credential has the same sensitivity as the previous generated TLS client-certificate kubeconfig. It must be distributed securely to nodes. Its compromise still allows an attacker to obtain a Kubelet kubeconfig 3. Bootstrapping Kubelet kubeconfig's with a limited lifetime offers a slight security improvement. * An attacker who obtains the kubeconfig can likely obtain the bootstrap kubeconfig as well, to obtain the ability to renew their access * A compromised bootstrap kubeconfig could plausibly be handled by replacing the bootstrap token Secret, distributing the token to new nodes, and expiration. Whereas a compromised TLS-client certificate kubeconfig can't be revoked (no CRL). However, replacing a bootstrap token can be impractical in real cluster environments, so the limited lifetime is mostly a theoretical benefit. * Cluster CSR objects are visible via kubectl which is nice 4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet clients have more identity information, which can improve the utility of audits and future features Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/ Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/185	2020-04-28 19:35:33 -07:00
Dalton Hubble	38a6bddd06	Update Calico from v3.13.1 to v3.13.3 * https://docs.projectcalico.org/v3.13/release-notes/	2020-04-23 23:58:02 -07:00
Dalton Hubble	d8966afdda	Remove extraneous sudo from layout asset unpacking	2020-04-22 20:28:01 -07:00
Dalton Hubble	feac94605a	Fix bootstrap mount to use shared volume SELinux label * Race: During initial bootstrap, static control plane pods could hang with Permission denied to bootstrap secrets. A manual fix involved restarting Kubelet, which relabeled mounts The race had no effect on subsequent reboots. * bootstrap.service runs podman with a private unshared mount of /etc/kubernetes/bootstrap-secrets which uses an SELinux MCS label with a category pair. However, bootstrap-secrets should be shared as its mounted by Docker pods kube-apiserver, kube-scheduler, and kube-controller-manager. Restarting Kubelet was a manual fix because Kubelet relabels all /etc/kubernetes * Fix bootstrap Pod to use the shared volume label, which leaves bootstrap-secrets files with SELinux level s0 without MCS * Also allow failed bootstrap.service to be re-applied. This was missing on bare-metal and AWS	2020-04-19 16:31:32 -07:00
Dalton Hubble	671eacb86e	Update Kubernetes from v1.18.1 to v1.18.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#changelog-since-v1181	2020-04-16 23:40:52 -07:00
Dalton Hubble	e2d4af43be	Fix Fedora CoreOS Azure MTU with Calico * With Calico VXLAN on Fedora CoreOS the 1450 MTU should be used	2020-04-12 23:20:04 -07:00
Dalton Hubble	5c4a3f73d5	Add support for Fedora CoreOS on Azure * Add `azure/fedora-coreos/kubernetes` module	2020-04-12 16:35:49 -07:00

23 Commits