typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 20:14:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	ab7913a061	Accept initial worker node labels and taints map on bare-metal * Add `worker_node_labels` map from node name to a list of initial node label strings * Add `worker_node_taints` map from node name to a list of initial node taint strings * Unlike cloud platforms, bare-metal node labels and taints are defined via a map from node name to list of labels/taints. Bare-metal clusters may have heterogeneous hardware so per node labels and taints are accepted * Only worker node names are allowed. Workloads are not scheduled on controller nodes so altering their labels/taints isn't suitable ``` module "mercury" { ... worker_node_labels = { "node2" = ["role=special"] } worker_node_taints = { "node2" = ["role=special:NoSchedule"] } } ``` Related: https://github.com/poseidon/typhoon/issues/429	2020-03-09 00:12:02 -07:00
Dalton Hubble	51cee6d5a4	Change Container Linux etcd-member to fetch with docker:// * Quay has historically generated ACI signatures for images to facilitate rkt's notions of verification (it allowed authors to actually sign images, though `--trust-keys-from-https` is in use since etcd and most authors don't sign images). OCI standardization didn't adopt verification ideas and checking signatures has fallen out of favor. * Fix an issue where Quay no longer seems to be generating ACI signatures for new images (e.g. quay.io/coreos/etcd:v.3.4.4) * Don't be alarmed by rkt `--insecure-options=image`. It refers to disabling image signature checking (i.e. docker pull doesn't check signatures either) * System containers for Kubelet and bootstrap have transitioned to the docker:// transport, so there is precedent and this brings all the system containers on Container Linux controllers into alignment	2020-03-02 19:57:45 -08:00
Dalton Hubble	6de5cf5a55	Update etcd from v3.4.3 to v3.4.4 * https://github.com/etcd-io/etcd/releases/tag/v3.4.4	2020-02-29 16:19:29 -08:00
Dalton Hubble	4a38fb5927	Update CoreDNS from v1.6.6 to v1.6.7 * https://coredns.io/2020/01/28/coredns-1.6.7-release/	2020-02-18 21:46:19 -08:00
Suraj Deshmukh	c4e64a9d1b	Change Kubelet /var/lib/calico mount to read-only (#643 ) * Kubelet only requires read access to /var/lib/calico Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>	2020-02-18 21:40:58 -08:00
Dalton Hubble	49d3b9e6b3	Set docker log driver to json-file on Fedora CoreOS * Fix the last minor issue for Fedora CoreOS clusters to pass CNCF's Kubernetes conformance tests * Kubelet supports a seldom used feature `kubectl logs --limit-bytes=N` to trim a log stream to a desired length. Kubelet handles this in the CRI driver. The Kubelet docker shim only supports the limit bytes feature when Docker is configured with the default `json-file` logging driver * CNCF conformance tests started requiring limit-bytes be supported, indirectly forcing the log driver choice until either the Kubelet or the conformance tests are fixed * Fedora CoreOS defaults Docker to use `journald` (desired). For now, as a workaround to offer conformant clusters, the log driver can be set back to `json-file`. RHEL CoreOS likely won't have noticed the non-conformance since its using crio runtime * https://github.com/kubernetes/kubernetes/issues/86367 Note: When upstream has a fix, the aim is to drop the docker config override and use the journald default	2020-02-11 23:00:38 -08:00
Dalton Hubble	1243f395d1	Update Kubernetes from v1.17.2 to v1.17.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173	2020-02-11 20:22:14 -08:00
Dalton Hubble	846f11097f	Update Fedora CoreOS kernel arguments to align with upstream * Align bare-metal kernel arguments with upstream docs * Add missing initrd argument which can cause issues if not present. Fix #638 * Add tty0 and ttyS0 consoles (matches Container Linux) * Remove unused coreos.inst=yes Related: https://docs.fedoraproject.org/en-US/fedora-coreos/bare-metal/	2020-02-11 20:11:19 -08:00
Dalton Hubble	ca96a1335c	Update Calico from v3.11.2 to v3.12.0 * https://docs.projectcalico.org/release-notes/#v3120 * Remove reverse packet filter override, since Calico no longer relies on the setting * https://github.com/coreos/fedora-coreos-tracker/issues/219 * https://github.com/projectcalico/felix/pull/2189	2020-02-06 00:43:33 -08:00
Dalton Hubble	1cda5bcd2a	Update Kubernetes from v1.17.1 to v1.17.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172	2020-01-21 18:27:39 -08:00
Dalton Hubble	dd930a2ff9	Update bare-metal Fedora CoreOS image location * Use Fedora CoreOS production download streams (change) * Use live PXE kernel and initramfs images * https://getfedora.org/coreos/download/ * Update docs example to use public images (cache is still recommended at large scale) and stable stream	2020-01-20 14:44:06 -08:00
Dalton Hubble	7daabd28b5	Update Calico from v3.11.1 to v3.11.2 * https://docs.projectcalico.org/v3.11/release-notes/	2020-01-18 13:45:24 -08:00
Dalton Hubble	b642e3b41b	Update Kubernetes from v1.17.0 to v1.17.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171	2020-01-14 20:21:36 -08:00
Dalton Hubble	ce0569e03b	Remove unneeded Kubelet /var/run mount on Fedora CoreOS * /var/run symlinks to /run (already mounted)	2020-01-11 15:15:39 -08:00
Dalton Hubble	0223b31e1a	Ensure /etc/kubernetes exists following Kubelet inlining * Inlining the Kubelet service removed the need for the kubelet.env file declared in Ignition. However, on some platforms, this removed the guarantee that /etc/kubernetes exists. Bare-Metal and DigitalOcean distribute the kubelet kubeconfig through Terraform file provisioner (scp) and place it in (now missing) /etc/kubernetes * https://github.com/poseidon/typhoon/pull/606 * Fix bare-metal and DigitalOcean Ignition to ensure the desired directory exists following first boot from disk * Cloud platforms with worker pools distribute the kubeconfig through Ignition user data (no impact or need)	2020-01-06 21:38:20 -08:00
Dalton Hubble	43e05b9131	Enable kube-proxy metrics and allow Prometheus scrapes * Configure kube-proxy --metrics-bind-address=0.0.0.0 (default 127.0.0.1) to serve metrics on 0.0.0.0:10249 * Add firewall rules to allow Prometheus (resides on a worker) to scrape kube-proxy service endpoints on controllers or workers * Add a clusterIP: None service for kube-proxy endpoint discovery	2020-01-06 21:11:18 -08:00
Dalton Hubble	b2eb3e05d0	Disable Kubelet 127.0.0.1.10248 healthz endpoint * Kubelet runs a healthz server listening on 127.0.0.1:10248 by default. Its unused by Typhoon and can be disabled * https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/	2019-12-29 11:23:25 -08:00
Dalton Hubble	f1f4cd6fc0	Inline Container Linux kubelet.service, deprecate kubelet-wrapper * Change kubelet.service on Container Linux nodes to ExecStart Kubelet inline to replace the use of the host OS kubelet-wrapper script * Express rkt run flags and volume mounts in a clear, uniform way to make the Kubelet service easier to audit, manage, and understand * Eliminate reliance on a Container Linux kubelet-wrapper script * Typhoon for Fedora CoreOS developed a kubelet.service that similarly uses an inline ExecStart (except with podman instead of rkt) and a more minimal set of volume mounts. Adopt the volume improvements: * Change Kubelet /etc/kubernetes volume to read-only * Change Kubelet /etc/resolv.conf volume to read-only * Remove unneeded /var/lib/cni volume mount Background: * kubelet-wrapper was added in CoreOS around the time of Kubernetes v1.0 to simplify running a CoreOS-built hyperkube ACI image via rkt-fly. The script defaults are no longer ideal (e.g. rkt's notion of trust dates back to quay.io ACI image serving and signing, which informed the OCI standard images we use today, though they still lack rkt's signing ideas). * Shipping kubelet-wrapper was regretted at CoreOS, but remains in the distro for compatibility. The script is not updated to track hyperkube changes, but it is stable and kubelet.env overrides bridge most gaps * Typhoon Container Linux nodes have used kubelet-wrapper to rkt/rkt-fly run the Kubelet via the official k8s.gcr.io hyperkube image using overrides (new image registry, new image format, restart handling, new mounts, new entrypoint in v1.17). * Observation: Most of what it takes to run a Kubelet container is defined in Typhoon, not in kubelet-wrapper. The wrapper's value is now undermined by having to workaround its dated defaults. Typhoon may be better served defining Kubelet.service explicitly * Typhoon for Fedora CoreOS developed a kubelet.service without the use of a host OS kubelet-wrapper which is both clearer and eliminated some volume mounts	2019-12-29 11:17:26 -08:00
Dalton Hubble	50db3d0231	Rename CLC files and favor Terraform list index syntax * Rename Container Linux Config (CLC) files to .yaml to align with Fedora CoreOS Config (FCC) files and for syntax highlighting Replace common uses of Terraform `element` (which wraps around) with `list[index]` syntax to surface index errors	2019-12-28 12:14:01 -08:00
Dalton Hubble	11565ffa8a	Update Calico from v3.10.2 to v3.11.1 * https://docs.projectcalico.org/v3.11/release-notes/	2019-12-28 11:08:03 -08:00
Dalton Hubble	daa8d9d9ec	Update CoreDNS from v1.6.5 to v1.6.6 * https://coredns.io/2019/12/11/coredns-1.6.6-release/	2019-12-22 10:47:19 -05:00
Dalton Hubble	c0ce04e1de	Update Calico from v3.10.1 to v3.10.2 * https://docs.projectcalico.org/v3.10/release-notes/	2019-12-09 21:03:00 -08:00
Dalton Hubble	ed3550dce1	Update systemd services for the v0.17.x hyperkube * Binary asset locations within the upstream hyperkube image changed https://github.com/kubernetes/kubernetes/pull/84662 * Fix Container Linux and Flatcar Linux kubelet.service (rkt-fly with fairly dated CoreOS kubelet-wrapper) * Fix Fedora CoreOS kubelet.service (podman) * Fix Fedora CoreOS bootstrap.service * Fix delete-node kubectl usage for workers where nodes may delete themselves on shutdown (e.g. preemptible instances)	2019-12-09 18:39:17 -08:00
Dalton Hubble	de36d99afc	Update Kubernetes from v1.16.3 to v1.17.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170	2019-12-09 18:31:58 -08:00
Dalton Hubble	4fce9485c8	Reduce kube-controller-manager pod eviction timeout from 5m to 1m * Reduce time to delete pods on unready nodes from 5m to 1m * Present since v1.13.3, but mistakenly removed in v1.16.0 static pod control plane migration Related: * https://github.com/poseidon/terraform-render-bootstrap/pull/148 * https://github.com/poseidon/terraform-render-bootstrap/pull/164	2019-12-08 22:58:31 -08:00
Dalton Hubble	d9c7a9e049	Add/update docs for asset_dir and kubeconfig usage * Original tutorials favored including the platform (e.g. google-cloud) in modules (e.g. google-cloud-yavin). Prefer naming conventions where each module / cluster has a simple name (e.g. yavin) since the platform is usually redundant * Retain the example cluster naming themes per platform	2019-12-05 22:56:42 -08:00
Dalton Hubble	2837275265	Introduce cluster creation without local writes to asset_dir * Allow generated assets (TLS materials, manifests) to be securely distributed to controller node(s) via file provisioner (i.e. ssh-agent) as an assets bundle file, rather than relying on assets being locally rendered to disk in an asset_dir and then securely distributed * Change `asset_dir` from required to optional. Left unset, asset_dir defaults to "" and no assets will be written to files on the machine that runs terraform apply * Enhancement: Managed cluster assets are kept only in Terraform state, which supports different backends (GCS, S3, etcd, etc) and optional encryption. terraform apply accesses state, runs in-memory, and distributes sensitive materials to controllers without making use of local disk (simplifies use in CI systems) * Enhancement: Improve asset unpack and layout process to position etcd certificates and control plane certificates more cleanly, without unneeded secret materials Details: * Terraform file provisioner support for distributing directories of contents (with unknown structure) has been limited to reading from a local directory, meaning local writes to asset_dir were required. https://github.com/poseidon/typhoon/issues/585 discusses the problem and newer or upcoming Terraform features that might help. * Observation: Terraform provisioner support for single files works well, but iteration isn't viable. We're also constrained to Terraform language features on the apply side (no extra plugins, no shelling out) and CoreOS / Fedora tools on the receive side. * Take a map representation of the contents that would have been splayed out in asset_dir and pack/encode them into a single file format devised for easy unpacking. Use an awk one-liner on the receive side to unpack. In pratice, this has worked well and its rather nice that a single assets file is transferred by file provisioner (all or none) Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162	2019-12-05 01:24:50 -08:00
Dalton Hubble	4b485a9bf2	Fix recent deletion of bootstrap module pinned SHA * Fix deletion of bootstrap module pinned SHA, which was introduced recently through an automation mistake creating https://github.com/poseidon/typhoon/pull/589	2019-11-21 22:34:09 -08:00
Dalton Hubble	8a9e8595ae	Fix terraform fmt formatting	2019-11-13 23:44:02 -08:00
Dalton Hubble	0e4ee5efc9	Add small CPU resource requests to static pods * Set small CPU requests on static pods kube-apiserver, kube-controller-manager, and kube-scheduler to align with upstream tooling and for edge cases * Effectively, a practical case for these requests hasn't been observed. However, a small static pod CPU request may offer a slight benefit if a controller became overloaded and the below mechanisms were insufficient Existing safeguards: * Control plane nodes are tainted to isolate them from ordinary workloads. Even dense workloads can only compress CPU resources on worker nodes. * Control plane static pods use the highest priority class, so contention favors control plane pods (over say node-exporter) and CPU is compressible too. See: https://github.com/poseidon/terraform-render-bootstrap/pull/161	2019-11-13 17:18:45 -08:00
Dalton Hubble	a271b9f340	Update CoreDNS from v1.6.2 to v1.6.5 * Add health `lameduck` option 5s. Before CoreDNS shuts down, it will wait and report unhealthy for 5s to allow time for plugins to shutdown cleanly * Minor bug fixes over a few releases * https://coredns.io/2019/08/31/coredns-1.6.3-release/ * https://coredns.io/2019/09/27/coredns-1.6.4-release/ * https://coredns.io/2019/11/05/coredns-1.6.5-release/	2019-11-13 16:47:44 -08:00
Dalton Hubble	cb0598e275	Adopt Terraform v0.12 templatefile function * Update terraform-render-bootstrap module to adopt the Terrform v0.12 templatefile function feature to replace the use of terraform-provider-template's `template_dir` * Require Terraform v0.12.6+ which adds `for_each` Background: * `template_dir` was added to `terraform-provider-template` to add support for template directory rendering in CoreOS Tectonic Kubernetes distribution (~2017) * Terraform v0.12 introduced a native `templatefile` function and v0.12.6 introduced native `for_each` support (July 2019) that makes it possible to replace `template_dir` usage	2019-11-13 16:33:36 -08:00
Dalton Hubble	d7061020ba	Update Kubernetes from v1.16.2 to v1.16.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163	2019-11-13 13:05:15 -08:00
Dalton Hubble	2c163503f1	Update etcd from v3.4.2 to v3.4.3 * etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9 and adds a few minor metrics fixes * https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3	2019-11-07 11:41:01 -08:00
Dalton Hubble	0034a15711	Update Calico from v3.10.0 to v3.10.1 * https://docs.projectcalico.org/v3.10/release-notes/	2019-11-07 11:38:32 -08:00
Dalton Hubble	4775e9d0f7	Upgrade Calico v3.9.2 to v3.10.0 * Allow advertising Kubernetes service ClusterIPs to BGPPeer routers via a BGPConfiguration * Improve EdgeRouter docs about routes and BGP * https://docs.projectcalico.org/v3.10/release-notes/ * https://docs.projectcalico.org/v3.10/networking/advertise-service-ips	2019-10-27 14:13:41 -07:00
Dalton Hubble	d418045929	Switch kube-proxy from iptables mode to ipvs mode * Kubernetes v1.11 considered kube-proxy IPVS mode GA * Many problems were found #321 * Since then, major blockers seem to have been addressed	2019-10-27 00:37:41 -07:00
Dalton Hubble	24fc440d83	Update Kubernetes from v1.16.1 to v1.16.2 * Update Calico from v3.9.1 to v3.9.2	2019-10-15 22:42:52 -07:00
Dalton Hubble	a6702573a2	Update etcd from v3.4.1 to v3.4.2 * https://github.com/etcd-io/etcd/releases/tag/v3.4.2	2019-10-15 00:06:15 -07:00
Dalton Hubble	d874bdd17d	Update bootstrap module control plane manifests and type constraints * Remove unneeded control plane flags that correspond to defaults * Adopt Terraform v0.12 type constraints in bootstrap module	2019-10-06 21:09:30 -07:00
Dalton Hubble	5b9dab6659	Introduce list of detail objects for bare-metal machines * Define bare-metal `controllers` and `workers` as a complex type list(object{name=string, mac=string, domain=string}) to allow clusters with many machines to be defined more cleanly * Remove `controller_names` list variable * Remove `controller_macs` list variable * Remove `controller_domains` list variable * Remove `worker_names` list variable * Remove `worker_macs` list variable * Remove `worker_domains` list variable	2019-10-06 20:22:45 -07:00
Dalton Hubble	15c4b793c3	Use new Fedora CoreOS kernel/initrd/raw asset names * Fedora CoreOS changed the kernel, initramfs, and raw image asset download paths and names in 30.20191002.0	2019-10-06 17:31:21 -07:00
Dalton Hubble	36ed53924f	Add stricter types for bare-metal modules * Review variables available in bare-metal kubernetes modules for Container Linux and Fedora CoreOS * Deprecate cluster_domain_suffix variable * Remove deprecated container_linux_oem variable	2019-10-06 17:18:50 -07:00
Dalton Hubble	1c5ed84fc2	Update Kubernetes from v1.16.0 to v1.16.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161	2019-10-02 21:31:55 -07:00
Dalton Hubble	a6de245d8a	Rename bootkube.tf to bootstrap.tf * Typhoon no longer uses the bootkube project	2019-09-29 11:30:49 -07:00
Dalton Hubble	96afa6a531	Update Calico from v3.8.2 to v3.9.1 * https://docs.projectcalico.org/v3.9/release-notes/	2019-09-29 11:22:53 -07:00
Dalton Hubble	3e34fb075b	Update etcd from v3.4.0 to v3.4.1 * https://github.com/etcd-io/etcd/releases/tag/v3.4.1	2019-09-28 15:09:57 -07:00
Dalton Hubble	8703f2c3c5	Fix missing comma separator on bare-metal and DO * Introduced in bare-metal and DigitalOcean in #544 while addressing possible ordering race, but after the v1.16 upgrade validation	2019-09-23 11:05:26 -07:00
Dalton Hubble	5b06e0e869	Organize and cleanup Kubelet ExecStartPre * Sort Kubelet ExecStartPre mkdir commands * Remove unused inactive-manifests and checkpoint-secrets directories (were used by bootkube self-hosting)	2019-09-19 00:38:34 -07:00
Dalton Hubble	b951aca66f	Create /etc/kubernetes/manifests before asset copy * Fix issue (present since bootkube->bootstrap switch) where controller asset copy could fail if /etc/kubernetes/manifests wasn't created in time on platforms using path activation for the Kubelet (observed on DigitalOcean, also possible on bare-metal)	2019-09-19 00:30:53 -07:00

1 2 3 4 5 ...

347 Commits