typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-04 12:34:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	43e05b9131	Enable kube-proxy metrics and allow Prometheus scrapes * Configure kube-proxy --metrics-bind-address=0.0.0.0 (default 127.0.0.1) to serve metrics on 0.0.0.0:10249 * Add firewall rules to allow Prometheus (resides on a worker) to scrape kube-proxy service endpoints on controllers or workers * Add a clusterIP: None service for kube-proxy endpoint discovery	2020-01-06 21:11:18 -08:00
Dalton Hubble	b2eb3e05d0	Disable Kubelet 127.0.0.1.10248 healthz endpoint * Kubelet runs a healthz server listening on 127.0.0.1:10248 by default. Its unused by Typhoon and can be disabled * https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/	2019-12-29 11:23:25 -08:00
Dalton Hubble	f1f4cd6fc0	Inline Container Linux kubelet.service, deprecate kubelet-wrapper * Change kubelet.service on Container Linux nodes to ExecStart Kubelet inline to replace the use of the host OS kubelet-wrapper script * Express rkt run flags and volume mounts in a clear, uniform way to make the Kubelet service easier to audit, manage, and understand * Eliminate reliance on a Container Linux kubelet-wrapper script * Typhoon for Fedora CoreOS developed a kubelet.service that similarly uses an inline ExecStart (except with podman instead of rkt) and a more minimal set of volume mounts. Adopt the volume improvements: * Change Kubelet /etc/kubernetes volume to read-only * Change Kubelet /etc/resolv.conf volume to read-only * Remove unneeded /var/lib/cni volume mount Background: * kubelet-wrapper was added in CoreOS around the time of Kubernetes v1.0 to simplify running a CoreOS-built hyperkube ACI image via rkt-fly. The script defaults are no longer ideal (e.g. rkt's notion of trust dates back to quay.io ACI image serving and signing, which informed the OCI standard images we use today, though they still lack rkt's signing ideas). * Shipping kubelet-wrapper was regretted at CoreOS, but remains in the distro for compatibility. The script is not updated to track hyperkube changes, but it is stable and kubelet.env overrides bridge most gaps * Typhoon Container Linux nodes have used kubelet-wrapper to rkt/rkt-fly run the Kubelet via the official k8s.gcr.io hyperkube image using overrides (new image registry, new image format, restart handling, new mounts, new entrypoint in v1.17). * Observation: Most of what it takes to run a Kubelet container is defined in Typhoon, not in kubelet-wrapper. The wrapper's value is now undermined by having to workaround its dated defaults. Typhoon may be better served defining Kubelet.service explicitly * Typhoon for Fedora CoreOS developed a kubelet.service without the use of a host OS kubelet-wrapper which is both clearer and eliminated some volume mounts	2019-12-29 11:17:26 -08:00
Dalton Hubble	50db3d0231	Rename CLC files and favor Terraform list index syntax * Rename Container Linux Config (CLC) files to .yaml to align with Fedora CoreOS Config (FCC) files and for syntax highlighting Replace common uses of Terraform `element` (which wraps around) with `list[index]` syntax to surface index errors	2019-12-28 12:14:01 -08:00
Dalton Hubble	11565ffa8a	Update Calico from v3.10.2 to v3.11.1 * https://docs.projectcalico.org/v3.11/release-notes/	2019-12-28 11:08:03 -08:00
Dalton Hubble	daa8d9d9ec	Update CoreDNS from v1.6.5 to v1.6.6 * https://coredns.io/2019/12/11/coredns-1.6.6-release/	2019-12-22 10:47:19 -05:00
Dalton Hubble	c0ce04e1de	Update Calico from v3.10.1 to v3.10.2 * https://docs.projectcalico.org/v3.10/release-notes/	2019-12-09 21:03:00 -08:00
Dalton Hubble	ed3550dce1	Update systemd services for the v0.17.x hyperkube * Binary asset locations within the upstream hyperkube image changed https://github.com/kubernetes/kubernetes/pull/84662 * Fix Container Linux and Flatcar Linux kubelet.service (rkt-fly with fairly dated CoreOS kubelet-wrapper) * Fix Fedora CoreOS kubelet.service (podman) * Fix Fedora CoreOS bootstrap.service * Fix delete-node kubectl usage for workers where nodes may delete themselves on shutdown (e.g. preemptible instances)	2019-12-09 18:39:17 -08:00
Dalton Hubble	de36d99afc	Update Kubernetes from v1.16.3 to v1.17.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170	2019-12-09 18:31:58 -08:00
Dalton Hubble	4fce9485c8	Reduce kube-controller-manager pod eviction timeout from 5m to 1m * Reduce time to delete pods on unready nodes from 5m to 1m * Present since v1.13.3, but mistakenly removed in v1.16.0 static pod control plane migration Related: * https://github.com/poseidon/terraform-render-bootstrap/pull/148 * https://github.com/poseidon/terraform-render-bootstrap/pull/164	2019-12-08 22:58:31 -08:00
Dalton Hubble	d9c7a9e049	Add/update docs for asset_dir and kubeconfig usage * Original tutorials favored including the platform (e.g. google-cloud) in modules (e.g. google-cloud-yavin). Prefer naming conventions where each module / cluster has a simple name (e.g. yavin) since the platform is usually redundant * Retain the example cluster naming themes per platform	2019-12-05 22:56:42 -08:00
Dalton Hubble	2837275265	Introduce cluster creation without local writes to asset_dir * Allow generated assets (TLS materials, manifests) to be securely distributed to controller node(s) via file provisioner (i.e. ssh-agent) as an assets bundle file, rather than relying on assets being locally rendered to disk in an asset_dir and then securely distributed * Change `asset_dir` from required to optional. Left unset, asset_dir defaults to "" and no assets will be written to files on the machine that runs terraform apply * Enhancement: Managed cluster assets are kept only in Terraform state, which supports different backends (GCS, S3, etcd, etc) and optional encryption. terraform apply accesses state, runs in-memory, and distributes sensitive materials to controllers without making use of local disk (simplifies use in CI systems) * Enhancement: Improve asset unpack and layout process to position etcd certificates and control plane certificates more cleanly, without unneeded secret materials Details: * Terraform file provisioner support for distributing directories of contents (with unknown structure) has been limited to reading from a local directory, meaning local writes to asset_dir were required. https://github.com/poseidon/typhoon/issues/585 discusses the problem and newer or upcoming Terraform features that might help. * Observation: Terraform provisioner support for single files works well, but iteration isn't viable. We're also constrained to Terraform language features on the apply side (no extra plugins, no shelling out) and CoreOS / Fedora tools on the receive side. * Take a map representation of the contents that would have been splayed out in asset_dir and pack/encode them into a single file format devised for easy unpacking. Use an awk one-liner on the receive side to unpack. In pratice, this has worked well and its rather nice that a single assets file is transferred by file provisioner (all or none) Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162	2019-12-05 01:24:50 -08:00
Dalton Hubble	4b485a9bf2	Fix recent deletion of bootstrap module pinned SHA * Fix deletion of bootstrap module pinned SHA, which was introduced recently through an automation mistake creating https://github.com/poseidon/typhoon/pull/589	2019-11-21 22:34:09 -08:00
Dalton Hubble	8a9e8595ae	Fix terraform fmt formatting	2019-11-13 23:44:02 -08:00
Dalton Hubble	0e4ee5efc9	Add small CPU resource requests to static pods * Set small CPU requests on static pods kube-apiserver, kube-controller-manager, and kube-scheduler to align with upstream tooling and for edge cases * Effectively, a practical case for these requests hasn't been observed. However, a small static pod CPU request may offer a slight benefit if a controller became overloaded and the below mechanisms were insufficient Existing safeguards: * Control plane nodes are tainted to isolate them from ordinary workloads. Even dense workloads can only compress CPU resources on worker nodes. * Control plane static pods use the highest priority class, so contention favors control plane pods (over say node-exporter) and CPU is compressible too. See: https://github.com/poseidon/terraform-render-bootstrap/pull/161	2019-11-13 17:18:45 -08:00
Dalton Hubble	a271b9f340	Update CoreDNS from v1.6.2 to v1.6.5 * Add health `lameduck` option 5s. Before CoreDNS shuts down, it will wait and report unhealthy for 5s to allow time for plugins to shutdown cleanly * Minor bug fixes over a few releases * https://coredns.io/2019/08/31/coredns-1.6.3-release/ * https://coredns.io/2019/09/27/coredns-1.6.4-release/ * https://coredns.io/2019/11/05/coredns-1.6.5-release/	2019-11-13 16:47:44 -08:00
Dalton Hubble	cb0598e275	Adopt Terraform v0.12 templatefile function * Update terraform-render-bootstrap module to adopt the Terrform v0.12 templatefile function feature to replace the use of terraform-provider-template's `template_dir` * Require Terraform v0.12.6+ which adds `for_each` Background: * `template_dir` was added to `terraform-provider-template` to add support for template directory rendering in CoreOS Tectonic Kubernetes distribution (~2017) * Terraform v0.12 introduced a native `templatefile` function and v0.12.6 introduced native `for_each` support (July 2019) that makes it possible to replace `template_dir` usage	2019-11-13 16:33:36 -08:00
Dalton Hubble	d7061020ba	Update Kubernetes from v1.16.2 to v1.16.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163	2019-11-13 13:05:15 -08:00
Dalton Hubble	2c163503f1	Update etcd from v3.4.2 to v3.4.3 * etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9 and adds a few minor metrics fixes * https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3	2019-11-07 11:41:01 -08:00
Dalton Hubble	0034a15711	Update Calico from v3.10.0 to v3.10.1 * https://docs.projectcalico.org/v3.10/release-notes/	2019-11-07 11:38:32 -08:00
Dalton Hubble	4775e9d0f7	Upgrade Calico v3.9.2 to v3.10.0 * Allow advertising Kubernetes service ClusterIPs to BGPPeer routers via a BGPConfiguration * Improve EdgeRouter docs about routes and BGP * https://docs.projectcalico.org/v3.10/release-notes/ * https://docs.projectcalico.org/v3.10/networking/advertise-service-ips	2019-10-27 14:13:41 -07:00
Dalton Hubble	d418045929	Switch kube-proxy from iptables mode to ipvs mode * Kubernetes v1.11 considered kube-proxy IPVS mode GA * Many problems were found #321 * Since then, major blockers seem to have been addressed	2019-10-27 00:37:41 -07:00
Dalton Hubble	24fc440d83	Update Kubernetes from v1.16.1 to v1.16.2 * Update Calico from v3.9.1 to v3.9.2	2019-10-15 22:42:52 -07:00
Dalton Hubble	a6702573a2	Update etcd from v3.4.1 to v3.4.2 * https://github.com/etcd-io/etcd/releases/tag/v3.4.2	2019-10-15 00:06:15 -07:00
Dalton Hubble	d874bdd17d	Update bootstrap module control plane manifests and type constraints * Remove unneeded control plane flags that correspond to defaults * Adopt Terraform v0.12 type constraints in bootstrap module	2019-10-06 21:09:30 -07:00
Dalton Hubble	5b9dab6659	Introduce list of detail objects for bare-metal machines * Define bare-metal `controllers` and `workers` as a complex type list(object{name=string, mac=string, domain=string}) to allow clusters with many machines to be defined more cleanly * Remove `controller_names` list variable * Remove `controller_macs` list variable * Remove `controller_domains` list variable * Remove `worker_names` list variable * Remove `worker_macs` list variable * Remove `worker_domains` list variable	2019-10-06 20:22:45 -07:00
Dalton Hubble	15c4b793c3	Use new Fedora CoreOS kernel/initrd/raw asset names * Fedora CoreOS changed the kernel, initramfs, and raw image asset download paths and names in 30.20191002.0	2019-10-06 17:31:21 -07:00
Dalton Hubble	36ed53924f	Add stricter types for bare-metal modules * Review variables available in bare-metal kubernetes modules for Container Linux and Fedora CoreOS * Deprecate cluster_domain_suffix variable * Remove deprecated container_linux_oem variable	2019-10-06 17:18:50 -07:00
Dalton Hubble	1c5ed84fc2	Update Kubernetes from v1.16.0 to v1.16.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161	2019-10-02 21:31:55 -07:00
Dalton Hubble	a6de245d8a	Rename bootkube.tf to bootstrap.tf * Typhoon no longer uses the bootkube project	2019-09-29 11:30:49 -07:00
Dalton Hubble	96afa6a531	Update Calico from v3.8.2 to v3.9.1 * https://docs.projectcalico.org/v3.9/release-notes/	2019-09-29 11:22:53 -07:00
Dalton Hubble	3e34fb075b	Update etcd from v3.4.0 to v3.4.1 * https://github.com/etcd-io/etcd/releases/tag/v3.4.1	2019-09-28 15:09:57 -07:00
Dalton Hubble	8703f2c3c5	Fix missing comma separator on bare-metal and DO * Introduced in bare-metal and DigitalOcean in #544 while addressing possible ordering race, but after the v1.16 upgrade validation	2019-09-23 11:05:26 -07:00
Dalton Hubble	5b06e0e869	Organize and cleanup Kubelet ExecStartPre * Sort Kubelet ExecStartPre mkdir commands * Remove unused inactive-manifests and checkpoint-secrets directories (were used by bootkube self-hosting)	2019-09-19 00:38:34 -07:00
Dalton Hubble	b951aca66f	Create /etc/kubernetes/manifests before asset copy * Fix issue (present since bootkube->bootstrap switch) where controller asset copy could fail if /etc/kubernetes/manifests wasn't created in time on platforms using path activation for the Kubelet (observed on DigitalOcean, also possible on bare-metal)	2019-09-19 00:30:53 -07:00
Dalton Hubble	9da3725738	Update Kubernetes from v1.15.3 to v1.16.0 * Drop `node-role.kubernetes.io/master` and `node-role.kubernetes.io/node` node labels * Kubelet (v1.16) now rejects the node labels used in the kubectl get nodes ROLES output * https://github.com/kubernetes/kubernetes/issues/75457	2019-09-18 22:53:06 -07:00
Dalton Hubble	fd12f3612b	Rename CA organization from bootkube to typhoon * Rename the organization in generated CA certificates from bootkube to typhoon. Avoid confusion with the bootkube project * https://github.com/poseidon/terraform-render-bootstrap/pull/149	2019-09-14 16:56:53 -07:00
Dalton Hubble	96b646cf6d	Rename bootkube modules to bootstrap * Rename render module from bootkube to bootstrap. Avoid confusion with the kubernetes-incubator/bootkube tool since it is no longer used * Use the poseidon/terraform-render-bootstrap Terraform module (formerly poseidon/terraform-render-bootkube) * https://github.com/poseidon/terraform-render-bootkube/pull/149	2019-09-14 16:24:32 -07:00
Dalton Hubble	b15c60fa2f	Update CHANGES for control plane static pod switch * Remove old references to bootkube / self-hosted	2019-09-09 22:48:48 -07:00
Dalton Hubble	db947537d1	Migrate GCP, DO, Azure to static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	21632c6674	Migrate Container Linux bare-metal to static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	74780fb09f	Migrate Fedora CoreOS bare-metal to static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	c20683067d	Update etcd from v3.3.15 to v3.4.0 * https://github.com/etcd-io/etcd/releases/tag/v3.4.0	2019-09-08 15:32:49 -07:00
Dalton Hubble	e8d586f3b3	Enable QoS on Fedora CoreOS controllers * Kubelet race should be fixed in Kubernetes v1.15.1 * https://github.com/kubernetes/kubernetes/issues/79046 * Reverts temporary mitigation https://github.com/poseidon/typhoon/pull/515	2019-09-04 21:09:45 -07:00
Dalton Hubble	4d5f962d76	Update CoreDNS from v1.5.0 to v1.6.2 * https://coredns.io/2019/06/26/coredns-1.5.1-release/ * https://coredns.io/2019/07/03/coredns-1.5.2-release/ * https://coredns.io/2019/07/28/coredns-1.6.0-release/ * https://coredns.io/2019/08/02/coredns-1.6.1-release/ * https://coredns.io/2019/08/13/coredns-1.6.2-release/	2019-08-31 15:57:42 -07:00
Dalton Hubble	c42139beaa	Update etcd from v3.3.14 to v3.3.15 * No functional changes, just changes to vendoring tools (go modules -> glide). Still, update to v3.3.15 anyway * https://github.com/etcd-io/etcd/compare/v3.3.14...v3.3.15	2019-08-19 15:05:21 -07:00
Dalton Hubble	35c2763ab0	Update Kubernetes from v1.15.2 to v1.15.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md/#v1153	2019-08-19 14:49:24 -07:00
Dalton Hubble	8f412e2f09	Update etcd from v3.3.13 to v3.3.14 * https://github.com/etcd-io/etcd/releases/tag/v3.3.14	2019-08-18 21:05:06 -07:00
Dalton Hubble	3c3708d58e	Update Calico from v3.8.1 to v3.8.2 * https://docs.projectcalico.org/v3.8/release-notes/	2019-08-16 15:38:23 -07:00
Dalton Hubble	2227f2cc62	Update Kubernetes from v1.15.1 to v1.15.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1152	2019-08-05 08:48:57 -07:00

1 2 3 4 5 ...

382 Commits