typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-07 12:44:36 +02:00

Author	SHA1	Message	Date
Dalton Hubble	4fce9485c8	Reduce kube-controller-manager pod eviction timeout from 5m to 1m * Reduce time to delete pods on unready nodes from 5m to 1m * Present since v1.13.3, but mistakenly removed in v1.16.0 static pod control plane migration Related: * https://github.com/poseidon/terraform-render-bootstrap/pull/148 * https://github.com/poseidon/terraform-render-bootstrap/pull/164	2019-12-08 22:58:31 -08:00
Dalton Hubble	d9c7a9e049	Add/update docs for asset_dir and kubeconfig usage * Original tutorials favored including the platform (e.g. google-cloud) in modules (e.g. google-cloud-yavin). Prefer naming conventions where each module / cluster has a simple name (e.g. yavin) since the platform is usually redundant * Retain the example cluster naming themes per platform	2019-12-05 22:56:42 -08:00
Dalton Hubble	2837275265	Introduce cluster creation without local writes to asset_dir * Allow generated assets (TLS materials, manifests) to be securely distributed to controller node(s) via file provisioner (i.e. ssh-agent) as an assets bundle file, rather than relying on assets being locally rendered to disk in an asset_dir and then securely distributed * Change `asset_dir` from required to optional. Left unset, asset_dir defaults to "" and no assets will be written to files on the machine that runs terraform apply * Enhancement: Managed cluster assets are kept only in Terraform state, which supports different backends (GCS, S3, etcd, etc) and optional encryption. terraform apply accesses state, runs in-memory, and distributes sensitive materials to controllers without making use of local disk (simplifies use in CI systems) * Enhancement: Improve asset unpack and layout process to position etcd certificates and control plane certificates more cleanly, without unneeded secret materials Details: * Terraform file provisioner support for distributing directories of contents (with unknown structure) has been limited to reading from a local directory, meaning local writes to asset_dir were required. https://github.com/poseidon/typhoon/issues/585 discusses the problem and newer or upcoming Terraform features that might help. * Observation: Terraform provisioner support for single files works well, but iteration isn't viable. We're also constrained to Terraform language features on the apply side (no extra plugins, no shelling out) and CoreOS / Fedora tools on the receive side. * Take a map representation of the contents that would have been splayed out in asset_dir and pack/encode them into a single file format devised for easy unpacking. Use an awk one-liner on the receive side to unpack. 
In practice, this has worked well and it's rather nice that a single assets file is transferred by file provisioner (all or none) Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162	2019-12-05 01:24:50 -08:00
Dalton Hubble	4b485a9bf2	Fix recent deletion of bootstrap module pinned SHA * Fix deletion of bootstrap module pinned SHA, which was introduced recently through an automation mistake creating https://github.com/poseidon/typhoon/pull/589	2019-11-21 22:34:09 -08:00
Dalton Hubble	8a9e8595ae	Fix terraform fmt formatting	2019-11-13 23:44:02 -08:00
Dalton Hubble	0e4ee5efc9	Add small CPU resource requests to static pods * Set small CPU requests on static pods kube-apiserver, kube-controller-manager, and kube-scheduler to align with upstream tooling and for edge cases * Effectively, a practical case for these requests hasn't been observed. However, a small static pod CPU request may offer a slight benefit if a controller became overloaded and the below mechanisms were insufficient Existing safeguards: * Control plane nodes are tainted to isolate them from ordinary workloads. Even dense workloads can only compress CPU resources on worker nodes. * Control plane static pods use the highest priority class, so contention favors control plane pods (over say node-exporter) and CPU is compressible too. See: https://github.com/poseidon/terraform-render-bootstrap/pull/161	2019-11-13 17:18:45 -08:00
Dalton Hubble	a271b9f340	Update CoreDNS from v1.6.2 to v1.6.5 * Add health `lameduck` option 5s. Before CoreDNS shuts down, it will wait and report unhealthy for 5s to allow time for plugins to shutdown cleanly * Minor bug fixes over a few releases * https://coredns.io/2019/08/31/coredns-1.6.3-release/ * https://coredns.io/2019/09/27/coredns-1.6.4-release/ * https://coredns.io/2019/11/05/coredns-1.6.5-release/	2019-11-13 16:47:44 -08:00
Dalton Hubble	cb0598e275	Adopt Terraform v0.12 templatefile function * Update terraform-render-bootstrap module to adopt the Terraform v0.12 templatefile function feature to replace the use of terraform-provider-template's `template_dir` * Require Terraform v0.12.6+ which adds `for_each` Background: * `template_dir` was added to `terraform-provider-template` to add support for template directory rendering in CoreOS Tectonic Kubernetes distribution (~2017) * Terraform v0.12 introduced a native `templatefile` function and v0.12.6 introduced native `for_each` support (July 2019) that makes it possible to replace `template_dir` usage	2019-11-13 16:33:36 -08:00
Dalton Hubble	d7061020ba	Update Kubernetes from v1.16.2 to v1.16.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163	2019-11-13 13:05:15 -08:00
Dalton Hubble	2c163503f1	Update etcd from v3.4.2 to v3.4.3 * etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9 and adds a few minor metrics fixes * https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3	2019-11-07 11:41:01 -08:00
Dalton Hubble	0034a15711	Update Calico from v3.10.0 to v3.10.1 * https://docs.projectcalico.org/v3.10/release-notes/	2019-11-07 11:38:32 -08:00
Dalton Hubble	4775e9d0f7	Upgrade Calico v3.9.2 to v3.10.0 * Allow advertising Kubernetes service ClusterIPs to BGPPeer routers via a BGPConfiguration * Improve EdgeRouter docs about routes and BGP * https://docs.projectcalico.org/v3.10/release-notes/ * https://docs.projectcalico.org/v3.10/networking/advertise-service-ips	2019-10-27 14:13:41 -07:00
Dalton Hubble	d418045929	Switch kube-proxy from iptables mode to ipvs mode * Kubernetes v1.11 considered kube-proxy IPVS mode GA * Many problems were found #321 * Since then, major blockers seem to have been addressed	2019-10-27 00:37:41 -07:00
Dalton Hubble	24fc440d83	Update Kubernetes from v1.16.1 to v1.16.2 * Update Calico from v3.9.1 to v3.9.2	2019-10-15 22:42:52 -07:00
Dalton Hubble	a6702573a2	Update etcd from v3.4.1 to v3.4.2 * https://github.com/etcd-io/etcd/releases/tag/v3.4.2	2019-10-15 00:06:15 -07:00
Dalton Hubble	d874bdd17d	Update bootstrap module control plane manifests and type constraints * Remove unneeded control plane flags that correspond to defaults * Adopt Terraform v0.12 type constraints in bootstrap module	2019-10-06 21:09:30 -07:00
Dalton Hubble	5ef4155e08	Detect most recent Fedora CoreOS AMI in region * Detect the most recent Fedora CoreOS AMI to allow usage of Fedora CoreOS in supported regions (previously just us-east-1) * Unpin the Fedora CoreOS AMI image which was pinned to images that had been checked. This does mean if Fedora publishes a broken image, it will be selected * Filter out "dev" images which have similar naming	2019-10-06 18:13:55 -07:00
Dalton Hubble	1c5ed84fc2	Update Kubernetes from v1.16.0 to v1.16.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161	2019-10-02 21:31:55 -07:00
Dalton Hubble	78bfff0afe	Update Fedora CoreOS to testing 30.20190905.0 * Fix duplicated cluster_domain_suffix variable	2019-09-29 11:34:31 -07:00
Dalton Hubble	a6de245d8a	Rename bootkube.tf to bootstrap.tf * Typhoon no longer uses the bootkube project	2019-09-29 11:30:49 -07:00
Dalton Hubble	96afa6a531	Update Calico from v3.8.2 to v3.9.1 * https://docs.projectcalico.org/v3.9/release-notes/	2019-09-29 11:22:53 -07:00
Dalton Hubble	a407ff72df	Add stricter types for AWS modules and update docs * Review variables available in AWS kubernetes and workers modules and documentation * Switching between spot and on-demand has worked since Terraform v0.12 * Generally, there are too many knobs. Less useful ones should be de-emphasized or removed * Remove `cluster_domain_suffix` documentation	2019-09-29 11:19:38 -07:00
Dalton Hubble	3e34fb075b	Update etcd from v3.4.0 to v3.4.1 * https://github.com/etcd-io/etcd/releases/tag/v3.4.1	2019-09-28 15:09:57 -07:00
Dalton Hubble	9bfb1c5faf	Update docs and variable types for worker node_labels * Document worker pools `node_labels` variable to set the initial node labels for a homogeneous set of workers * Document `worker_node_labels` convenience variable to set the initial node labels for default worker nodes	2019-09-28 15:05:12 -07:00
Valer Cara	99ab81f79c	Add node_labels variable in workers modules to set initial node labels (#550) * Also add `worker_node_labels` variable in `kubernetes` modules to set initial node labels for the default workers	2019-09-28 14:59:24 -07:00
Dalton Hubble	5b06e0e869	Organize and cleanup Kubelet ExecStartPre * Sort Kubelet ExecStartPre mkdir commands * Remove unused inactive-manifests and checkpoint-secrets directories (were used by bootkube self-hosting)	2019-09-19 00:38:34 -07:00
Dalton Hubble	b951aca66f	Create /etc/kubernetes/manifests before asset copy * Fix issue (present since bootkube->bootstrap switch) where controller asset copy could fail if /etc/kubernetes/manifests wasn't created in time on platforms using path activation for the Kubelet (observed on DigitalOcean, also possible on bare-metal)	2019-09-19 00:30:53 -07:00
Dalton Hubble	9da3725738	Update Kubernetes from v1.15.3 to v1.16.0 * Drop `node-role.kubernetes.io/master` and `node-role.kubernetes.io/node` node labels * Kubelet (v1.16) now rejects the node labels used in the kubectl get nodes ROLES output * https://github.com/kubernetes/kubernetes/issues/75457	2019-09-18 22:53:06 -07:00
Dalton Hubble	fd12f3612b	Rename CA organization from bootkube to typhoon * Rename the organization in generated CA certificates from bootkube to typhoon. Avoid confusion with the bootkube project * https://github.com/poseidon/terraform-render-bootstrap/pull/149	2019-09-14 16:56:53 -07:00
Dalton Hubble	96b646cf6d	Rename bootkube modules to bootstrap * Rename render module from bootkube to bootstrap. Avoid confusion with the kubernetes-incubator/bootkube tool since it is no longer used * Use the poseidon/terraform-render-bootstrap Terraform module (formerly poseidon/terraform-render-bootkube) * https://github.com/poseidon/terraform-render-bootkube/pull/149	2019-09-14 16:24:32 -07:00
Dalton Hubble	b15c60fa2f	Update CHANGES for control plane static pod switch * Remove old references to bootkube / self-hosted	2019-09-09 22:48:48 -07:00
Dalton Hubble	c933bdfc26	Migrate Container Linux AWS to static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	74780fb09f	Migrate Fedora CoreOS bare-metal to static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	b60a2ecdf7	Migrate Fedora CoreOS AWS to a static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	c20683067d	Update etcd from v3.3.15 to v3.4.0 * https://github.com/etcd-io/etcd/releases/tag/v3.4.0	2019-09-08 15:32:49 -07:00
Dalton Hubble	e8d586f3b3	Enable QoS on Fedora CoreOS controllers * Kubelet race should be fixed in Kubernetes v1.15.1 * https://github.com/kubernetes/kubernetes/issues/79046 * Reverts temporary mitigation https://github.com/poseidon/typhoon/pull/515	2019-09-04 21:09:45 -07:00
Dalton Hubble	4d5f962d76	Update CoreDNS from v1.5.0 to v1.6.2 * https://coredns.io/2019/06/26/coredns-1.5.1-release/ * https://coredns.io/2019/07/03/coredns-1.5.2-release/ * https://coredns.io/2019/07/28/coredns-1.6.0-release/ * https://coredns.io/2019/08/02/coredns-1.6.1-release/ * https://coredns.io/2019/08/13/coredns-1.6.2-release/	2019-08-31 15:57:42 -07:00
Dalton Hubble	c42139beaa	Update etcd from v3.3.14 to v3.3.15 * No functional changes, just changes to vendoring tools (go modules -> glide). Still, update to v3.3.15 anyway * https://github.com/etcd-io/etcd/compare/v3.3.14...v3.3.15	2019-08-19 15:05:21 -07:00
Dalton Hubble	35c2763ab0	Update Kubernetes from v1.15.2 to v1.15.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md/#v1153	2019-08-19 14:49:24 -07:00
Dalton Hubble	2067356ae9	Update Fedora CoreOS to testing 30.20190801.0	2019-08-18 21:46:59 -07:00
Dalton Hubble	8f412e2f09	Update etcd from v3.3.13 to v3.3.14 * https://github.com/etcd-io/etcd/releases/tag/v3.3.14	2019-08-18 21:05:06 -07:00
Dalton Hubble	3c3708d58e	Update Calico from v3.8.1 to v3.8.2 * https://docs.projectcalico.org/v3.8/release-notes/	2019-08-16 15:38:23 -07:00
Dalton Hubble	6db11d5908	Enable AWS root block device encryption by default * terraform-provider-aws v2.23.0 allows AWS root block devices to enable encryption by default. * Require updating terraform-provider-aws to v2.23.0 or higher * Enable root EBS device encryption by default for controller instances and worker instances in auto-scaling groups For comparison: * Google Cloud persistent disks have been encrypted by default for years * Azure managed disk encryption is not ready yet (#486)	2019-08-07 21:13:44 -07:00
Dalton Hubble	2227f2cc62	Update Kubernetes from v1.15.1 to v1.15.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1152	2019-08-05 08:48:57 -07:00
Dalton Hubble	dcd6733649	Update Calico from v3.8.0 to v3.8.1 * https://docs.projectcalico.org/v3.8/release-notes/	2019-07-27 15:31:13 -07:00
Dalton Hubble	8cb7fe48a1	Update Fedora CoreOS to testing 30.20190725.0 * Fedora CoreOS Preview AMIs are pinned until maturity	2019-07-27 15:18:29 -07:00
Dalton Hubble	e0c7676a15	Update Kubernetes from v1.15.0 to v1.15.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#downloads-for-v1151	2019-07-19 01:21:08 -07:00
Dalton Hubble	339e323491	Temporarily turn off QoS cgroups on Fedora CoreOS controllers * Kubelets can hit the ContainerManager Delegation issue and fail to start (noted in `72c94f1c6`). It's unclear why this occurs only to some Kubelets (possibly an ordering concern) * QoS cgroups remain a goal * When a controller node is affected, bootstrapping fails, which makes other development harder. Temporarily disable QoS on controllers only. This should safeguard bring-up and hopefully still allow the issue to occur on some workers for debugging	2019-07-19 00:17:03 -07:00
Dalton Hubble	155bffa773	Add docs for Fedora CoreOS AWS and bare-metal	2019-07-18 00:55:22 -07:00
Dalton Hubble	ce45e123fe	Port Typhoon Fedora CoreOS support to AWS * Use the newly minted "Fedora CoreOS Preview" AMI * Remove iscsi, kubelet.path activation, and kubeconfig distribution * As usual, bare-metal efforts make cloud provider ports much easier	2019-07-18 00:55:22 -07:00

278 Commits