typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 21:24:38 +02:00

Author	SHA1	Message	Date
Dalton Hubble	c0ce04e1de	Update Calico from v3.10.1 to v3.10.2 * https://docs.projectcalico.org/v3.10/release-notes/	2019-12-09 21:03:00 -08:00
Dalton Hubble	ed3550dce1	Update systemd services for the v0.17.x hyperkube * Binary asset locations within the upstream hyperkube image changed https://github.com/kubernetes/kubernetes/pull/84662 * Fix Container Linux and Flatcar Linux kubelet.service (rkt-fly with fairly dated CoreOS kubelet-wrapper) * Fix Fedora CoreOS kubelet.service (podman) * Fix Fedora CoreOS bootstrap.service * Fix delete-node kubectl usage for workers where nodes may delete themselves on shutdown (e.g. preemptible instances)	2019-12-09 18:39:17 -08:00
Dalton Hubble	de36d99afc	Update Kubernetes from v1.16.3 to v1.17.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170	2019-12-09 18:31:58 -08:00
Dalton Hubble	4fce9485c8	Reduce kube-controller-manager pod eviction timeout from 5m to 1m * Reduce time to delete pods on unready nodes from 5m to 1m * Present since v1.13.3, but mistakenly removed in v1.16.0 static pod control plane migration Related: * https://github.com/poseidon/terraform-render-bootstrap/pull/148 * https://github.com/poseidon/terraform-render-bootstrap/pull/164	2019-12-08 22:58:31 -08:00
Dalton Hubble	d9c7a9e049	Add/update docs for asset_dir and kubeconfig usage * Original tutorials favored including the platform (e.g. google-cloud) in modules (e.g. google-cloud-yavin). Prefer naming conventions where each module / cluster has a simple name (e.g. yavin) since the platform is usually redundant * Retain the example cluster naming themes per platform	2019-12-05 22:56:42 -08:00
Dalton Hubble	2837275265	Introduce cluster creation without local writes to asset_dir * Allow generated assets (TLS materials, manifests) to be securely distributed to controller node(s) via file provisioner (i.e. ssh-agent) as an assets bundle file, rather than relying on assets being locally rendered to disk in an asset_dir and then securely distributed * Change `asset_dir` from required to optional. Left unset, asset_dir defaults to "" and no assets will be written to files on the machine that runs terraform apply * Enhancement: Managed cluster assets are kept only in Terraform state, which supports different backends (GCS, S3, etcd, etc) and optional encryption. terraform apply accesses state, runs in-memory, and distributes sensitive materials to controllers without making use of local disk (simplifies use in CI systems) * Enhancement: Improve asset unpack and layout process to position etcd certificates and control plane certificates more cleanly, without unneeded secret materials Details: * Terraform file provisioner support for distributing directories of contents (with unknown structure) has been limited to reading from a local directory, meaning local writes to asset_dir were required. https://github.com/poseidon/typhoon/issues/585 discusses the problem and newer or upcoming Terraform features that might help. * Observation: Terraform provisioner support for single files works well, but iteration isn't viable. We're also constrained to Terraform language features on the apply side (no extra plugins, no shelling out) and CoreOS / Fedora tools on the receive side. * Take a map representation of the contents that would have been splayed out in asset_dir and pack/encode them into a single file format devised for easy unpacking. Use an awk one-liner on the receive side to unpack. In pratice, this has worked well and its rather nice that a single assets file is transferred by file provisioner (all or none) Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162	2019-12-05 01:24:50 -08:00
Dalton Hubble	4b485a9bf2	Fix recent deletion of bootstrap module pinned SHA * Fix deletion of bootstrap module pinned SHA, which was introduced recently through an automation mistake creating https://github.com/poseidon/typhoon/pull/589	2019-11-21 22:34:09 -08:00
Dalton Hubble	8a9e8595ae	Fix terraform fmt formatting	2019-11-13 23:44:02 -08:00
Dalton Hubble	0e4ee5efc9	Add small CPU resource requests to static pods * Set small CPU requests on static pods kube-apiserver, kube-controller-manager, and kube-scheduler to align with upstream tooling and for edge cases * Effectively, a practical case for these requests hasn't been observed. However, a small static pod CPU request may offer a slight benefit if a controller became overloaded and the below mechanisms were insufficient Existing safeguards: * Control plane nodes are tainted to isolate them from ordinary workloads. Even dense workloads can only compress CPU resources on worker nodes. * Control plane static pods use the highest priority class, so contention favors control plane pods (over say node-exporter) and CPU is compressible too. See: https://github.com/poseidon/terraform-render-bootstrap/pull/161	2019-11-13 17:18:45 -08:00
Dalton Hubble	a271b9f340	Update CoreDNS from v1.6.2 to v1.6.5 * Add health `lameduck` option 5s. Before CoreDNS shuts down, it will wait and report unhealthy for 5s to allow time for plugins to shutdown cleanly * Minor bug fixes over a few releases * https://coredns.io/2019/08/31/coredns-1.6.3-release/ * https://coredns.io/2019/09/27/coredns-1.6.4-release/ * https://coredns.io/2019/11/05/coredns-1.6.5-release/	2019-11-13 16:47:44 -08:00
Dalton Hubble	cb0598e275	Adopt Terraform v0.12 templatefile function * Update terraform-render-bootstrap module to adopt the Terrform v0.12 templatefile function feature to replace the use of terraform-provider-template's `template_dir` * Require Terraform v0.12.6+ which adds `for_each` Background: * `template_dir` was added to `terraform-provider-template` to add support for template directory rendering in CoreOS Tectonic Kubernetes distribution (~2017) * Terraform v0.12 introduced a native `templatefile` function and v0.12.6 introduced native `for_each` support (July 2019) that makes it possible to replace `template_dir` usage	2019-11-13 16:33:36 -08:00
Dalton Hubble	d7061020ba	Update Kubernetes from v1.16.2 to v1.16.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163	2019-11-13 13:05:15 -08:00
Dalton Hubble	2c163503f1	Update etcd from v3.4.2 to v3.4.3 * etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9 and adds a few minor metrics fixes * https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3	2019-11-07 11:41:01 -08:00
Dalton Hubble	0034a15711	Update Calico from v3.10.0 to v3.10.1 * https://docs.projectcalico.org/v3.10/release-notes/	2019-11-07 11:38:32 -08:00
Dalton Hubble	4775e9d0f7	Upgrade Calico v3.9.2 to v3.10.0 * Allow advertising Kubernetes service ClusterIPs to BGPPeer routers via a BGPConfiguration * Improve EdgeRouter docs about routes and BGP * https://docs.projectcalico.org/v3.10/release-notes/ * https://docs.projectcalico.org/v3.10/networking/advertise-service-ips	2019-10-27 14:13:41 -07:00
Dalton Hubble	d418045929	Switch kube-proxy from iptables mode to ipvs mode * Kubernetes v1.11 considered kube-proxy IPVS mode GA * Many problems were found #321 * Since then, major blockers seem to have been addressed	2019-10-27 00:37:41 -07:00
Dalton Hubble	24fc440d83	Update Kubernetes from v1.16.1 to v1.16.2 * Update Calico from v3.9.1 to v3.9.2	2019-10-15 22:42:52 -07:00
Dalton Hubble	a6702573a2	Update etcd from v3.4.1 to v3.4.2 * https://github.com/etcd-io/etcd/releases/tag/v3.4.2	2019-10-15 00:06:15 -07:00
Dalton Hubble	d874bdd17d	Update bootstrap module control plane manifests and type constraints * Remove unneeded control plane flags that correspond to defaults * Adopt Terraform v0.12 type constraints in bootstrap module	2019-10-06 21:09:30 -07:00
Dalton Hubble	5b9dab6659	Introduce list of detail objects for bare-metal machines * Define bare-metal `controllers` and `workers` as a complex type list(object{name=string, mac=string, domain=string}) to allow clusters with many machines to be defined more cleanly * Remove `controller_names` list variable * Remove `controller_macs` list variable * Remove `controller_domains` list variable * Remove `worker_names` list variable * Remove `worker_macs` list variable * Remove `worker_domains` list variable	2019-10-06 20:22:45 -07:00
Dalton Hubble	15c4b793c3	Use new Fedora CoreOS kernel/initrd/raw asset names * Fedora CoreOS changed the kernel, initramfs, and raw image asset download paths and names in 30.20191002.0	2019-10-06 17:31:21 -07:00
Dalton Hubble	36ed53924f	Add stricter types for bare-metal modules * Review variables available in bare-metal kubernetes modules for Container Linux and Fedora CoreOS * Deprecate cluster_domain_suffix variable * Remove deprecated container_linux_oem variable	2019-10-06 17:18:50 -07:00
Dalton Hubble	1c5ed84fc2	Update Kubernetes from v1.16.0 to v1.16.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161	2019-10-02 21:31:55 -07:00
Dalton Hubble	a6de245d8a	Rename bootkube.tf to bootstrap.tf * Typhoon no longer uses the bootkube project	2019-09-29 11:30:49 -07:00
Dalton Hubble	96afa6a531	Update Calico from v3.8.2 to v3.9.1 * https://docs.projectcalico.org/v3.9/release-notes/	2019-09-29 11:22:53 -07:00
Dalton Hubble	3e34fb075b	Update etcd from v3.4.0 to v3.4.1 * https://github.com/etcd-io/etcd/releases/tag/v3.4.1	2019-09-28 15:09:57 -07:00
Dalton Hubble	8703f2c3c5	Fix missing comma separator on bare-metal and DO * Introduced in bare-metal and DigitalOcean in #544 while addressing possible ordering race, but after the v1.16 upgrade validation	2019-09-23 11:05:26 -07:00
Dalton Hubble	5b06e0e869	Organize and cleanup Kubelet ExecStartPre * Sort Kubelet ExecStartPre mkdir commands * Remove unused inactive-manifests and checkpoint-secrets directories (were used by bootkube self-hosting)	2019-09-19 00:38:34 -07:00
Dalton Hubble	b951aca66f	Create /etc/kubernetes/manifests before asset copy * Fix issue (present since bootkube->bootstrap switch) where controller asset copy could fail if /etc/kubernetes/manifests wasn't created in time on platforms using path activation for the Kubelet (observed on DigitalOcean, also possible on bare-metal)	2019-09-19 00:30:53 -07:00
Dalton Hubble	9da3725738	Update Kubernetes from v1.15.3 to v1.16.0 * Drop `node-role.kubernetes.io/master` and `node-role.kubernetes.io/node` node labels * Kubelet (v1.16) now rejects the node labels used in the kubectl get nodes ROLES output * https://github.com/kubernetes/kubernetes/issues/75457	2019-09-18 22:53:06 -07:00
Dalton Hubble	fd12f3612b	Rename CA organization from bootkube to typhoon * Rename the organization in generated CA certificates from bootkube to typhoon. Avoid confusion with the bootkube project * https://github.com/poseidon/terraform-render-bootstrap/pull/149	2019-09-14 16:56:53 -07:00
Dalton Hubble	96b646cf6d	Rename bootkube modules to bootstrap * Rename render module from bootkube to bootstrap. Avoid confusion with the kubernetes-incubator/bootkube tool since it is no longer used * Use the poseidon/terraform-render-bootstrap Terraform module (formerly poseidon/terraform-render-bootkube) * https://github.com/poseidon/terraform-render-bootkube/pull/149	2019-09-14 16:24:32 -07:00
Dalton Hubble	b15c60fa2f	Update CHANGES for control plane static pod switch * Remove old references to bootkube / self-hosted	2019-09-09 22:48:48 -07:00
Dalton Hubble	74780fb09f	Migrate Fedora CoreOS bare-metal to static pod control plane * Run a kube-apiserver, kube-scheduler, and kube-controller-manager static pod on each controller node. Previously, kube-apiserver was self-hosted as a DaemonSet across controllers and kube-scheduler and kube-controller-manager were a Deployment (with 2 or controller_count many replicas). * Remove bootkube bootstrap and pivot to self-hosted * Remove pod-checkpointer manifests (no longer needed)	2019-09-09 22:37:31 -07:00
Dalton Hubble	c20683067d	Update etcd from v3.3.15 to v3.4.0 * https://github.com/etcd-io/etcd/releases/tag/v3.4.0	2019-09-08 15:32:49 -07:00
Dalton Hubble	e8d586f3b3	Enable QoS on Fedora CoreOS controllers * Kubelet race should be fixed in Kubernetes v1.15.1 * https://github.com/kubernetes/kubernetes/issues/79046 * Reverts temporary mitigation https://github.com/poseidon/typhoon/pull/515	2019-09-04 21:09:45 -07:00
Dalton Hubble	4d5f962d76	Update CoreDNS from v1.5.0 to v1.6.2 * https://coredns.io/2019/06/26/coredns-1.5.1-release/ * https://coredns.io/2019/07/03/coredns-1.5.2-release/ * https://coredns.io/2019/07/28/coredns-1.6.0-release/ * https://coredns.io/2019/08/02/coredns-1.6.1-release/ * https://coredns.io/2019/08/13/coredns-1.6.2-release/	2019-08-31 15:57:42 -07:00
Dalton Hubble	c42139beaa	Update etcd from v3.3.14 to v3.3.15 * No functional changes, just changes to vendoring tools (go modules -> glide). Still, update to v3.3.15 anyway * https://github.com/etcd-io/etcd/compare/v3.3.14...v3.3.15	2019-08-19 15:05:21 -07:00
Dalton Hubble	35c2763ab0	Update Kubernetes from v1.15.2 to v1.15.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md/#v1153	2019-08-19 14:49:24 -07:00
Dalton Hubble	8f412e2f09	Update etcd from v3.3.13 to v3.3.14 * https://github.com/etcd-io/etcd/releases/tag/v3.3.14	2019-08-18 21:05:06 -07:00
Dalton Hubble	3c3708d58e	Update Calico from v3.8.1 to v3.8.2 * https://docs.projectcalico.org/v3.8/release-notes/	2019-08-16 15:38:23 -07:00
Dalton Hubble	2227f2cc62	Update Kubernetes from v1.15.1 to v1.15.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1152	2019-08-05 08:48:57 -07:00
Dalton Hubble	dcd6733649	Update Calico from v3.8.0 to v3.8.1 * https://docs.projectcalico.org/v3.8/release-notes/	2019-07-27 15:31:13 -07:00
Dalton Hubble	1409bc62d8	Remove download_protocol variable from Fedora CoreOS * For Fedora CoreOS, only HTTPS downloads are available. Any iPXE firmware must be compiled to support TLS fetching. * For Container Linux, using public kernel/initramfs images defaults to using HTTPS, but can be set to HTTP for iPXE firmware that hasn't been custom compiled to support TLS	2019-07-27 15:23:34 -07:00
Dalton Hubble	e0c7676a15	Update Kubernetes from v1.15.0 to v1.15.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#downloads-for-v1151	2019-07-19 01:21:08 -07:00
Dalton Hubble	339e323491	Temporarily turn off QoS cgroups on Fedora CoreOS controllers * Kubelets can hit the ContainerManager Delegation issue and fail to start (noted in `72c94f1c6`). Its unclear why this occurs only to some Kubelets (possibly an ordering concern) * QoS cgroups remain a goal * When a controller node is affected, bootstrapping fails, which makes other development harder. Temporarily disable QoS on controllers only. This should safeguard bring-up and hopefully still allow the issue to occur on some workers for debugging	2019-07-19 00:17:03 -07:00
Dalton Hubble	5fdeb9bc78	Adjust Fedora CoreOS image locations * Use the xz compressed images published by Fedora testing, instead of gzippped tarballs. This is possible because the initramfs now supports xz and coreos-installer 0.8 was added * Separate bios and uefi raw images are no longer needed	2019-07-18 01:15:29 -07:00
Dalton Hubble	155bffa773	Add docs for Fedora CoreOS AWS and bare-metal	2019-07-18 00:55:22 -07:00
Dalton Hubble	72c94f1c6a	Add Kubelet System Container and bootkube bootstrap * First semi-working cluster using 30.307-metal-bios * Enable CPU, Memory, and BlockIO accounting * Mount /var/lib/kubelet with `rshare` so mounted tmpfs Secrets (e.g. serviceaccount's) are visible within appropriate containers * SELinux relabel /etc/kubernetes so install-cni init containers can write the CNI config to the host /etc/kubernetes/net.d * SELinux relabel /var/lib/kubelet so ConfigMaps can be read by containers * SELinux relabel /opt/cni/bin so install-cni containers can write CNI binaries to the host * Set net.ipv4_conf.all.rp_filter to 1 (not 2, loose mode) to satisfy Calico requirement * Enable the QoS cgroup hierarchy for pod workloads (kubepods, burstable, besteffort). Mount /sys/fs/cgroup and /sys/fs/cgroup/systemd into the Kubelet. Its still rather racy whether Kubelet will fail on ContainerManager Delegation	2019-07-18 00:55:22 -07:00
Dalton Hubble	aab14c5573	Run etcd-member.service across controllers * Running the etcd container with NOTIFY_SOCKET mounted (to use systemd Type=notify) causes podman to hang so for now just use exec * https://github.com/opencontainers/runc/pull/1807	2019-07-18 00:55:22 -07:00

1 2

51 Commits