typhoon

Commit Graph

Author	SHA1	Message	Date
Dalton Hubble	df17253e72	Fix delete node permission on Fedora CoreOS node shutdown * On cloud platforms, `delete-node.service` tries to delete the local node (not always possible depending on preemption time) * Since v1.18.3, kubelet TLS bootstrap generates a kubeconfig in `/var/lib/kubelet` which should be used with kubectl in the delete-node oneshot	2020-10-18 23:38:11 -07:00
Dalton Hubble	eda78db08e	Change Flatcar kubelet.service container from rkt to docker * Use docker to run the `kubelet.service` container * Update Kubelet mounts to match Fedora CoreOS * Remove unused `/etc/ssl/certs` mount (see https://github.com/poseidon/typhoon/pull/810) * Remove unused `/usr/share/ca-certificates` mount * Remove `/etc/resolv.conf` mount, Docker default is ok * Change `delete-node.service` to use docker instead of rkt and inline ExecStart, as was done on Fedora CoreOS * Fix permission denied on shutdown `delete-node`, caused by the kubeconfig mount changing with the introduction of node TLS bootstrap Background * podman, rkt, and runc daemonless container process runners provide advantages over the docker daemon for system containers. Docker requires workarounds for use in systemd units where the ExecStart must tail logs so systemd can monitor the daemonized container. https://github.com/moby/moby/issues/6791 * Why switch then? On Flatcar Linux, podman isn't shipped. rkt works, but isn't being developed while container standards continue to move forward. Typhoon has used runc for the Kubelet runner before in Fedora Atomic, but it's more low-level. So we're left with Docker, which is less than ideal, but shipped in Flatcar * Flatcar Linux appears to be shifting system components to use docker, which does provide some limited guards against breakages (e.g. Flatcar cannot enable docker live restore)	2020-10-18 23:24:45 -07:00
Dalton Hubble	afac46e39a	Remove asset_dir variable and optional asset writes * Originally, poseidon/terraform-render-bootstrap generated TLS certificates, manifests, and cluster "assets" written to local disk (`asset_dir`) during terraform apply cluster bootstrap * Typhoon v1.17.0 introduced bootstrapping using only Terraform state to store cluster assets, to avoid ever writing sensitive materials to disk and improve automated use-cases. `asset_dir` was changed to optional and defaulted to "" (no writes) * Typhoon v1.18.0 deprecated the `asset_dir` variable, removed docs, and announced it would be deleted in the future. * Add Terraform output `assets_dist` map * Remove the `asset_dir` variable Cluster assets are now stored in Terraform state only. Those who wish to write the assets to local files can do so explicitly. ``` resource local_file "assets" { for_each = module.yavin.assets_dist filename = "some-assets/${each.key}" content = each.value } ``` Related: * https://github.com/poseidon/typhoon/pull/595 * https://github.com/poseidon/typhoon/pull/678	2020-10-17 15:00:15 -07:00
Dalton Hubble	b1e680ac0c	Update recommended Terraform provider versions * Sync Terraform provider plugins with those used internally	2020-10-17 13:56:24 -07:00
Dalton Hubble	9fbfbdb854	Update Prometheus from v2.21.0 to v2.22.0 * https://github.com/prometheus/prometheus/releases/tag/v2.22.0	2020-10-17 12:38:25 -07:00
Dalton Hubble	511f5272f4	Update Calico from v3.15.3 to v3.16.3 * https://github.com/projectcalico/calico/releases/tag/v3.16.3 * https://github.com/poseidon/terraform-render-bootstrap/pull/212	2020-10-15 20:08:51 -07:00
Dalton Hubble	46ca5e8813	Update Kubernetes from v1.19.2 to v1.19.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193	2020-10-14 20:47:49 -07:00
Dalton Hubble	394e496cc7	Update Grafana from v7.2.0 to v7.2.1 * https://github.com/grafana/grafana/releases/tag/v7.2.1	2020-10-11 13:21:25 -07:00
Dalton Hubble	7881f4bd86	Update kube-state-metrics from v1.9.7 to v2.0.0-alpha.1 * https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha * https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1	2020-10-11 12:35:43 -07:00
Dalton Hubble	d5b5b7cb02	Update nginx-ingress from v0.40.0 to v0.40.2 * https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2	2020-10-06 23:52:15 -07:00
Dalton Hubble	b39a1d70da	Update nginx-ingress from v0.35.0 to v0.40.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.0	2020-10-02 01:00:35 -07:00
Dalton Hubble	901f7939b2	Update Cilium from v1.8.3 to v1.8.4 * https://github.com/cilium/cilium/releases/tag/v1.8.4	2020-10-02 00:24:26 -07:00
Dalton Hubble	d65085ce14	Update Grafana from v7.1.5 to v7.2.0 * https://github.com/grafana/grafana/releases/tag/v7.2.0	2020-09-24 20:58:32 -07:00
Dalton Hubble	343db5b578	Remove references to CoreOS Container Linux * CoreOS Container Linux was deprecated in v1.18.3 (May 2020) in favor of Fedora CoreOS and Flatcar Linux. CoreOS Container Linux references were kept to give folks more time to migrate, but AMIs have now been deleted. Time is up. Rel: https://coreos.com/os/eol/	2020-09-24 20:51:02 -07:00
Dalton Hubble	444363be2d	Update Kubernetes from v1.19.1 to v1.19.2 * Update flannel from v0.12.0 to v0.13.0-rc2 * Update flannel-cni from v0.4.0 to v0.4.1 * Update CNI plugins from v0.8.6 to v0.8.7	2020-09-16 20:05:54 -07:00
Dalton Hubble	e838d4dc3d	Refresh Prometheus rules/alerts and Grafana dashboards * Refresh upstream Prometheus rules/alerts and Grafana dashboards	2020-09-13 15:03:27 -07:00
Dalton Hubble	979c092ef6	Reduce apiserver metrics cardinality of non-core APIs * Reduce `apiserver_request_duration_seconds_count` cardinality by dropping series for non-core Kubernetes APIs. This is done to match `apiserver_request_duration_seconds_bucket` relabeling * These two relabels must be performed the same way to avoid affecting new SLO calculations (upcoming) * See https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/498 Related: https://github.com/poseidon/typhoon/pull/596	2020-09-13 14:47:49 -07:00
Dalton Hubble	eb093af9ed	Drop Kubelet labelmap relabel for node_name * Originally, Kubelet and CAdvisor metrics used a labelmap relabel to add Kubernetes SD node labels onto timeseries * With https://github.com/poseidon/typhoon/pull/596 that relabel was dropped since node labels aren't usually that valuable. `__meta_kubernetes_node_name` was retained but the field name is empty * Favor just using Prometheus server-side `instance` in queries that require some node identifier for aggregation or debugging Fix https://github.com/poseidon/typhoon/issues/823	2020-09-12 19:40:00 -07:00
Dalton Hubble	36096f844d	Promote Cilium from experimental to GA * Cilium was added as an experimental CNI provider in June * Since then, I've been choosing it for an increasing number of clusters and scenarios.	2020-09-12 19:24:55 -07:00
Dalton Hubble	d236628e53	Update Prometheus from v2.20.0 to v2.21.0 * https://github.com/prometheus/prometheus/releases/tag/v2.21.0	2020-09-12 19:20:54 -07:00
Dalton Hubble	577b927a2b	Update Fedora CoreOS Config version from v1.0.0 to v1.1.0 * No notable changes in the config spec, just housekeeping * Require any snippets customization to update to v1.1.0. Version skew between the main config and snippets will show an error message * https://github.com/coreos/fcct/blob/master/docs/configuration-v1_1.md	2020-09-10 23:38:40 -07:00
Dalton Hubble	000c11edf6	Update IngressClass resources to networking.k8s.io/v1 * Kubernetes v1.19 graduated Ingress and IngressClass from networking.k8s.io/v1beta1 to networking.k8s.io/v1	2020-09-10 23:25:53 -07:00
Dalton Hubble	29b16c3fc0	Change seccomp annotations to seccompProfile * seccomp graduated to GA in Kubernetes v1.19. Support for seccomp alpha annotations will be removed in v1.22 * Replace seccomp annotations with the GA seccompProfile field in the PodTemplate securityContext * Switch profile from `docker/default` to `runtime/default` (no effective change, since docker is the runtime) * Verify with docker inspect SecurityOpt. Without the profile, you'd see `seccomp=unconfined` Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215	2020-09-10 01:15:07 -07:00
Dalton Hubble	0c7a879bc4	Update Kubernetes from v1.19.0 to v1.19.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191	2020-09-09 20:52:29 -07:00
Dalton Hubble	28ee693e6b	Update Cilium from v1.8.2 to v1.8.3 * https://github.com/cilium/cilium/releases/tag/v1.8.3	2020-09-07 21:10:27 -07:00
Dalton Hubble	8c7d95aefd	Update mkdocs-material from v5.5.9 to v5.5.11	2020-08-29 13:52:16 -07:00
Dalton Hubble	d45dfdbf91	Update nginx-ingress from v0.34.1 to v0.35.0 * Repo changed to k8s.gcr.io/ingress-nginx/controller * https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.35.0	2020-08-29 13:38:28 -07:00
Dalton Hubble	8dd221a57c	Add fleetlock docs and links to addons * Add links to fleetlock for Fedora CoreOS reboot coordination * https://github.com/poseidon/fleetlock	2020-08-28 00:02:24 -07:00
Dalton Hubble	a504264e24	Update Grafana from v7.1.4 to v7.1.5 * https://github.com/grafana/grafana/releases/tag/v7.1.5	2020-08-27 08:52:07 -07:00
Dalton Hubble	88cf7273dc	Update Kubernetes from v1.18.8 to v1.19.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md	2020-08-27 08:50:01 -07:00
Dalton Hubble	58def65a09	Update Grafana from v7.1.3 to v7.1.4 * https://github.com/grafana/grafana/releases/tag/v7.1.4	2020-08-22 15:40:09 -07:00
Dalton Hubble	cd7fd29194	Update etcd from v3.4.10 to v3.4.12 * https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md	2020-08-19 21:25:41 -07:00
Bo Huang	aafa38476a	Fix SELinux race condition on non-bootstrap controllers in multi-controller (#808) * Fix race condition for bootstrap-secrets SELinux context on non-bootstrap controllers in multi-controller FCOS clusters * On first boot from disk on non-bootstrap controllers, adding bootstrap-secrets races with kubelet.service starting, which can cause the secrets assets to have the wrong label until kubelet.service restarts (service, reboot, auto-update) * This can manifest as `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods crashlooping on spare controllers on first cluster creation	2020-08-19 21:18:10 -07:00
Dalton Hubble	9a07f1d30b	Update recommended Terraform provider versions * Sync Terraform provider plugin versions to those used internally * Update mkdocs-material from v5.5.1 to v5.5.6 * Fix minor details in docs	2020-08-14 10:05:52 -07:00
Dalton Hubble	c87db3ef37	Update Kubernetes from v1.18.6 to v1.18.8 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188	2020-08-13 20:47:43 -07:00
Dalton Hubble	342380cfa4	Update Terraform migration guide SHA * Mention the first master branch SHA that introduced Terraform v0.13 forward compatibility * Link the migration guide on GitHub until a release is available and website docs are published	2020-08-13 00:36:47 -07:00
Dalton Hubble	5e70d7e2c8	Migrate from Terraform v0.12.x to v0.13.x * Recommend Terraform v0.13.x * Support automatic install of poseidon's provider plugins * Update tutorial docs for Terraform v0.13.x * Add migration guide for Terraform v0.13.x (best-effort) * Require Terraform v0.12.26+ (migration compatibility) * Require `terraform-provider-ct` v0.6.1 * Require `terraform-provider-matchbox` v0.4.1 * Require `terraform-provider-digitalocean` v1.20+ Related: * https://www.hashicorp.com/blog/announcing-hashicorp-terraform-0-13/ * https://www.terraform.io/upgrade-guides/0-13.html * https://registry.terraform.io/providers/poseidon/ct/latest * https://registry.terraform.io/providers/poseidon/matchbox/latest	2020-08-12 01:54:32 -07:00
Dalton Hubble	f6ce12766b	Allow terraform-provider-aws v3.0+ plugin * Typhoon AWS is compatible with terraform-provider-aws v3.x releases * Continue to allow v2.23+, no v3.x specific features are used * Set required provider versions in the worker module, since it can be used independently Related: * https://github.com/terraform-providers/terraform-provider-aws/releases/tag/v3.0.0	2020-08-09 12:39:26 -07:00
Dalton Hubble	e1d6ab2f24	Update Grafana from v7.1.1 to v7.1.3 * https://github.com/grafana/grafana/releases/tag/v7.1.3 * https://github.com/grafana/grafana/releases/tag/v7.1.2	2020-08-08 18:59:49 -07:00
Dalton Hubble	ccee5d3d89	Update from coreos/flannel-cni to poseidon/flannel-cni * Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs * Update the base image to alpine:3.12 * Use `flannel-cni` as an init container and remove sleep * https://github.com/poseidon/terraform-render-bootstrap/pull/205 * https://github.com/poseidon/flannel-cni * https://quay.io/repository/poseidon/flannel-cni Background * Switch from github.com/coreos/flannel-cni v0.3.0 which was last published by me in 2017 and is no longer accessible to me to maintain or patch * Port to the poseidon/flannel-cni rewrite, which releases v0.4.0 to continue the prior release numbering	2020-08-02 15:13:15 -07:00
Dalton Hubble	78e6409bd0	Fix flannel support on Fedora CoreOS * Fedora CoreOS now ships systemd-udev's `default.link` while Flannel relies on being able to pick its own MAC address for the `flannel.1` link for tunneled traffic to reach cni0 on the destination side, without being dropped * This change first appeared in FCOS testing-devel 32.20200624.20.1 and is the behavior going forward in FCOS since it was added to align FCOS network naming / configs with the rest of Fedora and address issues related to the default being missing * Flatcar Linux (and Container Linux) has a specific flannel.link configuration builtin, so it was not affected * https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296 Note: Typhoon's recommended and default CNI provider is Calico, unless `networking` is set to flannel directly.	2020-08-01 21:22:08 -07:00
Dalton Hubble	2aef42d4f6	Update Prometheus from v2.19.2 to v2.20.0 * https://github.com/prometheus/prometheus/releases/tag/v2.20.0	2020-07-25 16:37:28 -07:00
Dalton Hubble	b7d67757de	Update Grafana from v7.1.0 to v7.1.1 * https://github.com/grafana/grafana/releases/tag/v7.1.1	2020-07-25 16:33:40 -07:00
Dalton Hubble	cd0a28904e	Update Cilium from v1.8.1 to v1.8.2 * https://github.com/cilium/cilium/releases/tag/v1.8.2	2020-07-25 16:06:27 -07:00
Dalton Hubble	618f8b30fd	Update CoreDNS from v1.6.7 to v1.7.0 * https://coredns.io/2020/06/15/coredns-1.7.0-release/ * Update Grafana dashboard with revised metrics names	2020-07-25 15:51:31 -07:00
Dalton Hubble	f96e91f225	Update etcd from v3.4.9 to v3.4.10 * https://github.com/etcd-io/etcd/releases/tag/v3.4.10	2020-07-18 14:08:22 -07:00
Dalton Hubble	efd4a0319d	Update Grafana from v7.0.6 to v7.1.0 * https://github.com/grafana/grafana/releases/tag/v7.1.0	2020-07-18 13:54:56 -07:00
Dalton Hubble	5fba20d358	Update recommended Terraform provider versions * Sync Terraform provider plugin versions with those used internally	2020-07-18 13:19:25 -07:00
Dalton Hubble	a8d3d3bb12	Update ingress-nginx from v0.33.0 to v0.34.1 * Switch to ingress-nginx controller images from us.gcr.io (eu, asia can also be used if desired) * https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.1 * https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0	2020-07-15 22:43:49 -07:00
Dalton Hubble	dfd2a0ec23	Update Grafana from v7.0.5 to v7.0.6 * https://github.com/grafana/grafana/releases/tag/v7.0.6	2020-07-09 21:10:48 -07:00

634 Commits