Commit Graph

234 Commits

Author SHA1 Message Date
Dalton Hubble
0eef16b274 Improve and tidy Fedora CoreOS etcd-member.service
* Allow a snippet with a systemd dropin to set an alternate
image via `ETCD_IMAGE`, for consistency across Fedora CoreOS
and Flatcar Linux
* Drop comments about integrating system containers with
systemd-notify
2020-11-08 11:49:56 -08:00
Dalton Hubble
ad1f59ce91 Change Flatcar etcd-member.service container from rkt to docker
* Use docker to run the `etcd-member.service` container
* Use env-file `/etc/etcd/etcd.env` like podman on FCOS
* Background: https://github.com/poseidon/typhoon/pull/855
2020-11-03 16:42:18 -08:00
Dalton Hubble
82e5ac3e7c Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/poseidon/terraform-render-bootstrap/pull/224
2020-11-03 10:29:07 -08:00
Dalton Hubble
a8f7880511 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:50:18 -07:00
Dalton Hubble
893d139590 Update Calico from v3.16.3 to v3.16.4
* https://github.com/projectcalico/calico/releases/tag/v3.16.4
2020-10-26 00:50:40 -07:00
Dalton Hubble
7c3f3ab6d0 Rename container-linux modules to flatcar-linux
* CoreOS Container Linux was deprecated in v1.18.3
* Continue transitioning docs and modules from supporting
both CoreOS and Flatcar "variants" of Container Linux to
now supporting Flatcar Linux and equivalents

Action Required: Update the Flatcar Linux modules `source`
to replace `s/container-linux/flatcar-linux`. See docs for
examples
2020-10-20 22:47:19 -07:00
Dalton Hubble
a99a990d49 Remove unused Kubelet tls mounts
* Kubelet trusts only the cluster CA certificate (and
certificates in the Kubelet debian base image), there
is no longer a need to mount the host's trusted certs
* Similar change on Flatcar Linux in
https://github.com/poseidon/typhoon/pull/855

Rel: https://github.com/poseidon/typhoon/pull/810
2020-10-18 23:48:21 -07:00
Dalton Hubble
df17253e72 Fix delete node permission on Fedora CoreOS node shutdown
* On cloud platforms, `delete-node.service` tries to delete the
local node (not always possible depending on preemption time)
* Since v1.18.3, kubelet TLS bootstrap generates a kubeconfig
in `/var/lib/kubelet` which should be used with kubectl in
the delete-node oneshot
2020-10-18 23:38:11 -07:00
Dalton Hubble
eda78db08e Change Flatcar kubelet.service container from rkt to docker
* Use docker to run the `kubelet.service` container
* Update Kubelet mounts to match Fedora CoreOS
* Remove unused `/etc/ssl/certs` mount (see
https://github.com/poseidon/typhoon/pull/810)
* Remove unused `/usr/share/ca-certificates` mount
* Remove `/etc/resolv.conf` mount, Docker default is ok
* Change `delete-node.service` to use docker instead of rkt
and inline ExecStart, as was done on Fedora CoreOS
* Fix permission denied on shutdown `delete-node`, caused
by the kubeconfig mount changing with the introduction of
node TLS bootstrap

Background

* podmand, rkt, and runc daemonless container process runners
provide advantages over the docker daemon for system containers.
Docker requires workarounds for use in systemd units where the
ExecStart must tail logs so systemd can monitor the daemonized
container. https://github.com/moby/moby/issues/6791
* Why switch then? On Flatcar Linux, podman isn't shipped. rkt
works, but isn't developing while container standards continue
to move forward. Typhoon has used runc for the Kubelet runner
before in Fedora Atomic, but its more low-level. So we're left
with Docker, which is less than ideal, but shipped in Flatcar
* Flatcar Linux appears to be shifting system components to
use docker, which does provide some limited guards against
breakages (e.g. Flatcar cannot enable docker live restore)
2020-10-18 23:24:45 -07:00
Dalton Hubble
afac46e39a Remove asset_dir variable and optional asset writes
* Originally, poseidon/terraform-render-bootstrap generated
TLS certificates, manifests, and cluster "assets" written
to local disk (`asset_dir`) during terraform apply cluster
bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Add Terraform output `assets_dir` map
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. For those
who wish to write those assets to local files, this is possible
doing so explicitly.

```
resource local_file "assets" {
  for_each = module.yavin.assets_dist
  filename = "some-assets/${each.key}"
  content = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 15:00:15 -07:00
Dalton Hubble
511f5272f4 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://github.com/poseidon/terraform-render-bootstrap/pull/212
2020-10-15 20:08:51 -07:00
Dalton Hubble
46ca5e8813 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:47:49 -07:00
Dalton Hubble
901f7939b2 Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:24:26 -07:00
Dalton Hubble
343db5b578 Remove references to CoreOS Container Linux
* CoreOS Container Linux was deprecated in v1.18.3 (May 2020)
in favor of Fedora CoreOS and Flatcar Linux. CoreOS Container
Linux references were kept to give folks more time to migrate,
but AMIs have now been deleted. Time is up.

Rel: https://coreos.com/os/eol/
2020-09-24 20:51:02 -07:00
Dalton Hubble
444363be2d Update Kubernetes from v1.19.1 to v1.19.2
* Update flannel from v0.12.0 to v0.13.0-rc2
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
2020-09-16 20:05:54 -07:00
Dalton Hubble
577b927a2b Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* No notable changes in the config spec, just house keeping
* Require any snippets customization to update to v1.1.0. Version
skew between the main config and snippets will show an err message
* https://github.com/coreos/fcct/blob/master/docs/configuration-v1_1.md
2020-09-10 23:38:40 -07:00
Dalton Hubble
29b16c3fc0 Change seccomp annotations to seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support for
seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the profile,
you'd see `seccomp=unconfined`

Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215
2020-09-10 01:15:07 -07:00
Dalton Hubble
0c7a879bc4 Update Kubernetes from v1.19.0 to v1.19.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191
2020-09-09 20:52:29 -07:00
Dalton Hubble
28ee693e6b Update Cilium from v1.8.2 to v1.8.3
* https://github.com/cilium/cilium/releases/tag/v1.8.3
2020-09-07 21:10:27 -07:00
Dalton Hubble
88cf7273dc Update Kubernetes from v1.18.8 to v1.19.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md
2020-08-27 08:50:01 -07:00
Dalton Hubble
cd7fd29194 Update etcd from v3.4.10 to v3.4.12
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md
2020-08-19 21:25:41 -07:00
Bo Huang
aafa38476a
Fix SELinux race condition on non-bootstrap controllers in multi-controller (#808)
* Fix race condition for bootstrap-secrets SELinux context on non-bootstrap controllers in multi-controller FCOS clusters
* On first boot from disk on non-bootstrap controllers, adding bootstrap-secrets races with kubelet.service starting, which can cause the secrets assets to have the wrong label until kubelet.service restarts (service, reboot, auto-update)
* This can manifest as `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods crashlooping on spare controllers on first cluster creation
2020-08-19 21:18:10 -07:00
Dalton Hubble
c87db3ef37 Update Kubernetes from v1.18.6 to v1.18.8
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188
2020-08-13 20:47:43 -07:00
Dalton Hubble
5e70d7e2c8 Migrate from Terraform v0.12.x to v0.13.x
* Recommend Terraform v0.13.x
* Support automatic install of poseidon's provider plugins
* Update tutorial docs for Terraform v0.13.x
* Add migration guide for Terraform v0.13.x (best-effort)
* Require Terraform v0.12.26+ (migration compatibility)
* Require `terraform-provider-ct` v0.6.1
* Require `terraform-provider-matchbox` v0.4.1
* Require `terraform-provider-digitalocean` v1.20+

Related:

* https://www.hashicorp.com/blog/announcing-hashicorp-terraform-0-13/
* https://www.terraform.io/upgrade-guides/0-13.html
* https://registry.terraform.io/providers/poseidon/ct/latest
* https://registry.terraform.io/providers/poseidon/matchbox/latest
2020-08-12 01:54:32 -07:00
Dalton Hubble
ccee5d3d89 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* https://github.com/poseidon/terraform-render-bootstrap/pull/205
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0 which was last
published by me in 2017 and is no longer accessible to me to maintain
or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:13:15 -07:00
Dalton Hubble
78e6409bd0 Fix flannel support on Fedora CoreOS
* Fedora CoreOS now ships systemd-udev's `default.link` while
Flannel relies on being able to pick its own MAC address for
the `flannel.1` link for tunneled traffic to reach cni0 on
the destination side, without being dropped
* This change first appeared in FCOS testing-devel 32.20200624.20.1
and is the behavior going forward in FCOS since it was added
to align FCOS network naming / configs with the rest of Fedora
and address issues related to the default being missing
* Flatcar Linux (and Container Linux) has a specific flannel.link
configuration builtin, so it was not affected
* https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296

Note: Typhoon's recommended and default CNI provider is Calico,
unless `networking` is set to flannel directly.
2020-08-01 21:22:08 -07:00
Dalton Hubble
cd0a28904e Update Cilium from v1.8.1 to v1.8.2
* https://github.com/cilium/cilium/releases/tag/v1.8.2
2020-07-25 16:06:27 -07:00
Dalton Hubble
618f8b30fd Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
* Update Grafana dashboard with revised metrics names
2020-07-25 15:51:31 -07:00
Dalton Hubble
264d23a1b5 Declare etcd data directory permissions
* Set etcd data directory /var/lib/etcd permissions to 700
* On Flatcar Linux, /var/lib/etcd is pre-existing and Ignition
v2 doesn't overwrite the directory. Update the Container Linux
config, but add the manual chmod workaround to bootstrap for
Flatcar Linux users
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v3410-2020-07-16
* https://github.com/etcd-io/etcd/pull/11798
2020-07-25 15:48:27 -07:00
Dalton Hubble
f96e91f225 Update etcd from v3.4.9 to v3.4.10
* https://github.com/etcd-io/etcd/releases/tag/v3.4.10
2020-07-18 14:08:22 -07:00
Dalton Hubble
6df6bf904a Show Cilium as a CNI provider option in docs
* Start to show Cilium as a CNI option
* https://github.com/cilium/cilium
2020-07-18 13:27:56 -07:00
Dalton Hubble
9ea6d2c245 Update Kubernetes from v1.18.5 to v1.18.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186
* https://github.com/poseidon/terraform-render-bootstrap/pull/201
2020-07-15 22:05:57 -07:00
Dalton Hubble
49050320ce Update Cilium from v1.8.0 to v1.8.1
* https://github.com/cilium/cilium/releases/tag/v1.8.1
2020-07-05 16:00:00 -07:00
Dalton Hubble
df3f40bcce Allow using Flatcar Linux edge on Azure
* Set Kubelet cgroup driver to systemd when Flatcar Linux edge
is chosen

Note: Typhoon module status assumes use of the stable variant of
an OS channel/stream. Its possible to use earlier variants and
those are sometimes tested or developed against, but stable is
the recommendation
2020-06-30 01:35:29 -07:00
Dalton Hubble
0ba2c1a4da Fix terraform fmt in firewall rules 2020-06-29 23:04:54 -07:00
Dalton Hubble
7bce15975c Update Kubernetes from v1.18.4 to v1.18.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185
2020-06-27 13:52:18 -07:00
Dalton Hubble
1f83ae7dbb Update Calico from v3.14.1 to v3.15.0
* https://docs.projectcalico.org/v3.15/release-notes/
2020-06-26 02:40:12 -07:00
Dalton Hubble
d27f367004 Update Cilium from v1.8.0-rc4 to v1.8.0
* https://github.com/cilium/cilium/releases/tag/v1.8.0
2020-06-22 22:26:49 -07:00
Dalton Hubble
e9c8520359 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability
    (health)
  * Optional TCP 4240 between nodes for endpoint reachability (health)

Known Issues:

* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256

Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
2020-06-21 20:41:53 -07:00
Dalton Hubble
37f00a3882 Reduce Calcio MTU on Fedora CoreOS Azure
* Change the Calico VXLAN interface for MTU from 1450 to 1410
* VXLAN on Azure should support MTU 1450. However, there is
history where performance measures have shown that 1410 is
needed to have expected performance. Flatcar Linux has the
same MTU 1410 override and note
* FCOS 31.20200323.3.2 was known to perform fine with 1450, but
now in 31.20200517.3.0 the right value seems to be 1410
2020-06-19 00:24:56 -07:00
Dalton Hubble
90e23f5822 Rename controller node label and NoSchedule taint
* Remove node label `node.kubernetes.io/master` from controller nodes
* Use `node.kubernetes.io/controller` (present since v1.9.5,
[#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to
`node-role.kubernetes.io/controller`
* Tolerate the new taint name for workloads that may run on controller nodes
and stop tolerating `node-role.kubernetes.io/master` taint
2020-06-19 00:12:13 -07:00
Dalton Hubble
c25c59058c Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
2020-06-17 19:53:19 -07:00
Dalton Hubble
413585681b Remove unused Kubelet lock-file and exit-on-lock-contention
* Kubelet `--lock-file` and `--exit-on-lock-contention` date
back to usage of bootkube and at one point running Kubelet
in a "self-hosted" style whereby an on-host Kubelet (rkt)
started pods, but then a Kubelet DaemonSet was scheduled
and able to take over (hence self-hosted). `lock-file` and
`exit-on-lock-contention` flags supported this pivot. The
pattern has been out of favor (in bootkube too) for years
because of dueling Kubelet complexity
* Typhoon runs Kubelet as a container via an on-host systemd
unit using podman (Fedora CoreOS) or rkt (Flatcar Linux). In
fact, Typhoon no longer uses bootkube or control plane pivot
(let alone Kubelet pivot) and uses static pods since v1.16.0
* https://github.com/poseidon/typhoon/pull/536
2020-06-12 00:06:41 -07:00
Dalton Hubble
96711d7f17 Remove unused Kubelet cert / key Terraform state
* Generated Kubelet TLS certificate and key are not longer
used or distributed to machines since Kubelet TLS bootstrap
is used instead. Remove the certificate and key from state
2020-06-11 21:24:36 -07:00
Dalton Hubble
a287920169 Use strict mode for Container Linux Configs
* Enable terraform-provider-ct `strict` mode for parsing
Container Linux Configs and snippets
* Fix Container Linux Config systemd unit syntax `enable`
(old) to `enabled`
* Align with Fedora CoreOS which uses strict mode already
2020-06-09 23:00:36 -07:00
Dalton Hubble
20bfd69780 Change Kubelet container image publishing
* Build Kubelet container images internally and publish
to Quay and Dockerhub (new) as an alternative in case of
registry outage or breach
* Use our infra to provide single and multi-arch (default)
Kublet images for possible future use
* Docs: Show how to use alternative Kubelet images via
snippets and a systemd dropin (builds on #737)

Changes:

* Update docs with changes to Kubelet image building
* If you prefer to trust images built by Quay/Dockerhub,
automated image builds are still available with unique
tags (albeit with some limitations):
  * Quay automated builds are tagged `build-{short_sha}`
  (limit: only amd64)
  * Dockerhub automated builts are tagged `build-{tag}`
  and `build-master` (limit: only amd64, no shas)

Links:

* Kubelet: https://github.com/poseidon/kubelet
* Docs: https://typhoon.psdn.io/topics/security/#container-images
* Registries:
  * quay.io/poseidon/kubelet
  * docker.io/psdn/kubelet
2020-05-30 23:34:23 -07:00
Dalton Hubble
ba44408b76 Update Calico from v3.14.0 to v3.14.1
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-30 22:08:37 -07:00
Dalton Hubble
e72f916c8d Update etcd from v3.4.8 to v3.4.9
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v349-2020-05-20
2020-05-22 00:52:20 -07:00
Dalton Hubble
ecae6679ff Update Kubernetes from v1.18.2 to v1.18.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-05-20 20:37:39 -07:00
Dalton Hubble
4760543356 Set Kubelet image via kubelet.service KUBELET_IMAGE
* Write the systemd kubelet.service to use `KUBELET_IMAGE`
as the Kubelet. This provides a nice way to use systemd
dropins to temporarily override the image (e.g. during a
registry outage)

Note: Only Typhoon Kubelet images and registries are supported.
2020-05-19 22:39:53 -07:00