Commit Graph

1538 Commits

Author SHA1 Message Date
Dalton Hubble
8b3d41d6a0 Update mkdocs-material from v5.4.0 to v5.5.1 2020-08-02 15:22:10 -07:00
Dalton Hubble
ccee5d3d89 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* https://github.com/poseidon/terraform-render-bootstrap/pull/205
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0 which was last
published by me in 2017 and is no longer accessible to me to maintain
or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:13:15 -07:00
Dalton Hubble
8aefd4f082 Relex terraform-provider-matchbox version constraint
* Allow use of terraform-provider-matchbox v0.3+ (which
allows v0.3.0 <= version < v1.0) for any pre 1.0 release
* Before, the requirement was v0.3.0 <= version < v0.4.0
2020-08-02 01:09:28 -07:00
Dalton Hubble
78e6409bd0 Fix flannel support on Fedora CoreOS
* Fedora CoreOS now ships systemd-udev's `default.link` while
Flannel relies on being able to pick its own MAC address for
the `flannel.1` link for tunneled traffic to reach cni0 on
the destination side, without being dropped
* This change first appeared in FCOS testing-devel 32.20200624.20.1
and is the behavior going forward in FCOS since it was added
to align FCOS network naming / configs with the rest of Fedora
and address issues related to the default being missing
* Flatcar Linux (and Container Linux) has a specific flannel.link
configuration builtin, so it was not affected
* https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296

Note: Typhoon's recommended and default CNI provider is Calico,
unless `networking` is set to flannel directly.
2020-08-01 21:22:08 -07:00
Dalton Hubble
2aef42d4f6 Update Prometheus from v2.19.2 to v2.20.0
* https://github.com/prometheus/prometheus/releases/tag/v2.20.0
2020-07-25 16:37:28 -07:00
Dalton Hubble
b7d67757de Update Grafana from v7.1.0 to v7.1.1
* https://github.com/grafana/grafana/releases/tag/v7.1.1
2020-07-25 16:33:40 -07:00
Eldon
26f5d2d753
Fix some links in docs (#788) 2020-07-25 16:32:08 -07:00
Dalton Hubble
cd0a28904e Update Cilium from v1.8.1 to v1.8.2
* https://github.com/cilium/cilium/releases/tag/v1.8.2
2020-07-25 16:06:27 -07:00
Dalton Hubble
618f8b30fd Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
* Update Grafana dashboard with revised metrics names
2020-07-25 15:51:31 -07:00
Dalton Hubble
264d23a1b5 Declare etcd data directory permissions
* Set etcd data directory /var/lib/etcd permissions to 700
* On Flatcar Linux, /var/lib/etcd is pre-existing and Ignition
v2 doesn't overwrite the directory. Update the Container Linux
config, but add the manual chmod workaround to bootstrap for
Flatcar Linux users
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.4.md#v3410-2020-07-16
* https://github.com/etcd-io/etcd/pull/11798
2020-07-25 15:48:27 -07:00
Dalton Hubble
f96e91f225 Update etcd from v3.4.9 to v3.4.10
* https://github.com/etcd-io/etcd/releases/tag/v3.4.10
2020-07-18 14:08:22 -07:00
Dalton Hubble
efd4a0319d Update Grafana from v7.0.6 to v7.1.0
* https://github.com/grafana/grafana/releases/tag/v7.1.0
2020-07-18 13:54:56 -07:00
Dalton Hubble
6df6bf904a Show Cilium as a CNI provider option in docs
* Start to show Cilium as a CNI option
* https://github.com/cilium/cilium
2020-07-18 13:27:56 -07:00
Dalton Hubble
5fba20d358 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-07-18 13:19:25 -07:00
Dalton Hubble
a8d3d3bb12 Update ingress-nginx from v0.33.0 to v0.34.1
* Switch to ingress-nginx controller images from us.grc.io (eu, asia
can also be used if desired)
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.1
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.34.0
2020-07-15 22:43:49 -07:00
Dalton Hubble
9ea6d2c245 Update Kubernetes from v1.18.5 to v1.18.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186
* https://github.com/poseidon/terraform-render-bootstrap/pull/201
2020-07-15 22:05:57 -07:00
Dalton Hubble
507aac9b78 Update mkdocs-material from v5.3.3 to v5.4.0 2020-07-11 22:56:59 -07:00
Dalton Hubble
dfd2a0ec23 Update Grafana from v7.0.5 to v7.0.6
* https://github.com/grafana/grafana/releases/tag/v7.0.6
2020-07-09 21:10:48 -07:00
Dalton Hubble
e3bf7d8f9b Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-07-09 21:08:55 -07:00
Dalton Hubble
49050320ce Update Cilium from v1.8.0 to v1.8.1
* https://github.com/cilium/cilium/releases/tag/v1.8.1
2020-07-05 16:00:00 -07:00
Dalton Hubble
74e025c9e4 Update Grafana from v7.0.4 to v7.0.5
* https://github.com/grafana/grafana/releases/tag/v7.0.5
2020-07-05 15:49:34 -07:00
Dalton Hubble
257a49ce37 Remove CoreOS Container Linux image names from docs
* Remove coreos-stable, coreos-beta, and coreos-alpha channel
references from docs
* CoreOS Container Linux is end of life (see changelog)
2020-06-30 01:36:53 -07:00
Dalton Hubble
df3f40bcce Allow using Flatcar Linux edge on Azure
* Set Kubelet cgroup driver to systemd when Flatcar Linux edge
is chosen

Note: Typhoon module status assumes use of the stable variant of
an OS channel/stream. Its possible to use earlier variants and
those are sometimes tested or developed against, but stable is
the recommendation
2020-06-30 01:35:29 -07:00
Dalton Hubble
32886cfba1 Promote Fedora CoreOS on Google Cloud to stable status 2020-06-29 23:09:11 -07:00
Dalton Hubble
0ba2c1a4da Fix terraform fmt in firewall rules 2020-06-29 23:04:54 -07:00
Dalton Hubble
430d139a5b Remove os_image variable on Google Cloud Fedora CoreOS
* In v1.18.3, the `os_stream` variable was added to select
a Fedora CoreOS image stream (stable, testing, next) on
AWS and Google Cloud (which publish official streams)
* Remove `os_image` variable deprecated in v1.18.3. Manually
uploaded images are no longer needed
2020-06-29 22:57:11 -07:00
Dalton Hubble
7c6ab21b94 Isolate each DigitalOcean cluster in its own VPC
* DigitalOcean introduced Virtual Private Cloud (VPC) support
to match other clouds and enhance the prior "private networking"
feature. Before, droplet's belonging to different clusters (but
residing in the same region) could reach one another (although
Typhoon firewall rules prohibit this). Now, droplets in a VPC
reside in their own network
* https://www.digitalocean.com/docs/networking/vpc/
* Create droplet instances in a VPC per cluster. This matches the
design of Typhoon AWS, Azure, and GCP.
* Require `terraform-provider-digitalocean` v1.16.0+ (action required)
* Output `vpc_id` for use with an attached DigitalOcean
loadbalancer
2020-06-28 23:25:30 -07:00
Dalton Hubble
21178868db Revert "Update Prometheus from v2.19.1 to v2.19.2"
* Prometheus has not published the v1.19.2
* This reverts commit 81b6f54169.
2020-06-27 14:53:58 -07:00
Dalton Hubble
9dcf35e393 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-06-27 14:44:18 -07:00
Dalton Hubble
81b6f54169 Update Prometheus from v2.19.1 to v2.19.2
* https://github.com/prometheus/prometheus/releases/tag/v2.19.2
2020-06-27 14:34:30 -07:00
Dalton Hubble
7bce15975c Update Kubernetes from v1.18.4 to v1.18.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185
2020-06-27 13:52:18 -07:00
Dalton Hubble
1f83ae7dbb Update Calico from v3.14.1 to v3.15.0
* https://docs.projectcalico.org/v3.15/release-notes/
2020-06-26 02:40:12 -07:00
Dalton Hubble
a10a1cee9f Update mkdocs-material from v5.3.0 to v5.3.3 2020-06-26 02:24:37 -07:00
Dalton Hubble
a79ad34ba3 Update Grafana from v7.0.3 to v7.0.4
* https://github.com/grafana/grafana/releases/tag/v7.0.4
2020-06-26 02:06:38 -07:00
Dalton Hubble
99a11442c7 Update Prometheus from v2.19.0 to v2.19.1
* https://github.com/prometheus/prometheus/releases/tag/v2.19.1
2020-06-26 02:01:58 -07:00
Dalton Hubble
d27f367004 Update Cilium from v1.8.0-rc4 to v1.8.0
* https://github.com/cilium/cilium/releases/tag/v1.8.0
2020-06-22 22:26:49 -07:00
Dalton Hubble
e9c8520359 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability
    (health)
  * Optional TCP 4240 between nodes for endpoint reachability (health)

Known Issues:

* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256

Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
2020-06-21 20:41:53 -07:00
Dalton Hubble
37f00a3882 Reduce Calcio MTU on Fedora CoreOS Azure
* Change the Calico VXLAN interface for MTU from 1450 to 1410
* VXLAN on Azure should support MTU 1450. However, there is
history where performance measures have shown that 1410 is
needed to have expected performance. Flatcar Linux has the
same MTU 1410 override and note
* FCOS 31.20200323.3.2 was known to perform fine with 1450, but
now in 31.20200517.3.0 the right value seems to be 1410
2020-06-19 00:24:56 -07:00
Dalton Hubble
4cfafeaa07 Fix Kubelet starting before hostname set on FCOS AWS
* Fedora CoreOS `kubelet.service` can start before the hostname
is set. Kubelet reads the hostname to determine the node name to
register. If the hostname was read as localhost, Kubelet will
continue trying to register as localhost (problem)
* This race manifests as a node that appears NotReady, the Kubelet
is trying to register as localhost, while the host itself (by then)
has an AWS provided hostname. Restarting kubelet.service is a
manual fix so Kubelet re-reads the hostname
* This race could only be shown on AWS, not on Google Cloud or
Azure despite attempts. Bare-metal and DigitalOcean differ and
use hostname-override (e.g. afterburn) so they're not affected
* Wait for nodes to have a non-localhost hostname in the oneshot
that awaits /etc/resolve.conf. Typhoon has no valid cases for a
node hostname being localhost (not even single-node clusters)

Related Openshift: https://github.com/openshift/machine-config-operator/pull/1813
Close https://github.com/poseidon/typhoon/issues/765
2020-06-19 00:19:54 -07:00
Dalton Hubble
90e23f5822 Rename controller node label and NoSchedule taint
* Remove node label `node.kubernetes.io/master` from controller nodes
* Use `node.kubernetes.io/controller` (present since v1.9.5,
[#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to
`node-role.kubernetes.io/controller`
* Tolerate the new taint name for workloads that may run on controller nodes
and stop tolerating `node-role.kubernetes.io/master` taint
2020-06-19 00:12:13 -07:00
Dalton Hubble
6234147948 Update recommended Terraform provider versions
* Sync Terraform provider plugin versions with those
used internally
2020-06-18 01:03:37 -07:00
Dalton Hubble
c25c59058c Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
2020-06-17 19:53:19 -07:00
Dalton Hubble
bc9b808d44 Update nginx-ingress from v0.32.0 to v0.33.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-0.33.0
2020-06-16 18:44:40 -07:00
Dalton Hubble
4b0203fdb2 Fix typo in DigitalOcean docs title 2020-06-16 18:33:56 -07:00
Dalton Hubble
331566e1f7 Update mkdocs packages for website 2020-06-16 18:20:19 -07:00
Dalton Hubble
04520e447c Update node-exporter from v1.0.0 to v1.0.1
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.1
2020-06-16 17:57:09 -07:00
Dalton Hubble
413585681b Remove unused Kubelet lock-file and exit-on-lock-contention
* Kubelet `--lock-file` and `--exit-on-lock-contention` date
back to usage of bootkube and at one point running Kubelet
in a "self-hosted" style whereby an on-host Kubelet (rkt)
started pods, but then a Kubelet DaemonSet was scheduled
and able to take over (hence self-hosted). `lock-file` and
`exit-on-lock-contention` flags supported this pivot. The
pattern has been out of favor (in bootkube too) for years
because of dueling Kubelet complexity
* Typhoon runs Kubelet as a container via an on-host systemd
unit using podman (Fedora CoreOS) or rkt (Flatcar Linux). In
fact, Typhoon no longer uses bootkube or control plane pivot
(let alone Kubelet pivot) and uses static pods since v1.16.0
* https://github.com/poseidon/typhoon/pull/536
2020-06-12 00:06:41 -07:00
Dalton Hubble
96711d7f17 Remove unused Kubelet cert / key Terraform state
* Generated Kubelet TLS certificate and key are not longer
used or distributed to machines since Kubelet TLS bootstrap
is used instead. Remove the certificate and key from state
2020-06-11 21:24:36 -07:00
Dalton Hubble
c9059d3fe9 Update Prometheus from v2.19.0-rc.0 to v2.19.0
* https://github.com/prometheus/prometheus/releases/tag/v2.19.0
2020-06-09 23:05:03 -07:00
Dalton Hubble
a287920169 Use strict mode for Container Linux Configs
* Enable terraform-provider-ct `strict` mode for parsing
Container Linux Configs and snippets
* Fix Container Linux Config systemd unit syntax `enable`
(old) to `enabled`
* Align with Fedora CoreOS which uses strict mode already
2020-06-09 23:00:36 -07:00