Commit Graph

1561 Commits

Author SHA1 Message Date
Dalton Hubble cc00afa4e1 Add Terraform v0.13 input variable validations
* Support for migrating from Terraform v0.12.x to v0.13.x
was added in v1.18.8
* Require Terraform v0.13+. Drop support for Terraform v0.12
2020-11-17 12:02:34 -08:00
Dalton Hubble 5c3b5a20de Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-11-14 13:32:04 -08:00
Dalton Hubble f5a83667e8 Update Grafana from v7.3.1 to v7.3.2
* https://github.com/grafana/grafana/releases/tag/v7.3.2
2020-11-14 13:30:30 -08:00
Dalton Hubble a911367c2e Update nginx-ingress from v0.41.0 to v0.41.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.2
2020-11-14 13:27:06 -08:00
Dalton Hubble f884de847e Discard Prometheus etcd gRPC failure alert
* Kubernetes watch expiry is not a gRPC code we care about
* Background: This rule is typically removed, but was added back in
2020-11-14 13:17:56 -08:00
Dalton Hubble 1b3a0f6ebc Add experimental Fedora CoreOS arm64 support on AWS
* Add experimental `arch` variable to Fedora CoreOS AWS,
accepting amd64 (default) or arm64 to support native
arm64/aarch64 clusters or mixed/hybrid clusters with
a worker pool of arm64 workers
* Add `daemonset_tolerations` variable to cluster module
(experimental)
* Add `node_taints` variable to workers module
* Requires flannel CNI and experimental Poseidon-built
arm64 Fedora CoreOS AMIs (published to us-east-1, us-east-2,
and us-west-1)

WARN:

* Our AMIs are experimental, may be removed at any time, and
will be removed when Fedora CoreOS publishes official arm64
AMIs. Do NOT use in production

Related:

* https://github.com/poseidon/typhoon/pull/682
2020-11-14 13:09:24 -08:00
Dalton Hubble 1113a22f61 Update Kubernetes from v1.19.3 to v1.19.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194
2020-11-11 22:56:27 -08:00
Dalton Hubble 152c7d86bd Change bootstrap.service container from rkt to docker
* Use docker to run `bootstrap.service` container
* Background https://github.com/poseidon/typhoon/pull/855
2020-11-11 22:26:05 -08:00
Dalton Hubble 79deb8a967 Update Cilium from v1.9.0-rc3 to v1.9.0
* https://github.com/cilium/cilium/releases/tag/v1.9.0
2020-11-10 23:42:41 -08:00
Dalton Hubble f412f0d9f2 Update Calico from v3.16.4 to v3.16.5
* https://github.com/projectcalico/calico/releases/tag/v3.16.5
2020-11-10 22:58:19 -08:00
Phil Sautter eca6c4a1a1
Fix broken flatcar linux documentation links (#870)
* Fix old documentation links
2020-11-10 18:30:30 -08:00
Dalton Hubble 133d325013 Update nginx-ingress from v0.40.2 to v0.41.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.41.0
2020-11-08 14:34:52 -08:00
Dalton Hubble 4b05c0180e Update Grafana from v7.3.0 to v7.3.1
* https://github.com/grafana/grafana/releases/tag/v7.3.1
2020-11-08 14:13:39 -08:00
Dalton Hubble f49ab3a6ee Update Prometheus from v2.22.0 to v2.22.1
* https://github.com/prometheus/prometheus/releases/tag/v2.22.1
2020-11-08 14:12:24 -08:00
Dalton Hubble 0eef16b274 Improve and tidy Fedora CoreOS etcd-member.service
* Allow a snippet with a systemd dropin to set an alternate
image via `ETCD_IMAGE`, for consistency across Fedora CoreOS
and Flatcar Linux
* Drop comments about integrating system containers with
systemd-notify
2020-11-08 11:49:56 -08:00
Dalton Hubble ad1f59ce91 Change Flatcar etcd-member.service container from rkt to docker
* Use docker to run the `etcd-member.service` container
* Use env-file `/etc/etcd/etcd.env` like podman on FCOS
* Background: https://github.com/poseidon/typhoon/pull/855
2020-11-03 16:42:18 -08:00
Dalton Hubble 82e5ac3e7c Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/poseidon/terraform-render-bootstrap/pull/224
2020-11-03 10:29:07 -08:00
Dalton Hubble a8f7880511 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:50:18 -07:00
Dalton Hubble cda5b93b09 Update kube-state-metrics from v2.0.0-alpha.1 to v2.0.0-alpha.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.2
2020-10-28 18:49:40 -07:00
Dalton Hubble 3e9f5f34de Update Grafana from v7.2.2 to v7.3.0
* https://github.com/grafana/grafana/releases/tag/v7.3.0
2020-10-28 17:46:26 -07:00
Dalton Hubble 893d139590 Update Calico from v3.16.3 to v3.16.4
* https://github.com/projectcalico/calico/releases/tag/v3.16.4
2020-10-26 00:50:40 -07:00
Dalton Hubble fc62e51b2a Update Grafana from v7.2.1 to v7.2.2
* https://github.com/grafana/grafana/releases/tag/v7.2.2
2020-10-22 00:14:04 -07:00
Dalton Hubble e5ba3329eb Remove bare-metal CoreOS Container Linux profiles
* Remove Matchbox profiles for CoreOS Container Linux
* Simplify the remaining Flatcat Linux profiles
2020-10-21 00:25:10 -07:00
Dalton Hubble 7c3f3ab6d0 Rename container-linux modules to flatcar-linux
* CoreOS Container Linux was deprecated in v1.18.3
* Continue transitioning docs and modules from supporting
both CoreOS and Flatcar "variants" of Container Linux to
now supporting Flatcar Linux and equivalents

Action Required: Update the Flatcar Linux modules `source`
to replace `s/container-linux/flatcar-linux`. See docs for
examples
2020-10-20 22:47:19 -07:00
Dalton Hubble a99a990d49 Remove unused Kubelet tls mounts
* Kubelet trusts only the cluster CA certificate (and
certificates in the Kubelet debian base image), there
is no longer a need to mount the host's trusted certs
* Similar change on Flatcar Linux in
https://github.com/poseidon/typhoon/pull/855

Rel: https://github.com/poseidon/typhoon/pull/810
2020-10-18 23:48:21 -07:00
Dalton Hubble df17253e72 Fix delete node permission on Fedora CoreOS node shutdown
* On cloud platforms, `delete-node.service` tries to delete the
local node (not always possible depending on preemption time)
* Since v1.18.3, kubelet TLS bootstrap generates a kubeconfig
in `/var/lib/kubelet` which should be used with kubectl in
the delete-node oneshot
2020-10-18 23:38:11 -07:00
Dalton Hubble eda78db08e Change Flatcar kubelet.service container from rkt to docker
* Use docker to run the `kubelet.service` container
* Update Kubelet mounts to match Fedora CoreOS
* Remove unused `/etc/ssl/certs` mount (see
https://github.com/poseidon/typhoon/pull/810)
* Remove unused `/usr/share/ca-certificates` mount
* Remove `/etc/resolv.conf` mount, Docker default is ok
* Change `delete-node.service` to use docker instead of rkt
and inline ExecStart, as was done on Fedora CoreOS
* Fix permission denied on shutdown `delete-node`, caused
by the kubeconfig mount changing with the introduction of
node TLS bootstrap

Background

* podmand, rkt, and runc daemonless container process runners
provide advantages over the docker daemon for system containers.
Docker requires workarounds for use in systemd units where the
ExecStart must tail logs so systemd can monitor the daemonized
container. https://github.com/moby/moby/issues/6791
* Why switch then? On Flatcar Linux, podman isn't shipped. rkt
works, but isn't developing while container standards continue
to move forward. Typhoon has used runc for the Kubelet runner
before in Fedora Atomic, but its more low-level. So we're left
with Docker, which is less than ideal, but shipped in Flatcar
* Flatcar Linux appears to be shifting system components to
use docker, which does provide some limited guards against
breakages (e.g. Flatcar cannot enable docker live restore)
2020-10-18 23:24:45 -07:00
Dalton Hubble afac46e39a Remove asset_dir variable and optional asset writes
* Originally, poseidon/terraform-render-bootstrap generated
TLS certificates, manifests, and cluster "assets" written
to local disk (`asset_dir`) during terraform apply cluster
bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Add Terraform output `assets_dir` map
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. For those
who wish to write those assets to local files, this is possible
doing so explicitly.

```
resource local_file "assets" {
  for_each = module.yavin.assets_dist
  filename = "some-assets/${each.key}"
  content = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 15:00:15 -07:00
Dalton Hubble b1e680ac0c Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-17 13:56:24 -07:00
Dalton Hubble 9fbfbdb854 Update Prometheus from v2.21.0 to v2.22.0
* https://github.com/prometheus/prometheus/releases/tag/v2.22.0
2020-10-17 12:38:25 -07:00
Dalton Hubble 511f5272f4 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://github.com/poseidon/terraform-render-bootstrap/pull/212
2020-10-15 20:08:51 -07:00
Dalton Hubble 46ca5e8813 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:47:49 -07:00
Dalton Hubble 394e496cc7 Update Grafana from v7.2.0 to v7.2.1
* https://github.com/grafana/grafana/releases/tag/v7.2.1
2020-10-11 13:21:25 -07:00
Dalton Hubble a38ec1a856 Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-10-11 13:06:53 -07:00
Dalton Hubble 7881f4bd86 Update kube-state-metrics from v1.9.7 to v2.0.0-alpha.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.0.0-alpha.1
2020-10-11 12:35:43 -07:00
Dalton Hubble d5b5b7cb02 Update nginx-ingress from v0.40.0 to v0.40.2
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.2
2020-10-06 23:52:15 -07:00
Dalton Hubble 759a48be7c Update mkdocs-material from v5.5.12 to v6.0.1
* Update OS kernel, systemd, and docker verisons
2020-10-02 01:18:38 -07:00
Dalton Hubble b39a1d70da Update nginx-ingress from v0.35.0 to v0.40.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v0.40.0
2020-10-02 01:00:35 -07:00
Dalton Hubble 901f7939b2 Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:24:26 -07:00
Dalton Hubble d65085ce14 Update Grafana from v7.1.5 to v7.2.0
* https://github.com/grafana/grafana/releases/tag/v7.2.0
2020-09-24 20:58:32 -07:00
Dalton Hubble 343db5b578 Remove references to CoreOS Container Linux
* CoreOS Container Linux was deprecated in v1.18.3 (May 2020)
in favor of Fedora CoreOS and Flatcar Linux. CoreOS Container
Linux references were kept to give folks more time to migrate,
but AMIs have now been deleted. Time is up.

Rel: https://coreos.com/os/eol/
2020-09-24 20:51:02 -07:00
Dalton Hubble 444363be2d Update Kubernetes from v1.19.1 to v1.19.2
* Update flannel from v0.12.0 to v0.13.0-rc2
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
2020-09-16 20:05:54 -07:00
Dalton Hubble bc7ad25c60 Update Grafana dashboard for Kubelet v1.19
* Fix Kubelet pod and container count metrics dashboard
* https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/499
2020-09-15 23:21:56 -07:00
Dalton Hubble e838d4dc3d Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh upstream Prometheus rules/alerts and Grafana dashboards
2020-09-13 15:03:27 -07:00
Dalton Hubble 979c092ef6 Reduce apiserver metrics cardinality of non-core APIs
* Reduce `apiserver_request_duration_seconds_count` cardinality
by dropping series for non-core Kubernetes APIs. This is done
to match `apiserver_request_duration_seconds_count` relabeling
* These two relabels must be performed the same way to avoid
affecting new SLO calculations (upcoming)
* See https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/498

Related: https://github.com/poseidon/typhoon/pull/596
2020-09-13 14:47:49 -07:00
Dalton Hubble db8e94bb4b Update recommended Terraform provider versions
* Sync Terraform provider plugins with those used internally
2020-09-12 19:41:15 -07:00
Dalton Hubble eb093af9ed Drop Kubelet labelmap relabel for node_name
* Originally, Kubelet and CAdvisor metrics used a labelmap
relabel to add Kubernetes SD node labels onto timeseries
* With https://github.com/poseidon/typhoon/pull/596 that
relabel was dropped since node labels aren't usually that
valuable. `__meta_kubernetes_node_name` was retained but
the field name is empty
* Favor just using Prometheus server-side `instance` in
queries that require some node identifier for aggregation
or debugging

Fix https://github.com/poseidon/typhoon/issues/823
2020-09-12 19:40:00 -07:00
Dalton Hubble 36096f844d Promote Cilium from experimental to GA
* Cilium was added as an experimental CNI provider in June
* Since then, I've been choosing it for an increasing number
of clusters and scenarios.
2020-09-12 19:24:55 -07:00
Dalton Hubble d236628e53 Update Prometheus from v2.20.0 to v2.21.0
* https://github.com/prometheus/prometheus/releases/tag/v2.21.0
2020-09-12 19:20:54 -07:00
Dalton Hubble 577b927a2b Update Fedora CoreOS Config version from v1.0.0 to v1.1.0
* No notable changes in the config spec, just house keeping
* Require any snippets customization to update to v1.1.0. Version
skew between the main config and snippets will show an err message
* https://github.com/coreos/fcct/blob/master/docs/configuration-v1_1.md
2020-09-10 23:38:40 -07:00