Commit Graph

497 Commits

Author SHA1 Message Date
Dalton Hubble 4e1b8f22df Add support for Flatcar Linux on Azure
* Accept `os_image` "flatcar-stable" and "flatcar-beta" to
use Kinvolk's Flatcar Linux images from the Azure Marketplace

Note: Flatcar Linux Azure Marketplace images require terms be
accepted before use
2020-03-12 22:52:48 -07:00
Dalton Hubble ab7913a061 Accept initial worker node labels and taints map on bare-metal
* Add `worker_node_labels` map from node name to a list of initial
node label strings
* Add `worker_node_taints` map from node name to a list of initial
node taint strings
* Unlike cloud platforms, bare-metal node labels and taints
are defined via a map from node name to list of labels/taints.
Bare-metal clusters may have heterogeneous hardware so per node
labels and taints are accepted
* Only worker node names are allowed. Workloads are not scheduled
on controller nodes so altering their labels/taints isn't suitable

```
module "mercury" {
  ...

  worker_node_labels = {
    "node2" = ["role=special"]
  }

  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"]
  }
}
```

Related: https://github.com/poseidon/typhoon/issues/429
2020-03-09 00:12:02 -07:00
Dalton Hubble 7b0ea23cdc Upgrade terraform-provider-azurerm to v2.0+
* Add support for `terraform-provider-azurerm` v2.0+. Require
`terraform-provider-azurerm` v2.0+ and drop v1.x support since
the Azure provider major release is not backwards compatible
* Use Azure's new Linux VM and Linux VM Scale Set resources
* Change controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups
(aesthetic)
* If set, change `worker_priority` from `Low` to `Spot` (action required)

Related:

* https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html
2020-03-08 17:40:13 -07:00
Dalton Hubble c4683c5bad Refresh Prometheus alerts and Grafana dashboards
* Add 2 min wait before KubeNodeUnreachable to be less
noisy on premeptible clusters
* Add a BlackboxProbeFailure alert for any failing probes
for services annotated `prometheus.io/probe: true`
2020-03-02 20:08:37 -08:00
Dalton Hubble 51cee6d5a4 Change Container Linux etcd-member to fetch with docker://
* Quay has historically generated ACI signatures for images to
facilitate rkt's notions of verification (it allowed authors to
actually sign images, though `--trust-keys-from-https` is in use
since etcd and most authors don't sign images). OCI standardization
didn't adopt verification ideas and checking signatures has fallen
out of favor.
* Fix an issue where Quay no longer seems to be generating ACI
signatures for new images (e.g. quay.io/coreos/etcd:v.3.4.4)
* Don't be alarmed by rkt `--insecure-options=image`. It refers
to disabling image signature checking (i.e. docker pull doesn't
check signatures either)
* System containers for Kubelet and bootstrap have transitioned
to the docker:// transport, so there is precedent and this brings
all the system containers on Container Linux controllers into
alignment
2020-03-02 19:57:45 -08:00
Dalton Hubble 87f9a2fc35 Add automatic worker deletion on Fedora CoreOS clouds
* On clouds where workers can scale down or be preempted
(AWS, GCP, Azure), shutdown runs delete-node.service to
remove a node a prevent NotReady nodes from lingering
* Add the delete-node.service that wasn't carried over
from Container Linux and port it to use podman
2020-02-29 20:22:03 -08:00
Dalton Hubble 6de5cf5a55 Update etcd from v3.4.3 to v3.4.4
* https://github.com/etcd-io/etcd/releases/tag/v3.4.4
2020-02-29 16:19:29 -08:00
Dalton Hubble 3250994c95 Use a route table with separate (rather than inline) routes
* Allow users to extend the route table using a data reference
and adding route resources (e.g. unusual peering setups)
* Note: Internally connecting AWS clusters can reduce cross-cloud
flexibility and inhibits blue-green cluster patterns. It is not
recommended
2020-02-25 23:21:58 -08:00
Dalton Hubble f4d260645c Update node-exporter from v0.18.1 to v1.0.0-rc.0
* Update mdadm alert rule; node-exporter adds `state` label to
`node_md_disks` and removes `node_md_disks_active`
* https://github.com/prometheus/node_exporter/releases/tag/v1.0.0-rc.0
2020-02-25 22:29:52 -08:00
Dalton Hubble d9219a6722 Update nginx-ingress from v0.29.0 to v0.30.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.30.0
2020-02-25 22:11:59 -08:00
Dalton Hubble 60c7eb85ee Update nginx-ingress from v0.28.0 to v0.29.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.29.0
2020-02-22 15:57:59 -08:00
Dalton Hubble 4c964b56a0 Update kube-state-metrics from v1.9.4 to v1.9.5
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.5
2020-02-22 15:21:10 -08:00
Dalton Hubble 1fbd6835f2 Update Grafana from v6.6.1 to v6.6.2
* https://github.com/grafana/grafana/releases/tag/v6.6.2
2020-02-22 15:19:24 -08:00
Dalton Hubble e4d977bfcd Fix worker_node_labels for initial Fedora CoreOS
* Add Terraform strip markers to consume beginning and
trailing whitespace in templated Kubelet arguments for
podman (Fedora CoreOS only)
* Fix initial `worker_node_labels` being quietly ignored
on Fedora CoreOS cloud platforms that offer the feature
* Close https://github.com/poseidon/typhoon/issues/650
2020-02-22 15:12:35 -08:00
Dalton Hubble 4a38fb5927 Update CoreDNS from v1.6.6 to v1.6.7
* https://coredns.io/2020/01/28/coredns-1.6.7-release/
2020-02-18 21:46:19 -08:00
Dalton Hubble 7ca03e5219 Update Prometheus from v1.15.2 to v1.16.0
* https://github.com/prometheus/prometheus/releases/tag/v2.16.0
2020-02-14 12:10:56 -08:00
Dalton Hubble 362b3fac5c Add guide for Typhoon with Flatcar Linux on DigitalOcean
* Add docs on manually uploading a Flatcar Linux DigitalOcean
bin image as a custom image and using a data reference
* Set status of Flatcar Linux on DigitalOcean to alpha
* IPv6 is not supported for DigitalOcean custom images
2020-02-14 12:08:58 -08:00
Dalton Hubble 32db59b9eb Update CHANGELOG sections and links 2020-02-14 12:05:51 -08:00
Dalton Hubble 008817b0aa Promote Fedora CoreOS AWS/bare-metal to beta
* Remove alpha warnings from docs headers
2020-02-13 14:25:22 -08:00
Dalton Hubble 49d3b9e6b3 Set docker log driver to json-file on Fedora CoreOS
* Fix the last minor issue for Fedora CoreOS clusters to pass CNCF's
Kubernetes conformance tests
* Kubelet supports a seldom used feature `kubectl logs --limit-bytes=N`
to trim a log stream to a desired length. Kubelet handles this in the
CRI driver. The Kubelet docker shim only supports the limit bytes
feature when Docker is configured with the default `json-file` logging
driver
* CNCF conformance tests started requiring limit-bytes be supported,
indirectly forcing the log driver choice until either the Kubelet or
the conformance tests are fixed
* Fedora CoreOS defaults Docker to use `journald` (desired). For now,
as a workaround to offer conformant clusters, the log driver can
be set back to `json-file`. RHEL CoreOS likely won't have noticed the
non-conformance since its using crio runtime
* https://github.com/kubernetes/kubernetes/issues/86367

Note: When upstream has a fix, the aim is to drop the docker config
override and use the journald default
2020-02-11 23:00:38 -08:00
Dalton Hubble 1243f395d1 Update Kubernetes from v1.17.2 to v1.17.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173
2020-02-11 20:22:14 -08:00
Dalton Hubble 846f11097f Update Fedora CoreOS kernel arguments to align with upstream
* Align bare-metal kernel arguments with upstream docs
* Add missing initrd argument which can cause issues if
not present. Fix #638
* Add tty0 and ttyS0 consoles (matches Container Linux)
* Remove unused coreos.inst=yes

Related: https://docs.fedoraproject.org/en-US/fedora-coreos/bare-metal/
2020-02-11 20:11:19 -08:00
Dalton Hubble ba84f86dc7 Add guide for Typhoon with Flatcar Linux on Google Cloud
* Add docs on manually uploading a Flatcar Linux GCE/GCP gzipped
tarball image as a Compute Engine image for use with the Typhoon
container-linux module
* Set status of Flatcar Linux on Google Cloud to alpha
2020-02-11 19:38:40 -08:00
Dalton Hubble 34c3d7cc39 Update Grafana from v6.6.0 to v6.6.1
* https://github.com/grafana/grafana/releases/tag/v6.6.1
2020-02-08 14:50:33 -08:00
Dalton Hubble ca96a1335c Update Calico from v3.11.2 to v3.12.0
* https://docs.projectcalico.org/release-notes/#v3120
* Remove reverse packet filter override, since Calico no
longer relies on the setting
* https://github.com/coreos/fedora-coreos-tracker/issues/219
* https://github.com/projectcalico/felix/pull/2189
2020-02-06 00:43:33 -08:00
Dalton Hubble e339fbd2b6 Update kube-state-metrics from v1.9.3 to v1.9.4
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.4
2020-02-04 21:33:34 -08:00
Dalton Hubble 8cc303c9ac Add module for Fedora CoreOS on Google Cloud
* Add Typhoon Fedora CoreOS on Google Cloud as alpha
* Add docs on uploading the Fedora CoreOS GCP gzipped tarball to
Google Cloud storage to create a boot disk image
2020-02-01 15:21:40 -08:00
Dalton Hubble b19ba16afa Update nginx-ingress from v0.27.1 to v0.28.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.28.0
2020-01-30 18:00:23 -08:00
Dalton Hubble d127a7345c Update Grafana from v6.5.3 to v6.6.0
* https://github.com/grafana/grafana/releases/tag/v6.6.0
2020-01-27 20:46:32 -08:00
Dalton Hubble 5643ad525f Promote Fedora CoreOS from preview to alpha in docs
* Add an announcement to the website as well
2020-01-23 08:47:18 -08:00
Dalton Hubble d5b7ce8f27 Update kube-state-metrics from v1.9.2 to v1.9.3
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.3
2020-01-23 00:03:16 -08:00
Dalton Hubble 1cda5bcd2a Update Kubernetes from v1.17.1 to v1.17.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172
2020-01-21 18:27:39 -08:00
Dalton Hubble bda73264f7 Update nginx-ingress from v0.26.1 to v0.27.1
* Change runAsUser from 33 to 101 for new alpine-based image
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.27.1
2020-01-20 15:22:16 -08:00
Dalton Hubble dd930a2ff9 Update bare-metal Fedora CoreOS image location
* Use Fedora CoreOS production download streams (change)
* Use live PXE kernel and initramfs images
* https://getfedora.org/coreos/download/
* Update docs example to use public images (cache is still
recommended at large scale) and stable stream
2020-01-20 14:44:06 -08:00
Dalton Hubble 03ff3a9cf3 Update kube-state-metrics from v1.9.1 to v1.9.2
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.2
2020-01-18 15:32:10 -08:00
Dalton Hubble 48703f9906 Update Grafana from v6.5.2 to v6.5.3
* https://github.com/grafana/grafana/releases/tag/v6.5.3
2020-01-18 15:30:39 -08:00
Dalton Hubble 7daabd28b5 Update Calico from v3.11.1 to v3.11.2
* https://docs.projectcalico.org/v3.11/release-notes/
2020-01-18 13:45:24 -08:00
Dalton Hubble 0e2fc89f78 Update kube-state-metrics from v1.9.0 to v1.9.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.1
2020-01-11 14:15:55 -08:00
Dalton Hubble b1f521fc4a Allow terraform-provider-google v3.x plugin versions
* Typhoon Google Cloud is compatible with `terraform-provider-google`
v3.x releases
* No v3.x specific features are used, so v2.19+ provider versions are
still allowed, to ease migrations
2020-01-11 14:07:18 -08:00
Dalton Hubble 73588cfad3 Update Prometheus from v2.15.1 to v2.15.2
* https://github.com/prometheus/prometheus/releases/tag/v2.15.2
2020-01-06 22:08:34 -08:00
Dalton Hubble bb586b60da Reduce Prometheus addon's node-exporter tolerations
* Change node-exporter DaemonSet tolerations from tolerating
all possible NoSchedule taints to tolerating the master taint
and the not ready taint (we'd like metrics regardless)
* Users who add custom node taints must add their custom taints
to the addon node-exporter DaemonSet. As an addon, its expected
users copy and manipulate manifests out-of-band in their own
systems
2020-01-06 21:24:24 -08:00
Dalton Hubble 43e05b9131 Enable kube-proxy metrics and allow Prometheus scrapes
* Configure kube-proxy --metrics-bind-address=0.0.0.0 (default
127.0.0.1) to serve metrics on 0.0.0.0:10249
* Add firewall rules to allow Prometheus (resides on a worker) to
scrape kube-proxy service endpoints on controllers or workers
* Add a clusterIP: None service for kube-proxy endpoint discovery
2020-01-06 21:11:18 -08:00
Dalton Hubble b2eb3e05d0 Disable Kubelet 127.0.0.1.10248 healthz endpoint
* Kubelet runs a healthz server listening on 127.0.0.1:10248
by default. Its unused by Typhoon and can be disabled
* https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
2019-12-29 11:23:25 -08:00
Dalton Hubble f1f4cd6fc0 Inline Container Linux kubelet.service, deprecate kubelet-wrapper
* Change kubelet.service on Container Linux nodes to ExecStart Kubelet
inline to replace the use of the host OS kubelet-wrapper script
* Express rkt run flags and volume mounts in a clear, uniform way to
make the Kubelet service easier to audit, manage, and understand
* Eliminate reliance on a Container Linux kubelet-wrapper script
* Typhoon for Fedora CoreOS developed a kubelet.service that similarly
uses an inline ExecStart (except with podman instead of rkt) and a
more minimal set of volume mounts. Adopt the volume improvements:
  * Change Kubelet /etc/kubernetes volume to read-only
  * Change Kubelet /etc/resolv.conf volume to read-only
  * Remove unneeded /var/lib/cni volume mount

Background:

* kubelet-wrapper was added in CoreOS around the time of Kubernetes v1.0
to simplify running a CoreOS-built hyperkube ACI image via rkt-fly. The
script defaults are no longer ideal (e.g. rkt's notion of trust dates
back to quay.io ACI image serving and signing, which informed the OCI
standard images we use today, though they still lack rkt's signing ideas).
* Shipping kubelet-wrapper was regretted at CoreOS, but remains in the
distro for compatibility. The script is not updated to track hyperkube
changes, but it is stable and kubelet.env overrides bridge most gaps
* Typhoon Container Linux nodes have used kubelet-wrapper to rkt/rkt-fly
run the Kubelet via the official k8s.gcr.io hyperkube image using overrides
(new image registry, new image format, restart handling, new mounts, new
entrypoint in v1.17).
* Observation: Most of what it takes to run a Kubelet container is defined
in Typhoon, not in kubelet-wrapper. The wrapper's value is now undermined
by having to workaround its dated defaults. Typhoon may be better served
defining Kubelet.service explicitly
* Typhoon for Fedora CoreOS developed a kubelet.service without the use
of a host OS kubelet-wrapper which is both clearer and eliminated some
volume mounts
2019-12-29 11:17:26 -08:00
Dalton Hubble 11565ffa8a Update Calico from v3.10.2 to v3.11.1
* https://docs.projectcalico.org/v3.11/release-notes/
2019-12-28 11:08:03 -08:00
Dalton Hubble a4e843693f Update Prometheus from v2.15.0 to v2.15.1
* https://github.com/prometheus/prometheus/releases/tag/v2.15.1
2019-12-26 09:12:55 -05:00
Dalton Hubble f48e43c0b1 Update Prometheus from v2.14.0 to v2.15.0
* https://github.com/prometheus/prometheus/releases/tag/v2.15.0
2019-12-24 10:52:19 -05:00
Dalton Hubble daa8d9d9ec Update CoreDNS from v1.6.5 to v1.6.6
* https://coredns.io/2019/12/11/coredns-1.6.6-release/
2019-12-22 10:47:19 -05:00
Dalton Hubble 52d11096dc Update kube-state-metrics from v1.9.0-rc.1 to v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.0-rc.0
2019-12-20 13:53:37 -08:00
Dalton Hubble 1b9fa2e688 Update Grafana from v6.5.1 to v6.5.2
* https://github.com/grafana/grafana/releases/tag/v6.5.2
2019-12-14 15:25:48 -08:00