Commit Graph

306 Commits

Author SHA1 Message Date
Dalton Hubble 70bdc9ec94 Allow bootstrap re-apply for Fedora CoreOS GCP
* Problem: Fedora CoreOS images are manually uploaded to GCP. When a
cluster is created with a stale image, Zincati immediately checks
for the latest stable image, fetches, and reboots. In practice,
this can unfortunately occur exactly during the initial cluster
bootstrap phase.

* Recommended: Upload the latest Fedora CoreOS image regularly
* Mitigation: Allow a failed bootstrap.service run (which won't touch
the done ConditionalPathExists) to be re-run by running `terraforma apply`
again. Add a known issue to CHANGES
* Update docs to show the current Fedora CoreOS stable version to
reduce likelihood users see this issue

 Longer term ideas:

* Ideal: Fedora CoreOS publishes a stable channel. Instances will always
boot with the latest image in a channel. The problem disappears since
it works the same way AWS does
* Timer: Consider some timer-based approach to have zincati delay any
system reboots for the first ~30 min of a machine's life. Possibly just
configured on the controller node https://github.com/coreos/zincati/pull/251
* External coordination: For Container Linux, locksmith filled a similar
role and was disabled to allow CLUO to coordinate reboots. By running
atop Kubernetes, it was not possible for the reboot to occur before
cluster bootstrap
* Rely on https://github.com/coreos/zincati/issues/115 to delay the
reboot since bootstrap involves an SSH session
* Use path-based activation of zincati on controllers and set that
path at the end of the bootstrap process

Rel: https://github.com/coreos/fedora-coreos-tracker/issues/239
2020-03-28 18:12:31 -07:00
Dalton Hubble 144bb9403c Add support for Fedora CoreOS snippets
* Refresh snippets customization docs
* Requires terraform-provider-ct v0.5+
2020-03-28 16:15:04 -07:00
Dalton Hubble d25f23e675 Update docs from Kubernetes v1.17.4 to v1.18.0 2020-03-25 20:28:30 -07:00
Dalton Hubble c3bf8bcf96 Add Fedora CoreOS to issue template and docs
* Update several Container Linux references to start
referring to Flatcar Linux
* Update docs and mentions of Fedora CoreOS
2020-03-25 00:36:15 -07:00
Dalton Hubble 5d1e4ad333 Deprecate asset_dir variable and remove docs
* Remove docs for the `asset_dir` variable and deprecate
it in CHANGES. It will be removed in an upcoming release
* Typhoon v1.17.0 introduced a new mechanism for managing
and distributing generated assets that stopped relying on
writing out to disk. `asset_dir` became optional and
defaulted to being unset / off (recommended)
2020-03-25 00:00:01 -07:00
Dalton Hubble 9f702c72d2 Rename DigitalOcean image variable to os_image
* Rename variable `image` to `os_image` to match the naming
used for the same purpose on other supported platforms (e.g.
AWS, Azure, GCP)
2020-03-24 23:49:37 -07:00
Dalton Hubble 590d941f50 Switch from upstream hyperkube image to individual images
* Kubernetes plans to stop releasing the hyperkube container image
* Upstream will continue to publish `kube-apiserver`, `kube-controller-manager`,
`kube-scheduler`, and `kube-proxy` container images to `k8s.gcr.io`
* Upstream will publish Kubelet only as a binary for distros to package,
either as a DEB/RPM on traditional distros or a container image on
container-optimized operating systems
* Typhoon will package the upstream Kubelet (checksummed) and its
dependencies as a container image for use on CoreOS Container Linux,
Flatcar Linux, and Fedora CoreOS
* Update the Typhoon container image security policy to list
`quay.io/poseidon/kubelet`as an official distributed artifact

Hyperkube: https://github.com/kubernetes/kubernetes/pull/88676
Kubelet Container Image: https://github.com/poseidon/kubelet
Kubelet Quay Repo: https://quay.io/repository/poseidon/kubelet
2020-03-21 15:43:05 -07:00
Dalton Hubble 2a5dddeb9d Promote Fedora CoreOS AWS and Google Cloud
* Promote Fedora CoreOS AWS to stable
* Promote Fedora CoreOS GCP to beta
2020-03-16 22:12:26 -07:00
Dalton Hubble 75fb4e5d11 Remove Container Linux Update Operator (CLUO) addon
* Stop providing example manifests for the Container Linux
Update Operator (CLUO)
* CLUO requires patches to support Kubernetes v1.16+, but the
project and push access is rather unowned
* CLUO hasn't been in active use in our clusters and won't be
relevant beyond Container Linux. Not to say folks can't patch
it and run it on their own. Examples just aren't provided here

Related: https://github.com/coreos/container-linux-update-operator/pull/197
2020-03-16 22:05:17 -07:00
Dalton Hubble 1a139ef6f1 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-03-16 21:40:52 -07:00
Dalton Hubble bc7902f40a Update Kubernetes from v1.17.3 to v1.17.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174
2020-03-13 00:06:41 -07:00
Dalton Hubble 4e1b8f22df Add support for Flatcar Linux on Azure
* Accept `os_image` "flatcar-stable" and "flatcar-beta" to
use Kinvolk's Flatcar Linux images from the Azure Marketplace

Note: Flatcar Linux Azure Marketplace images require terms be
accepted before use
2020-03-12 22:52:48 -07:00
Dalton Hubble ab7913a061 Accept initial worker node labels and taints map on bare-metal
* Add `worker_node_labels` map from node name to a list of initial
node label strings
* Add `worker_node_taints` map from node name to a list of initial
node taint strings
* Unlike cloud platforms, bare-metal node labels and taints
are defined via a map from node name to list of labels/taints.
Bare-metal clusters may have heterogeneous hardware so per node
labels and taints are accepted
* Only worker node names are allowed. Workloads are not scheduled
on controller nodes so altering their labels/taints isn't suitable

```
module "mercury" {
  ...

  worker_node_labels = {
    "node2" = ["role=special"]
  }

  worker_node_taints = {
    "node2" = ["role=special:NoSchedule"]
  }
}
```

Related: https://github.com/poseidon/typhoon/issues/429
2020-03-09 00:12:02 -07:00
Dalton Hubble 7b0ea23cdc Upgrade terraform-provider-azurerm to v2.0+
* Add support for `terraform-provider-azurerm` v2.0+. Require
`terraform-provider-azurerm` v2.0+ and drop v1.x support since
the Azure provider major release is not backwards compatible
* Use Azure's new Linux VM and Linux VM Scale Set resources
* Change controller's Azure disk caching to None
* Associate subnets (in addition to NICs) with security groups
(aesthetic)
* If set, change `worker_priority` from `Low` to `Spot` (action required)

Related:

* https://www.terraform.io/docs/providers/azurerm/guides/2.0-upgrade-guide.html
2020-03-08 17:40:13 -07:00
Dalton Hubble 3250994c95 Use a route table with separate (rather than inline) routes
* Allow users to extend the route table using a data reference
and adding route resources (e.g. unusual peering setups)
* Note: Internally connecting AWS clusters can reduce cross-cloud
flexibility and inhibits blue-green cluster patterns. It is not
recommended
2020-02-25 23:21:58 -08:00
Dalton Hubble 362b3fac5c Add guide for Typhoon with Flatcar Linux on DigitalOcean
* Add docs on manually uploading a Flatcar Linux DigitalOcean
bin image as a custom image and using a data reference
* Set status of Flatcar Linux on DigitalOcean to alpha
* IPv6 is not supported for DigitalOcean custom images
2020-02-14 12:08:58 -08:00
Dalton Hubble 0c53ad52e4 Update recommended Terraform versions and providers
* Sync the documented Terraform versions and provider
plugin versions to those that are actively used/tested
by the author
2020-02-13 14:39:48 -08:00
Dalton Hubble 008817b0aa Promote Fedora CoreOS AWS/bare-metal to beta
* Remove alpha warnings from docs headers
2020-02-13 14:25:22 -08:00
Dalton Hubble 1243f395d1 Update Kubernetes from v1.17.2 to v1.17.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173
2020-02-11 20:22:14 -08:00
Dalton Hubble ba84f86dc7 Add guide for Typhoon with Flatcar Linux on Google Cloud
* Add docs on manually uploading a Flatcar Linux GCE/GCP gzipped
tarball image as a Compute Engine image for use with the Typhoon
container-linux module
* Set status of Flatcar Linux on Google Cloud to alpha
2020-02-11 19:38:40 -08:00
Dalton Hubble 8cc303c9ac Add module for Fedora CoreOS on Google Cloud
* Add Typhoon Fedora CoreOS on Google Cloud as alpha
* Add docs on uploading the Fedora CoreOS GCP gzipped tarball to
Google Cloud storage to create a boot disk image
2020-02-01 15:21:40 -08:00
Dalton Hubble 02a470d2f2 Fix minor typo in announcement date 2020-01-23 08:57:01 -08:00
Dalton Hubble 5643ad525f Promote Fedora CoreOS from preview to alpha in docs
* Add an announcement to the website as well
2020-01-23 08:47:18 -08:00
Dalton Hubble 1cda5bcd2a Update Kubernetes from v1.17.1 to v1.17.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172
2020-01-21 18:27:39 -08:00
Dalton Hubble dd930a2ff9 Update bare-metal Fedora CoreOS image location
* Use Fedora CoreOS production download streams (change)
* Use live PXE kernel and initramfs images
* https://getfedora.org/coreos/download/
* Update docs example to use public images (cache is still
recommended at large scale) and stable stream
2020-01-20 14:44:06 -08:00
Dalton Hubble 7ddd3d096d Fix link in maintenance docs
* Also a fix version mention, since Terraform v0.12 was
added in Typhoon v1.15.0
2020-01-18 15:19:27 -08:00
Dalton Hubble b642e3b41b Update Kubernetes from v1.17.0 to v1.17.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171
2020-01-14 20:21:36 -08:00
Dalton Hubble 073fcb7067 Fix bare-metal instruction for watching install to disk
* Original instructions were to watch install to disk by SSH'ing
via port 2222 following Typhoon v1.10.1. Restore that message,
since the version number in the instruction was incorrectly bumped
on each release
2020-01-12 14:16:00 -08:00
Dalton Hubble b1f521fc4a Allow terraform-provider-google v3.x plugin versions
* Typhoon Google Cloud is compatible with `terraform-provider-google`
v3.x releases
* No v3.x specific features are used, so v2.19+ provider versions are
still allowed, to ease migrations
2020-01-11 14:07:18 -08:00
Dalton Hubble c3e22f3d13 Fix minor example typo in README 2019-12-10 23:14:12 -08:00
Dalton Hubble f69dc2ea0f Update CHANGES and tutorial notes for release
* Update recommended Terraform and provider plugin versions
* Update the rough count of resources created per cluster
since its not been refreshed in a while (will vary based
on cluster options)
2019-12-10 23:03:39 -08:00
Dalton Hubble de36d99afc Update Kubernetes from v1.16.3 to v1.17.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170
2019-12-09 18:31:58 -08:00
Dalton Hubble d9c7a9e049 Add/update docs for asset_dir and kubeconfig usage
* Original tutorials favored including the platform (e.g.
google-cloud) in modules (e.g. google-cloud-yavin). Prefer
naming conventions where each module / cluster has a simple
name (e.g. yavin) since the platform is usually redundant
* Retain the example cluster naming themes per platform
2019-12-05 22:56:42 -08:00
Dalton Hubble ad117f4592 Update recommended Terraform provider versions
* Recommend provider plugin version tested against
2019-11-13 13:53:46 -08:00
Dalton Hubble d7061020ba Update Kubernetes from v1.16.2 to v1.16.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163
2019-11-13 13:05:15 -08:00
Dalton Hubble 4775e9d0f7 Upgrade Calico v3.9.2 to v3.10.0
* Allow advertising Kubernetes service ClusterIPs to BGPPeer
routers via a BGPConfiguration
* Improve EdgeRouter docs about routes and BGP
* https://docs.projectcalico.org/v3.10/release-notes/
* https://docs.projectcalico.org/v3.10/networking/advertise-service-ips
2019-10-27 14:13:41 -07:00
Dalton Hubble e6bc5143aa Default to Calico as the CNI provider on Azure/DigitalOcean
* Change `networking` default from flannel to calico on
Azure and DigitalOcean
* AWS, bare-metal, and Google Cloud continue to default
to Calico (as they have since v1.7.5)
* Typhoon now defaults to using Calico and supporting
NetworkPolicy on all platforms
2019-10-15 23:15:40 -07:00
Dalton Hubble 24fc440d83 Update Kubernetes from v1.16.1 to v1.16.2
* Update Calico from v3.9.1 to v3.9.2
2019-10-15 22:42:52 -07:00
Dalton Hubble 5b9dab6659 Introduce list of detail objects for bare-metal machines
* Define bare-metal `controllers` and `workers` as a complex type
list(object{name=string, mac=string, domain=string}) to allow
clusters with many machines to be defined more cleanly
* Remove `controller_names` list variable
* Remove `controller_macs` list variable
* Remove `controller_domains` list variable
* Remove `worker_names` list variable
* Remove `worker_macs` list variable
* Remove `worker_domains` list variable
2019-10-06 20:22:45 -07:00
Dalton Hubble 5196709fe0 Update docs, CHANGES, and mkdocs-material
* Update mkdocs-material from v4.4.2 to v4.4.3
* Update recommended Terraform provider versions
* Cleanup the changelog before release
2019-10-06 18:41:25 -07:00
Dalton Hubble 5ef4155e08 Detect most recent Fedora CoreOS AMI in region
* Detect the most recent Fedora CoreOS AMI to allow usage
of Fedora CoreOS in supported regions (previously just
us-east-1)
* Unpin the Fedora CoreOS AMI image which was pinned to
images that had been checked. This does mean if Fedora
publishes a broken image, it will be selected
* Filter out "dev" images which have similar naming
2019-10-06 18:13:55 -07:00
Dalton Hubble 15c4b793c3 Use new Fedora CoreOS kernel/initrd/raw asset names
* Fedora CoreOS changed the kernel, initramfs, and raw
image asset download paths and names in 30.20191002.0
2019-10-06 17:31:21 -07:00
Dalton Hubble 36ed53924f Add stricter types for bare-metal modules
* Review variables available in bare-metal kubernetes modules
for Container Linux and Fedora CoreOS
* Deprecate cluster_domain_suffix variable
* Remove deprecated container_linux_oem variable
2019-10-06 17:18:50 -07:00
Dalton Hubble 995824fa6d Add stricter types for DigitalOcean module
* Review variables available in DigitalOcean kubernetes
module and sync with documentation
* Promote Calico for DigitalOcean and Azure beyond experimental
(its the primary mode I've used since it was introduced)
2019-10-02 21:48:24 -07:00
Dalton Hubble 1c5ed84fc2 Update Kubernetes from v1.16.0 to v1.16.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161
2019-10-02 21:31:55 -07:00
Dalton Hubble fdd6882a87 Add stricter types to Azure modules
* Review variables available in Azure kubernetes and workers
modules and sync with documentation
* Fix internal workers module default type to Standard_DS1_v2
2019-09-30 22:20:20 -07:00
Dalton Hubble f82266ac8c Add stricter types for GCP modules
* Review variables available in google-cloud kubernetes
and workers modules and in documentation
2019-09-30 22:04:35 -07:00
Dalton Hubble a407ff72df Add stricter types for AWS modules and update docs
* Review variables available in AWS kubernetes and workers
modules and documentation
* Switching between spot and on-demand has worked since
Terraform v0.12
* Generally, there are too many knobs. Less useful ones
should be de-emphasized or removed
* Remove `cluster_domain_suffix` documentation
2019-09-29 11:19:38 -07:00
Dalton Hubble 9bfb1c5faf Update docs and variable types for worker node_labels
* Document worker pools `node_labels` variable to set the
initial node labels for a homogeneous set of workers
* Document `worker_node_labels` convenience variable to
set the initial node labels for default worker nodes
2019-09-28 15:05:12 -07:00
Dalton Hubble 078f084220 Update CHANGES and docs for v1.16.0 release 2019-09-22 17:37:23 -07:00