* Allow generated assets (TLS materials, manifests) to be
securely distributed to controller node(s) via file provisioner
(i.e. ssh-agent) as an assets bundle file, rather than relying
on assets being locally rendered to disk in an asset_dir and
then securely distributed
* Change `asset_dir` from required to optional. Left unset,
asset_dir defaults to "" and no assets will be written to
files on the machine that runs terraform apply
* Enhancement: Managed cluster assets are kept only in Terraform
state, which supports different backends (GCS, S3, etcd, etc) and
optional encryption. terraform apply accesses state, runs in-memory,
and distributes sensitive materials to controllers without making
use of local disk (simplifies use in CI systems)
* Enhancement: Improve asset unpack and layout process to position
etcd certificates and control plane certificates more cleanly,
without unneeded secret materials
Details:
* Terraform file provisioner support for distributing directories of
contents (with unknown structure) has been limited to reading from a
local directory, meaning local writes to asset_dir were required.
https://github.com/poseidon/typhoon/issues/585 discusses the problem
and newer or upcoming Terraform features that might help.
* Observation: Terraform provisioner support for single files works
well, but iteration isn't viable. We're also constrained to Terraform
language features on the apply side (no extra plugins, no shelling out)
and CoreOS / Fedora tools on the receive side.
* Take a map representation of the contents that would have been splayed
out in asset_dir and pack/encode them into a single file format devised
for easy unpacking. Use an awk one-liner on the receive side to unpack.
In pratice, this has worked well and its rather nice that a single
assets file is transferred by file provisioner (all or none)
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162
* Fix issue (present since bootkube->bootstrap switch) where
controller asset copy could fail if /etc/kubernetes/manifests
wasn't created in time on platforms using path activation for
the Kubelet (observed on DigitalOcean, also possible on
bare-metal)
* Rename render module from bootkube to bootstrap. Avoid
confusion with the kubernetes-incubator/bootkube tool since
it is no longer used
* Use the poseidon/terraform-render-bootstrap Terraform module
(formerly poseidon/terraform-render-bootkube)
* https://github.com/poseidon/terraform-render-bootkube/pull/149
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
* Replace v0.11 bracket type hints with Terraform v0.12 list expressions
* Use expression syntax instead of interpolated strings, where suggested
* Update DigitalOcean tutorial documentation
* Define Terraform and plugin version requirements in versions.tf
* Require digitalocean ~> v1.3 to support Terraform v0.12
* Require ct ~> v0.3.2 to support Terraform v0.12
* Introduce "calico" as a `networking` option on Azure and DigitalOcean
using Calico's new VXLAN support (similar to flannel). Flannel remains
the default on these platforms for now.
* Historically, DigitalOcean and Azure only allowed Flannel as the
CNI provider, since those platforms don't support IPIP traffic that
was previously required for Calico.
* Looking forward, its desireable for Calico to become the default
across Typhoon clusters, since it provides NetworkPolicy and a
consistent experience
* No changes to AWS, GCP, or bare-metal where Calico remains the
default CNI provider. On these platforms, IPIP mode will always
be used, since its available and more performant than vxlan
* DigitalOcean clusters must secure copy a kubeconfig to
worker nodes, but Terraform could decide to try copying
before firewall rules have been added to allow SSH access.
* Add an explicit dependency on adding firewall rules first
* Kubelets can use a lower-privilege TLS client certificate with
Org system:nodes and a binding to the system:node ClusterRole
* Admin kubeconfig's continue to belong to Org system:masters to
provide cluster-admin (available in assets/auth/kubeconfig or as
a Terraform output kubeconfig-admin)
* Remove bare-metal output variable kubeconfig
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. Its redundant to use a Terraform remote
file copy step for the kubeconfig.
* Run etcd peers with TLS across controller nodes
* Deprecate self-hosted-etcd on the Digital Ocean platform
* Distribute etcd TLS certificates as part of initial provisioning
* Check the status of etcd by running `systemctl status etcd-member`