Commit Graph

19 Commits

Author SHA1 Message Date
Dalton Hubble 2837275265 Introduce cluster creation without local writes to asset_dir
* Allow generated assets (TLS materials, manifests) to be
securely distributed to controller node(s) via file provisioner
(i.e. ssh-agent) as an assets bundle file, rather than relying
on assets being locally rendered to disk in an asset_dir and
then securely distributed
* Change `asset_dir` from required to optional. Left unset,
asset_dir defaults to "" and no assets will be written to
files on the machine that runs terraform apply
* Enhancement: Managed cluster assets are kept only in Terraform
state, which supports different backends (GCS, S3, etcd, etc) and
optional encryption. terraform apply accesses state, runs in-memory,
and distributes sensitive materials to controllers without making
use of local disk (simplifies use in CI systems)
* Enhancement: Improve asset unpack and layout process to position
etcd certificates and control plane certificates more cleanly,
without unneeded secret materials

Details:

* Terraform file provisioner support for distributing directories of
contents (with unknown structure) has been limited to reading from a
local directory, meaning local writes to asset_dir were required.
https://github.com/poseidon/typhoon/issues/585 discusses the problem
and newer or upcoming Terraform features that might help.
* Observation: Terraform provisioner support for single files works
well, but iteration isn't viable. We're also constrained to Terraform
language features on the apply side (no extra plugins, no shelling out)
and CoreOS / Fedora tools on the receive side.
* Take a map representation of the contents that would have been splayed
out in asset_dir and pack/encode them into a single file format devised
for easy unpacking. Use an awk one-liner on the receive side to unpack.
In pratice, this has worked well and its rather nice that a single
assets file is transferred by file provisioner (all or none)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162
2019-12-05 01:24:50 -08:00
Dalton Hubble 8a9e8595ae Fix terraform fmt formatting 2019-11-13 23:44:02 -08:00
Dalton Hubble 8703f2c3c5 Fix missing comma separator on bare-metal and DO
* Introduced in bare-metal and DigitalOcean in #544
while addressing possible ordering race, but after
the v1.16 upgrade validation
2019-09-23 11:05:26 -07:00
Dalton Hubble b951aca66f Create /etc/kubernetes/manifests before asset copy
* Fix issue (present since bootkube->bootstrap switch) where
controller asset copy could fail if /etc/kubernetes/manifests
wasn't created in time on platforms using path activation for
the Kubelet (observed on DigitalOcean, also possible on
bare-metal)
2019-09-19 00:30:53 -07:00
Dalton Hubble 96b646cf6d Rename bootkube modules to bootstrap
* Rename render module from bootkube to bootstrap. Avoid
confusion with the kubernetes-incubator/bootkube tool since
it is no longer used
* Use the poseidon/terraform-render-bootstrap Terraform module
(formerly poseidon/terraform-render-bootkube)
* https://github.com/poseidon/terraform-render-bootkube/pull/149
2019-09-14 16:24:32 -07:00
Dalton Hubble db947537d1 Migrate GCP, DO, Azure to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
Dalton Hubble 1366ae404b Migrate DigitalOcean module from Terraform v0.11 to v0.12
* Replace v0.11 bracket type hints with Terraform v0.12 list expressions
* Use expression syntax instead of interpolated strings, where suggested
* Update DigitalOcean tutorial documentation
* Define Terraform and plugin version requirements in versions.tf
  * Require digitalocean ~> v1.3 to support Terraform v0.12
  * Require ct ~> v0.3.2 to support Terraform v0.12
2019-06-06 09:44:58 -07:00
Dalton Hubble 147c21a4bd Allow Calico networking on Azure and DigitalOcean
* Introduce "calico" as a `networking` option on Azure and DigitalOcean
using Calico's new VXLAN support (similar to flannel). Flannel remains
the default on these platforms for now.
* Historically, DigitalOcean and Azure only allowed Flannel as the
CNI provider, since those platforms don't support IPIP traffic that
was previously required for Calico.
* Looking forward, its desireable for Calico to become the default
across Typhoon clusters, since it provides NetworkPolicy and a
consistent experience
* No changes to AWS, GCP, or bare-metal where Calico remains the
default CNI provider. On these platforms, IPIP mode will always
be used, since its available and more performant than vxlan
2019-05-20 17:17:20 +02:00
Dalton Hubble 37ce722f9c Fix race condition in DigitalOcean cluster create
* DigitalOcean clusters must secure copy a kubeconfig to
worker nodes, but Terraform could decide to try copying
before firewall rules have been added to allow SSH access.
* Add an explicit dependency on adding firewall rules first
2019-05-17 13:05:08 +02:00
Dalton Hubble 812a1adb49 Use a lower-privilege Kubelet kubeconfig in system:nodes
* Kubelets can use a lower-privilege TLS client certificate with
Org system:nodes and a binding to the system:node ClusterRole
* Admin kubeconfig's continue to belong to Org system:masters to
provide cluster-admin (available in assets/auth/kubeconfig or as
a Terraform output kubeconfig-admin)
* Remove bare-metal output variable kubeconfig
2019-01-05 13:08:56 -08:00
Dalton Hubble 7acd4931f6 Remove redundant kubeconfig copy on AWS and GCP
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. Its redundant to use a Terraform remote
file copy step for the kubeconfig.
2018-03-26 00:01:47 -07:00
Dalton Hubble cfd603bea2 Ensure etcd secrets are only distributed to controller hosts
* Previously, etcd secrets were erroneously distributed to worker
nodes (permissions 500, ownership etc:etcd).
2018-03-25 23:46:44 -07:00
Dalton Hubble 47a9989927 Fix null_resource ordering constraints
* Ensure etcd TLS assets and kubeconfig are copied before
any attempt is made to run bootkube start
2017-11-06 00:55:44 -08:00
Dalton Hubble 308c7dfb6e digital-ocean: Run etcd cluster on-host, across controllers
* Run etcd peers with TLS across controller nodes
* Deprecate self-hosted-etcd on the Digital Ocean platform
* Distribute etcd TLS certificates as part of initial provisioning
* Check the status of etcd by running `systemctl status etcd-member`
2017-10-09 22:43:23 -07:00
Dalton Hubble 7c046b6206 *: Fix Terraform fmt and comments 2017-09-17 21:43:00 -07:00
Dalton Hubble 2ff6d602d8 digital-ocean: Distribute kubeconfig via Terraform null_resource
* Keep kubeconfig out of DigitalOcean metadata user-data
2017-09-13 20:19:52 -07:00
Dalton Hubble e19517d3df Fix the terraform fmt of configs 2017-08-12 18:26:05 -07:00
Dalton Hubble b303c0e986 digital-ocean: Add ssh step dependency on the 0th controller 2017-07-29 15:18:24 -07:00
Dalton Hubble 6070ffb449 Add dghubble/pegasus Digital Ocean Kubernetes Terraform module 2017-07-29 11:36:33 -07:00