Commit Graph

169 Commits

Author SHA1 Message Date
Ben Drucker 6a581ab577 Render etcd_initial_cluster using a template_file 2018-05-30 21:14:49 -07:00
Dalton Hubble 4ea1fde9c5 Update Kubernetes from v1.10.2 to v1.10.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1103
* Update Calico from v3.1.1 to v3.1.2
2018-05-21 21:38:43 -07:00
William Zhang 2ae126bf68 Fix README link to tutorial 2018-05-19 13:10:22 -07:00
Dalton Hubble 983489bb52 Re-run terraform fmt for formatting 2018-05-14 23:38:16 -07:00
Dalton Hubble c2b719dc75 Configure Prometheus to scrape Kubelets directly
* Use Kubelet bearer token authn/authz to scrape metrics
* Drop RBAC permission from nodes/proxy to nodes/metrics
* Stop proxying kubelet scrapes through the apiserver, since
this required higher privilege (nodes/proxy) and can add
load to the apiserver on large clusters
2018-05-14 23:06:50 -07:00
Dalton Hubble 37981f9fb1 Allow bearer token authn/authz to the Kubelet
* Require Webhook authorization to the Kubelet
* Switch apiserver X509 client cert org to systems:masters
to grant the apiserver admin and satisfy the authorization
requirement. kubectl commands like logs or exec that have
the apiserver make requests of a kubelet continue to work
as before
* https://kubernetes.io/docs/admin/kubelet-authentication-authorization/
* https://github.com/poseidon/typhoon/issues/215
2018-05-13 23:20:42 -07:00
Dalton Hubble 5eb11f5104 Allow Flatcar Linux os_image on AWS, rename os_channel
* Replace os_channel variable with os_image to align naming
across clouds. Users who set this option to stable, beta, or
alpha should now set os_image to coreos-stable, coreos-beta,
or coreos-alpha.
* Default os_image to coreos-stable. This continues to use
the most recent image from the stable channel as always.
* Allow Container Linux derivative Flatcar Linux by setting
os_image to `flatcar-stable`, `flatcar-beta`, `flatcar-alpha`
2018-05-12 11:41:58 -07:00
Dalton Hubble f2ee75ac98 Require Terraform v0.11.x, drop v0.10.x support
* Raise minimum Terraform version to v0.11.0
* Terraform v0.11.x has been supported since Typhoon v1.9.2
and Terraform v0.10.x was last released in Nov 2017. I'd like
to stop worrying about v0.10.x and remove migration docs as
a later followup
* Migration docs docs/topics/maintenance.md#terraform-v011x
2018-05-10 02:20:46 -07:00
Dalton Hubble 8b8e364915 Update etcd from v3.3.4 to v3.3.5
* https://github.com/coreos/etcd/releases/tag/v3.3.5
2018-05-10 02:12:53 -07:00
Michael Holt a5916da0e2 Update min AWS provider from v1.11 to v1.13 2018-05-02 15:16:03 -07:00
Dalton Hubble cc29530ba0 Allow preemptible workers on AWS via spot instances
* Add `worker_price` to allow worker spot instances. Defaults
to empty string for the worker autoscaling group to use regular
on-demand instances.
* Add `spot_price` to internal `workers` module for spot worker
pools
* Note: Unlike GCP `preemptible` workers, spot instances require
you to pick a bid price.
2018-04-29 13:31:17 -07:00
Dalton Hubble e889430926 Update kube-dns from v1.14.9 to v1.14.10
* https://github.com/kubernetes/kubernetes/pull/62676
2018-04-28 00:43:09 -07:00
Dalton Hubble 32ddfa94e1 Update Kubernetes from v1.10.1 to v1.10.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.10.2
2018-04-28 00:27:00 -07:00
Dalton Hubble 681450aa0d Update etcd from v3.3.3 to v3.3.4
* https://github.com/coreos/etcd/releases/tag/v3.3.4
2018-04-27 23:57:26 -07:00
Dalton Hubble 567e18f015 Fix conflict between Calico and NetworkManager
* Observed frequent kube-scheduler and controller-manager
restarts with Calico as the CNI provider. Root cause was
unclear since control plane was functional and tests of
pod to pod network connectivity passed
* Root cause: Calico sets up cali* and tunl* network interfaces
for containers on hosts. NetworkManager tries to manage these
interfaces. It periodically disconnected veth pairs. Logs did
not surface this issue since its not an error per-se, just Calico
and NetworkManager dueling for control. Kubernetes correctly
restarted pods failing health checks and ensured 2 replicas were
running so the control plane functioned mostly normally. Pod to
pod connecitivity was only affected occassionally. Pain to debug.
* Solution: Configure NetworkManager to ignore the Calico ifaces
per Calico's recommendation. Cloud-init writes files after
NetworkManager starts, so a restart is required on first boot. On
subsequent boots, the file is present so no restart is needed
2018-04-25 21:45:58 -07:00
Dalton Hubble 0a7fab56e2 Load ip_vs kernel module on boot as workaround
* (containerized) kube-proxy warns that it is unable to
load the ip_vs kernel module despite having the correct
mounts. Atomic uses an xz compressed module and modprobe
in the container was not compiled with compression support
* Workaround issue for now by always loading ip_vs on-host
* https://github.com/kubernetes/kubernetes/issues/60
2018-04-25 21:45:58 -07:00
Dalton Hubble d784b0fca6 Switch to quay.io/poseidon tagged system containers 2018-04-25 18:15:18 -07:00
Dalton Hubble 7198b9016c Update Calico from v3.0.4 to v3.1.1 for Atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble 233ec6dcb0 Update Fedora Atomic AMI to version 27.122
* http://www.projectatomic.io/blog/2018/04/fedora-atomic-20-apr-18/
* Atomic publishes nightly AMIs which sometimes don't boot
or have issues. Until there is a source of reliable AMIs,
pin the best known working AMI
* Rel 66a66f0d18544591ffdbf8fae9df790113c93d72
2018-04-21 18:46:56 -07:00
Dalton Hubble 9b88d4bbfd Use bootkube system container on fedora-atomic
* Use the upstream bootkube image packaged with the
required metadata to be usable as a system container
under systemd
* Run bootkube with runc so no host level components
use Docker any more. Docker is still the runtime
* Remove bootkube script and old systemd unit
2018-04-21 18:46:56 -07:00
Dalton Hubble 3dde4ba8ba Mount host's /etc/os-release in kubelet system containers
* Fix `kubectl describe node` to reflect the host's operating
system
2018-04-21 18:46:56 -07:00
Dalton Hubble e148552220 Enable kubelet allocatable enforcement and QoS cgroup hierarchy
* Change kubelet system image to use --cgroups-per-qos=true
(default) instead of false
* Change kubelet system image to use --enforce-node-allocatable=pods
instead of an empty string
2018-04-21 18:46:56 -07:00
Dalton Hubble d8d1468f03 Update kubelet system container image to mount /etc/hosts
* Fix kubelet port-forward on Google Cloud / Fedora Atomic
* Mount the host's /etc/hosts in kubelet system containers
* Problem: kubelet runc system containers on Atomic were not
mounting the host's /etc/hosts, like rkt-fly does on Container
Linux. `kubectl port-forward` calls socat with localhost. DNS
servers on AWS, DO, and in many bare-metal environments resolve
localhost to the caller as a convenience. Google Cloud notably
does not nor is it required to do so and this surfaced the
missing /etc/hosts in runc kubelet namespaces.
2018-04-21 18:46:56 -07:00
Dalton Hubble 24d230505a Add cloud-metadata.service on AWS fedora-atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble b3cf9508b6 Update Fedora Atomic modules to Kubernetes v1.10.1 2018-04-21 18:46:56 -07:00
Dalton Hubble 5212684472 Temporarily pin Fedora Atomic AMI
* Atomic has published AMI images that shutdown
immediately after being powered on
2018-04-21 18:46:56 -07:00
Dalton Hubble f990473cde Update control plane manifests and add etcd metrics
* Enable etcd v3.3 metrics to expose metrics for
scraping by Prometheus
* Use k8s.gcr.io instead of gcr.io/google_containers
* Add flexvolume plugin mount to controller manager
* Update kube-dns from v1.14.8 to v1.14.9
2018-04-21 18:46:56 -07:00
Dalton Hubble 8523a086e2 Fix kubelet system container to mount CNI plugins
* Mount /opt/cni/bin in kubelet system container so
CNI plugin binaries can be found. Before, flannel
worked because the kubelet falls back to flannel
plugin baked into the hyperkube (undesired)
* Move the CNI bin install location later, since /opt
changes may be lost between ostree rebases
2018-04-21 18:46:56 -07:00
Dalton Hubble 19bc5aea9e Use kubelet system container on fedora-atomic
* Use the upstream hyperkube image packaged with the
required metadata to be usable as a system container
under systemd
* Fix port-forward since socat is included
2018-04-21 18:46:56 -07:00
Dalton Hubble 8d7cfc1a45 Use etcd system container on fedora-atomic
* Use the upstream etcd image packaged with the required
metadata to be usable as a system container (runc) under
systemd
2018-04-21 18:46:56 -07:00
Dalton Hubble 9969c357da Change AWS Fedora module to fedora-atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble b80a2eb8a0 Sync fedora-cloud modules with Container Linux
* Update manifests for Kubernetes v1.10.0
* Update etcd from v3.3.2 to v3.3.3
* Add disk_type optional variable on AWS
* Remove redundant kubeconfig copy on AWS
* Distribute etcd secres only to controllers
* Organize module variables and ssh steps
2018-04-21 18:46:56 -07:00
Dalton Hubble 3610da8b71 Add fedora-cloud module for AWS 2018-04-21 18:46:56 -07:00
Dalton Hubble a54f76db2a Update Calico from v3.0.4 to v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.0
2018-04-21 18:30:36 -07:00
Dalton Hubble 23a8156bdf Fix a few typos in comments 2018-04-15 17:21:49 -07:00
Dalton Hubble 77c0a4cf2e Update Kubernetes from v1.10.0 to v1.10.1
* Use kubernetes-incubator/bootkube v0.12.0
2018-04-12 20:57:31 -07:00
Dalton Hubble 9bb3de5327 Skip creating unused dirs on worker nodes 2018-04-11 22:23:51 -07:00
Dalton Hubble 6b08bde479 Use k8s.gcr.io instead of gcr.io/google_containers
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
2018-04-08 12:57:52 -07:00
Dalton Hubble f4b2396718 Return Prometheus deployment to be a worker workload
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targes
* Related to https://github.com/poseidon/typhoon/pull/175
2018-04-08 12:20:00 -07:00
Dalton Hubble 18dbaf74ce Update kube-dns from v1.14.8 to v1.14.9
* https://github.com/kubernetes/kubernetes/pull/61908
2018-04-04 21:00:23 -07:00
Dalton Hubble ce001e9d56 Update etcd from v3.3.2 to v3.3.3
* https://github.com/coreos/etcd/releases/tag/v3.3.3
2018-04-04 20:32:24 -07:00
Dalton Hubble d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble 1cc043d1eb Update Kubernetes from v1.9.6 to v1.10.0 2018-03-30 22:14:07 -07:00
Dalton Hubble f8e9bfb1c0 Add disk_type variable for EBS volume type on AWS
* Change EBS volume type from `standard` ("prior generation)
 to `gp2`. Prometheus alerts are tuned for SSDs
* Other platforms have fast enough disks by default
2018-03-29 22:51:54 -07:00
Dalton Hubble 7acd4931f6 Remove redundant kubeconfig copy on AWS and GCP
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. Its redundant to use a Terraform remote
file copy step for the kubeconfig.
2018-03-26 00:01:47 -07:00
Dalton Hubble e43cf9f608 Organize and cleanup variable descriptions 2018-03-25 21:44:43 -07:00
Dalton Hubble a04ef3919a Update Kubernetes from v1.9.5 to v1.9.6 2018-03-21 20:29:52 -07:00
Dalton Hubble 758c09fa5c Update Kubernetes from v1.9.4 to v1.9.5 2018-03-19 00:25:44 -07:00
Dalton Hubble f3730b2bfa Add Container Linux Config snippets feature
* Introduce the ability to support Container Linux Config
"snippets" for controllers and workers on cloud platforms.
This allows end-users to customize hosts by providing Container
Linux configs that are additively merged into the base configs
defined by Typhoon. Config snippets are validated, merged, and
show any errors during `terraform plan`
* Example uses include adding systemd units, network configs,
mounts, files, raid arrays, or other disk provisioning features
provided by Container Linux Configs (using Ignition low-level)
* Requires terraform-provider-ct v0.2.1 plugin
2018-03-18 18:28:18 -07:00
Dalton Hubble 88aa9a46e5 Add /var/lib/calico volume mount to Calico DaemonSet 2018-03-18 16:40:38 -07:00
Dalton Hubble efa90d8b44 Add a new key=value label to controller nodes
* Add a node-role.kubernetes.io/controller="true" node label
to controllers so Prometheus service discovery can filter to
services that only run on controllers (i.e. masters)
* Leave node-role.kubernetes.io/master="" untouched as its
a Kubernetes convention
2018-03-18 16:39:10 -07:00
Dalton Hubble 21f2cef12f Improve changelog, README, and index page 2018-03-12 20:58:02 -07:00
Dalton Hubble 931e311786 Update Kubernetes from v1.9.3 to v1.9.4 2018-03-12 18:07:50 -07:00
Dalton Hubble 8e7e6b9f7f Normalize Terraform configs with terraform fmt 2018-03-11 14:46:05 -07:00
Dalton Hubble 35f3b1b28c Enable AWS NLB cross-zone load balancing
* https://github.com/terraform-providers/terraform-provider-aws/pull/3537
* https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/
2018-03-10 23:25:18 -08:00
Dalton Hubble 9fb1e1a0e2 Update etcd from v3.3.1 to v3.3.2
* https://github.com/coreos/etcd/releases/tag/v3.3.2
2018-03-10 13:44:35 -08:00
Dalton Hubble b61d6373c5 Add ignore_changes for AWS worker image_id 2018-03-10 13:16:05 -08:00
Dalton Hubble c112ee3829 Rename cluster_name to name in internal module
* Ensure consistency between AWS and GCP platforms
2018-03-03 17:52:01 -08:00
Dalton Hubble da6aafe816 Revert "Add module version requirements to internal workers modules"
* This reverts commit cce4537487.
* Provider passing to child modules is complex and the behavior
changed between Terraform v0.10 and v0.11. We're continuing to
allow both versions so this change should be reverted. For the
time being, those using our internal Terraform modules will have
to be aware of the minimum version for AWS and GCP providers,
there is no good way to do enforcement.
2018-03-03 16:56:34 -08:00
Dalton Hubble cce4537487 Add module version requirements to internal workers modules 2018-03-03 14:39:25 -08:00
Dalton Hubble 73126eb7f8 Add support for worker pools on AWS
* Allow groups of workers to be defined and joined to
a cluster (i.e. worker pools)
* Move worker resources into a Terraform submodule
* Output variables needed for passing to worker pools
* Add usage docs for AWS worker pools (advanced)
2018-02-27 18:31:42 -08:00
Dalton Hubble 98985e5acd Remove unused etcd_service_ip template variable
* etcd_service_ip dates back to deprecated self-hosted etcd
2018-02-26 22:20:20 -08:00
Dalton Hubble 486fdb6968 Simplify CLC kubeconfig templating on AWS and GCP
* Template terraform-render-bootkube's multi-line kubeconfig
output using the right indentation
* Add `kubeconfig` variable to google-cloud controllers and
workers Terraform submodules
* Remove `kubeconfig_*` variables from google-cloud controllers
and workers Terraform submodules
2018-02-26 12:49:01 -08:00
Dalton Hubble a44cf0edbd Update Calico from v3.0.2 to v3.0.3
* https://github.com/projectcalico/calico/releases/tag/v3.0.3
2018-02-26 12:48:19 -08:00
Dalton Hubble 13f3745093 Add kubelet --volume-plugin-dir flag
* Set Kubelet search path for flexvolume plugins
to /var/lib/kubelet/volumeplugins
* Add support for flexvolume plugins on AWS, GCE, and DO
* See 9548572d98 which added flexvolume support for bare-metal
2018-02-22 22:11:45 -08:00
Dalton Hubble c4914c326b Update bootkube and terraform-render-bootkube to v0.11.0 2018-02-22 21:53:26 -08:00
Paul Saunders ceb5555222 Switch apiserver from ELB to a network load balancer 2018-02-22 16:10:31 -08:00
Dalton Hubble 22fa051002 Switch Ingress ELB to a network load balancer
* Require terraform-provider-aws 1.7 or higher
2018-02-20 17:34:38 -08:00
Stephen Augustus c8313751d7 Ignore lifecycle changes to the AWS controller ami 2018-02-15 19:48:39 -08:00
Dalton Hubble 195d902ab6 Upgrade etcd from v3.2.15 to v3.3.1 2018-02-15 19:29:46 -08:00
Dalton Hubble c19a68b59b Update bootkube control-plane manifests
* Remove PersistentVolumeLabel admission controller flag
* Switch Deployments and DaemonSets to apps/v1
* Minor update to pod-checkpointer image version
2018-02-15 11:06:35 -08:00
Dalton Hubble 82a616c70b Fix terraform config formatting 2018-02-10 15:18:27 -08:00
Dalton Hubble a41691b222 Update Kubernetes from v1.9.2 to v1.9.3
* Add flannel service account and limited RBAC cluster role
* Change DaemonSets to tolerate NoSchedule and NoExecute taints
* Remove deprecated apiserver --etcd-quorum-read flag
* Update Calico from v3.0.1 to v3.0.2
* Add Calico GlobalNetworkSet CRD
* https://github.com/poseidon/terraform-render-bootkube/pull/44
2018-02-10 13:37:07 -08:00
Dalton Hubble 2fa1840c30 Update flannel from v0.9.0 to v0.10.0
* https://github.com/coreos/flannel/releases/tag/v0.10.0
2018-01-28 23:09:21 -08:00
Dalton Hubble 8e0b8d7e40 Upgrade Calico from 2.6.6 to 3.0.1 2018-01-28 11:47:23 -08:00
Dalton Hubble 3e6e4ea339 Update etcd from 3.2.14 to 3.2.15
* https://github.com/coreos/etcd/releases/tag/v3.2.15
2018-01-23 23:50:04 -08:00
Dalton Hubble 868265988b Update bootkube and terraform-render-bootkube to v0.10.0 2018-01-19 23:10:45 -08:00
Dalton Hubble 6adffcb778 Update Kubernetes from v1.9.1 to v1.9.2 2018-01-19 08:40:09 -08:00
Dalton Hubble d8db296932 Update kube-dns and use separate service account
* Update kube-dns from v1.14.7 to v1.14.8
* Use a separate kube-dns service account
* https://github.com/kubernetes/kubernetes/pull/57918
2018-01-12 10:29:30 -08:00
Dalton Hubble 388ac08492 Update etcd from 3.2.13 to 3.2.14
* https://github.com/coreos/etcd/releases/tag/v3.2.14
2018-01-12 07:20:55 -08:00
Dalton Hubble fc455c8624 Remove old mention of ACIs in bootkube.service description 2018-01-06 16:20:34 -08:00
Dalton Hubble 51a5f64024 Enable portmap plugin alongside Calico to fix hostPort
* https://github.com/poseidon/terraform-render-bootkube/pull/36
2018-01-06 14:01:18 -08:00
Dalton Hubble e1f2125f02 Update etcd from 3.2.0 to 3.2.13
* https://github.com/coreos/etcd/releases/tag/v3.2.13
2018-01-06 14:01:18 -08:00
Dalton Hubble 9329b775f6 Update Kubernetes from v1.8.6 to v1.9.1 2018-01-06 14:01:16 -08:00
Dalton Hubble fbdd946601 Update Kubernetes from v1.8.5 to v1.8.6 2017-12-21 11:20:37 -08:00
Barak Michener e79088baa0 Add optional cluster_domain_suffix variable
* Allow kube-dns to respond to DNS queries with a custom
suffix, instead of the default 'cluster.local'
* Useful when multiple clusters exist on the same local
network and wish to query services on one another
2017-12-15 01:45:52 -08:00
Dalton Hubble 495e33e213 Update bootkube and terraform-render-bootkube to v0.9.1 2017-12-15 01:45:02 -08:00
Dalton Hubble 63f5a26a72 Eliminate steps to move self-hosted etcd assets
* bootkube/assets/experimental/* assets corresponded to self-hosted
etcd manifests, which are no longer an option in Typhoon
2017-12-13 01:06:56 -08:00
Lars Fenneberg eea79e895d Fix manifest consolidation in bootkube start wrapper
* Fix manifest existence test in /opt/bootkube/bootkube-start
to also work with more than one directory
2017-12-12 23:08:22 -08:00
Dalton Hubble 165396d6aa Update Kubernetes from v1.8.4 to v1.8.5 2017-12-09 21:28:31 -08:00
Vincent Palmer ce49a93d5d Fix issue with etcd-member failing to resolve peers
* When restarting masters, `etcd-member.service` may fail to lookup peers if
/etc/resolv.conf hasn't been populated yet. Require the wait-for-dns.service.
2017-12-09 20:12:49 -08:00
Dalton Hubble 5f5eec1175 Update bootkube and terraform-render-bootkube to v0.9.0 2017-12-01 22:27:48 -08:00
Dalton Hubble 5308fde3d3 Add Kubernetes certification badge 2017-11-29 19:26:49 -08:00
Dalton Hubble 6483f613c5 Update Kubernetes from v1.8.3 to v1.8.4 2017-11-28 21:52:11 -08:00
Dalton Hubble 56c6bf431a Update terraform-render-bootkube for Kubernetes v1.8.4
* Update hyperkube from v1.8.3 to v1.8.4
* Remove flock from bootstrap-apiserver and kube-apiserver
* Remove unused critical-pod annotations in manifests
* Use service accounts for kube-proxy and pod-checkpointer
* Update Calico from v2.6.1 to v2.6.3
* Update flannel from v0.9.0 to v0.9.1
* Remove Calico termination grace period to prevent calico
from getting stuck for extended periods
* https://github.com/poseidon/terraform-render-bootkube/pull/29
2017-11-28 21:42:26 -08:00
Dalton Hubble 5f6b0728c5 Update bootkube and terraform-render-bootkube to v0.8.2 2017-11-10 20:01:37 -08:00
Dalton Hubble d774c51297 Update Kubernetes from v1.8.2 to v1.8.3 2017-11-08 23:34:19 -08:00
Dalton Hubble f6a8fb363e Remove deprecated kubelet --require-kubeconfig flag
* https://github.com/kubernetes/kubernetes/pull/40050
2017-11-08 23:34:19 -08:00
Dalton Hubble ea1efb536a Remove old firewall rule for bootstrap self-hosted etcd 2017-11-08 00:15:20 -08:00
Dalton Hubble ccc832f468 Add firewall rule to allow apiserver to proxy other controller kubelets
* Prometheus proxies through the apiserver to scrape kubelets
* In multi-controller setups, an apiserver must be able to scrape
kubelets (10250) on other controllers
2017-11-06 01:03:53 -08:00