typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-04 14:54:36 +02:00

Author	SHA1	Message	Date
Dalton Hubble	fb6f40051f	Disable AWS detailed monitoring on worker nodes * Basic monitoring (free) is sufficient for casual console browsing * Detailed monitoring (paid) is not leveraged for CloudWatch anyway * Favor Prometheus for cloud-agnostic metrics, aggregation, and alerting	2018-06-22 00:26:06 -07:00
Dalton Hubble	316f06df06	Combine NLBs to use one NLB per cluster * Simplify clusters to come with a single NLB * Listen for apiserver traffic on port 6443 and forward to controllers (with healthy apiserver) * Listen for ingress traffic on ports 80/443 and forward to workers (with healthy ingress controller) * Reduce cost of default clusters by 1 NLB ($18.14/month) * Keep using CNAME records to the `ingress_dns_name` NLB and the nginx-ingress addon for Ingress (up to a few million RPS) * Users with heavy traffic (many million RPS) can create their own separate NLB(s) for Ingress and use the new output worker target groups * Fix issue where additional worker pools come with an extraneous network load balancer	2018-06-21 23:46:57 -07:00
Dalton Hubble	f4d3059b00	Update Kubernetes from v1.10.4 to v1.10.5 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1105	2018-06-21 22:51:39 -07:00
Dalton Hubble	6c5a1964aa	Change kube-apiserver port from 443 to 6443 * Adjust firewall rules, security groups, cloud load balancers, and generated kubeconfigs * Facilitates some future simplifications and cost reductions * Bare-Metal users who exposed kube-apiserver on a WAN via their router or load balancer will need to adjust its configuration. This is uncommon; most apiservers are on a LAN and/or behind a VPN, so no routing infrastructure is configured with the port number	2018-06-19 23:48:51 -07:00
Dalton Hubble	6e64634748	Update etcd from v3.3.7 to v3.3.8 * https://github.com/coreos/etcd/releases/tag/v3.3.8	2018-06-19 21:56:21 -07:00
Dalton Hubble	51906bf398	Update etcd from v3.3.6 to v3.3.7	2018-06-14 22:46:16 -07:00
Dalton Hubble	79260c48f6	Update Kubernetes from v1.10.3 to v1.10.4	2018-06-06 23:23:11 -07:00
Dalton Hubble	589c3569b7	Update etcd from v3.3.5 to v3.3.6 * https://github.com/coreos/etcd/releases/tag/v3.3.6	2018-06-06 23:19:30 -07:00
Dalton Hubble	6e968cd152	Update Calico from v3.1.2 to v3.1.3 * https://github.com/projectcalico/calico/releases/tag/v3.1.3 * https://github.com/projectcalico/cni-plugin/releases/tag/v3.1.3	2018-05-30 21:32:12 -07:00
Ben Drucker	6a581ab577	Render etcd_initial_cluster using a template_file	2018-05-30 21:14:49 -07:00
Dalton Hubble	4ea1fde9c5	Update Kubernetes from v1.10.2 to v1.10.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1103 * Update Calico from v3.1.1 to v3.1.2	2018-05-21 21:38:43 -07:00
William Zhang	2ae126bf68	Fix README link to tutorial	2018-05-19 13:10:22 -07:00
Dalton Hubble	983489bb52	Re-run terraform fmt for formatting	2018-05-14 23:38:16 -07:00
Dalton Hubble	c2b719dc75	Configure Prometheus to scrape Kubelets directly * Use Kubelet bearer token authn/authz to scrape metrics * Drop RBAC permission from nodes/proxy to nodes/metrics * Stop proxying kubelet scrapes through the apiserver, since this required higher privilege (nodes/proxy) and can add load to the apiserver on large clusters	2018-05-14 23:06:50 -07:00
Dalton Hubble	37981f9fb1	Allow bearer token authn/authz to the Kubelet * Require Webhook authorization to the Kubelet * Switch apiserver X509 client cert org to system:masters to grant the apiserver admin and satisfy the authorization requirement. kubectl commands like logs or exec that have the apiserver make requests of a kubelet continue to work as before * https://kubernetes.io/docs/admin/kubelet-authentication-authorization/ * https://github.com/poseidon/typhoon/issues/215	2018-05-13 23:20:42 -07:00
Dalton Hubble	5eb11f5104	Allow Flatcar Linux os_image on AWS, rename os_channel * Replace os_channel variable with os_image to align naming across clouds. Users who set this option to stable, beta, or alpha should now set os_image to coreos-stable, coreos-beta, or coreos-alpha. * Default os_image to coreos-stable. This continues to use the most recent image from the stable channel as always. * Allow Container Linux derivative Flatcar Linux by setting os_image to `flatcar-stable`, `flatcar-beta`, `flatcar-alpha`	2018-05-12 11:41:58 -07:00
Dalton Hubble	f2ee75ac98	Require Terraform v0.11.x, drop v0.10.x support * Raise minimum Terraform version to v0.11.0 * Terraform v0.11.x has been supported since Typhoon v1.9.2 and Terraform v0.10.x was last released in Nov 2017. I'd like to stop worrying about v0.10.x and remove migration docs as a later followup * Migration docs docs/topics/maintenance.md#terraform-v011x	2018-05-10 02:20:46 -07:00
Dalton Hubble	8b8e364915	Update etcd from v3.3.4 to v3.3.5 * https://github.com/coreos/etcd/releases/tag/v3.3.5	2018-05-10 02:12:53 -07:00
Michael Holt	a5916da0e2	Update min AWS provider from v1.11 to v1.13	2018-05-02 15:16:03 -07:00
Dalton Hubble	cc29530ba0	Allow preemptible workers on AWS via spot instances * Add `worker_price` to allow worker spot instances. Defaults to empty string for the worker autoscaling group to use regular on-demand instances. * Add `spot_price` to internal `workers` module for spot worker pools * Note: Unlike GCP `preemptible` workers, spot instances require you to pick a bid price.	2018-04-29 13:31:17 -07:00
Dalton Hubble	e889430926	Update kube-dns from v1.14.9 to v1.14.10 * https://github.com/kubernetes/kubernetes/pull/62676	2018-04-28 00:43:09 -07:00
Dalton Hubble	32ddfa94e1	Update Kubernetes from v1.10.1 to v1.10.2 * https://github.com/kubernetes/kubernetes/releases/tag/v1.10.2	2018-04-28 00:27:00 -07:00
Dalton Hubble	681450aa0d	Update etcd from v3.3.3 to v3.3.4 * https://github.com/coreos/etcd/releases/tag/v3.3.4	2018-04-27 23:57:26 -07:00
Dalton Hubble	567e18f015	Fix conflict between Calico and NetworkManager * Observed frequent kube-scheduler and controller-manager restarts with Calico as the CNI provider. Root cause was unclear since control plane was functional and tests of pod to pod network connectivity passed * Root cause: Calico sets up cali* and tunl* network interfaces for containers on hosts. NetworkManager tries to manage these interfaces. It periodically disconnected veth pairs. Logs did not surface this issue since it's not an error per se, just Calico and NetworkManager dueling for control. Kubernetes correctly restarted pods failing health checks and ensured 2 replicas were running so the control plane functioned mostly normally. Pod to pod connectivity was only affected occasionally. Pain to debug. * Solution: Configure NetworkManager to ignore the Calico ifaces per Calico's recommendation. Cloud-init writes files after NetworkManager starts, so a restart is required on first boot. On subsequent boots, the file is present so no restart is needed	2018-04-25 21:45:58 -07:00
Dalton Hubble	0a7fab56e2	Load ip_vs kernel module on boot as workaround * (containerized) kube-proxy warns that it is unable to load the ip_vs kernel module despite having the correct mounts. Atomic uses an xz compressed module and modprobe in the container was not compiled with compression support * Workaround issue for now by always loading ip_vs on-host * https://github.com/kubernetes/kubernetes/issues/60	2018-04-25 21:45:58 -07:00
Dalton Hubble	d784b0fca6	Switch to quay.io/poseidon tagged system containers	2018-04-25 18:15:18 -07:00
Dalton Hubble	7198b9016c	Update Calico from v3.0.4 to v3.1.1 for Atomic	2018-04-21 18:46:56 -07:00
Dalton Hubble	233ec6dcb0	Update Fedora Atomic AMI to version 27.122 * http://www.projectatomic.io/blog/2018/04/fedora-atomic-20-apr-18/ * Atomic publishes nightly AMIs which sometimes don't boot or have issues. Until there is a source of reliable AMIs, pin the best known working AMI * Rel 66a66f0d18544591ffdbf8fae9df790113c93d72	2018-04-21 18:46:56 -07:00
Dalton Hubble	9b88d4bbfd	Use bootkube system container on fedora-atomic * Use the upstream bootkube image packaged with the required metadata to be usable as a system container under systemd * Run bootkube with runc so no host level components use Docker any more. Docker is still the runtime * Remove bootkube script and old systemd unit	2018-04-21 18:46:56 -07:00
Dalton Hubble	3dde4ba8ba	Mount host's /etc/os-release in kubelet system containers * Fix `kubectl describe node` to reflect the host's operating system	2018-04-21 18:46:56 -07:00
Dalton Hubble	e148552220	Enable kubelet allocatable enforcement and QoS cgroup hierarchy * Change kubelet system image to use --cgroups-per-qos=true (default) instead of false * Change kubelet system image to use --enforce-node-allocatable=pods instead of an empty string	2018-04-21 18:46:56 -07:00
Dalton Hubble	d8d1468f03	Update kubelet system container image to mount /etc/hosts * Fix kubelet port-forward on Google Cloud / Fedora Atomic * Mount the host's /etc/hosts in kubelet system containers * Problem: kubelet runc system containers on Atomic were not mounting the host's /etc/hosts, like rkt-fly does on Container Linux. `kubectl port-forward` calls socat with localhost. DNS servers on AWS, DO, and in many bare-metal environments resolve localhost to the caller as a convenience. Google Cloud notably does not, nor is it required to do so, and this surfaced the missing /etc/hosts in runc kubelet namespaces.	2018-04-21 18:46:56 -07:00
Dalton Hubble	24d230505a	Add cloud-metadata.service on AWS fedora-atomic	2018-04-21 18:46:56 -07:00
Dalton Hubble	b3cf9508b6	Update Fedora Atomic modules to Kubernetes v1.10.1	2018-04-21 18:46:56 -07:00
Dalton Hubble	5212684472	Temporarily pin Fedora Atomic AMI * Atomic has published AMI images that shut down immediately after being powered on	2018-04-21 18:46:56 -07:00
Dalton Hubble	f990473cde	Update control plane manifests and add etcd metrics * Enable etcd v3.3 metrics to expose metrics for scraping by Prometheus * Use k8s.gcr.io instead of gcr.io/google_containers * Add flexvolume plugin mount to controller manager * Update kube-dns from v1.14.8 to v1.14.9	2018-04-21 18:46:56 -07:00
Dalton Hubble	8523a086e2	Fix kubelet system container to mount CNI plugins * Mount /opt/cni/bin in kubelet system container so CNI plugin binaries can be found. Before, flannel worked because the kubelet falls back to flannel plugin baked into the hyperkube (undesired) * Move the CNI bin install location later, since /opt changes may be lost between ostree rebases	2018-04-21 18:46:56 -07:00
Dalton Hubble	19bc5aea9e	Use kubelet system container on fedora-atomic * Use the upstream hyperkube image packaged with the required metadata to be usable as a system container under systemd * Fix port-forward since socat is included	2018-04-21 18:46:56 -07:00
Dalton Hubble	8d7cfc1a45	Use etcd system container on fedora-atomic * Use the upstream etcd image packaged with the required metadata to be usable as a system container (runc) under systemd	2018-04-21 18:46:56 -07:00
Dalton Hubble	9969c357da	Change AWS Fedora module to fedora-atomic	2018-04-21 18:46:56 -07:00
Dalton Hubble	b80a2eb8a0	Sync fedora-cloud modules with Container Linux * Update manifests for Kubernetes v1.10.0 * Update etcd from v3.3.2 to v3.3.3 * Add disk_type optional variable on AWS * Remove redundant kubeconfig copy on AWS * Distribute etcd secrets only to controllers * Organize module variables and ssh steps	2018-04-21 18:46:56 -07:00
Dalton Hubble	3610da8b71	Add fedora-cloud module for AWS	2018-04-21 18:46:56 -07:00
Dalton Hubble	a54f76db2a	Update Calico from v3.0.4 to v3.1.1 * https://github.com/projectcalico/calico/releases/tag/v3.1.1 * https://github.com/projectcalico/calico/releases/tag/v3.1.0	2018-04-21 18:30:36 -07:00
Dalton Hubble	23a8156bdf	Fix a few typos in comments	2018-04-15 17:21:49 -07:00
Dalton Hubble	77c0a4cf2e	Update Kubernetes from v1.10.0 to v1.10.1 * Use kubernetes-incubator/bootkube v0.12.0	2018-04-12 20:57:31 -07:00
Dalton Hubble	9bb3de5327	Skip creating unused dirs on worker nodes	2018-04-11 22:23:51 -07:00
Dalton Hubble	6b08bde479	Use k8s.gcr.io instead of gcr.io/google_containers * Kubernetes recommends using the alias to fetch images from the nearest GCR regional mirror, to abstract the use of GCR, and to drop names containing 'google' * https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ	2018-04-08 12:57:52 -07:00
Dalton Hubble	f4b2396718	Return Prometheus deployment to be a worker workload * Expose etcd metrics to workers so Prometheus can run on a worker, rather than a controller * Drop temporary firewall rules allowing Prometheus to run on a controller and scrape targets * Related to https://github.com/poseidon/typhoon/pull/175	2018-04-08 12:20:00 -07:00
Dalton Hubble	18dbaf74ce	Update kube-dns from v1.14.8 to v1.14.9 * https://github.com/kubernetes/kubernetes/pull/61908	2018-04-04 21:00:23 -07:00
Dalton Hubble	ce001e9d56	Update etcd from v3.3.2 to v3.3.3 * https://github.com/coreos/etcd/releases/tag/v3.3.3	2018-04-04 20:32:24 -07:00

128 Commits