typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-04 10:14:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	69d064bfdf	Run kube-apiserver with lower privilege user (nobody) * Run kube-apiserver as a non-root user (nobody). User no longer needs to bind low number ports. * On most platforms, the kube-apiserver load balancer listens on 6443 and fronts controllers with kube-apiserver pods using port 6443. Google Cloud TCP proxy load balancers cannot listen on 6443. However, GCP's load balancer can be made to listen on 443, while kube-apiserver uses 6443 across all platforms.	2019-07-08 20:52:00 -07:00
Dalton Hubble	7a69bae75e	Raise GCP network deletion timeout from 4m to 6m * Fix a GCP errata item https://github.com/poseidon/typhoon/wiki/Errata * Removal of a Google Cloud cluster often required 2 runs of `terraform apply` because network resource deletes timeout after 4m. Raise the network deletion timeout to 6m to ensure apply only needs to be run once to remove a cluster	2019-07-06 13:15:33 -07:00
Dalton Hubble	3fcb04f68c	Improve apiserver backend service zone spanning * google_compute_backend_services use nested blocks to define backends (instance groups heterogeneous controllers) * Use Terraform v0.12.x dynamic blocks so the apiserver backend service can refer to (up to zone-many) controller instance groups * Previously, with Terraform v0.11.x, the apiserver backend service had to list a fixed set of backends to span controller nodes across zones in multi-controller setups. 3 backends were used because each GCP region offered at least 3 zones. Single-controller clusters had the cosmetic ugliness of unused instance groups * Allow controllers to span more than 3 zones if avilable in a region (e.g. currently only us-central1, with 4 zones) Related: * https://www.terraform.io/docs/providers/google/r/compute_backend_service.html * https://www.terraform.io/docs/configuration/expressions.html#dynamic-blocks	2019-07-05 19:46:26 -07:00
Dalton Hubble	8d373b5850	Update Calico from v3.7.3 to v3.7.4 * https://docs.projectcalico.org/v3.7/release-notes/	2019-07-02 20:18:02 -07:00
Dalton Hubble	408e60075a	Update Kubernetes from v1.14.3 to v1.15.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1150 * Remove docs referring to possible v1.14.4 release	2019-06-23 13:12:18 -07:00
Dalton Hubble	21fb632e90	Update Calico from v3.7.2 to v3.7.3 * https://docs.projectcalico.org/v3.7/release-notes/	2019-06-13 23:54:20 -07:00
Dalton Hubble	d6d9e6c4b9	Migrate Google Cloud module Terraform v0.11 to v0.12 * Replace v0.11 bracket type hints with Terraform v0.12 list expressions * Use expression syntax instead of interpolated strings, where suggested * Update Google Cloud tutorial and worker pools documentation * Define Terraform and plugin version requirements in versions.tf * Require google ~> 2.5 to support Terraform v0.12 * Require ct ~> 0.3.2 to support Terraform v0.12	2019-06-06 09:48:56 -07:00
Dalton Hubble	0ccb2217b5	Update Kubernetes from v1.14.2 to v1.14.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1143	2019-05-31 01:08:32 -07:00
Dalton Hubble	c565f9fd47	Rename worker pool modules' count variable to worker_count * This change affects users who use worker pools on AWS, GCP, or Azure with a Container Linux derivative * Rename worker pool modules' `count` variable to `worker_count`, because `count` will be a reserved variable name in Terraform v0.12	2019-05-27 16:40:00 -07:00
Dalton Hubble	2a71cba0e3	Update CoreDNS from v1.3.1 to v1.5.0 * Add `ready` plugin to improve readinessProbe * https://coredns.io/2019/04/06/coredns-1.5.0-release/	2019-05-27 00:11:52 -07:00
Dalton Hubble	6e4cf65c4c	Fix terraform-render-bootkube to remove trailing slash * Fix to remove a trailing slash that was erroneously introduced in the scripting that updated from v1.14.1 to v1.14.2 * Workaround before this fix was to re-run `terraform init`	2019-05-22 18:29:11 +02:00
Dalton Hubble	da97bd4f12	Update Kubernetes from v1.14.1 to v1.14.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1142	2019-05-17 13:09:15 +02:00
Dalton Hubble	f62286b677	Update Calico from v3.7.0 to v3.7.2 * https://docs.projectcalico.org/v3.7/release-notes/	2019-05-17 12:29:46 +02:00
Dalton Hubble	af18296bc5	Change flannel port from 8472 to 4789 * Change flannel port from the kernel default 8472 to the IANA assigned VXLAN port 4789 * Update firewall rules or security groups for VXLAN * Why now? Calico now offers its own VXLAN backend so standardizing on the IANA port will simplify config * https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan	2019-05-06 21:58:10 -07:00
Dalton Hubble	09e0230111	Upgrade Calico from v3.6.1 to v3.7.0 * https://docs.projectcalico.org/v3.7/release-notes/ * https://github.com/poseidon/terraform-render-bootkube/pull/131	2019-05-06 00:44:15 -07:00
Dalton Hubble	feb6192aac	Update etcd from v3.3.12 to v3.3.13 on Container Linux * Skip updating etcd for Fedora Atomic clusters, now that Fedora Atomic has been deprecated	2019-05-04 12:55:42 -07:00
Jordan Pittier	ecbbdd905e	Use ./ prefix for inner/local worker pool modules * Terraform v0.11 encouraged use of a "./" prefix for local module references and Terraform v0.12 will require it * https://www.terraform.io/docs/modules/sources.html#local-paths Related: https://github.com/hashicorp/terraform/issues/19745	2019-05-04 12:27:22 -07:00
JordanP	8da17fb7a2	Fix "google_compute_target_pool.workers: Cannot determine region" If no region is set at the Google provider level, Terraform fails to create the google_compute_target_pool.workers resource and complains with "Cannot determine region: set in this resource, or set provider-level 'region' or 'zone'." This commit fixes the issue by explicitly setting the region for the google_compute_target_pool.workers resource.	2019-04-13 11:53:56 -07:00
Dalton Hubble	452253081b	Update Kubernetes from v1.14.0 to v1.14.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#changelog-since-v1140	2019-04-09 21:47:23 -07:00
Dalton Hubble	be29f52039	Add enable_aggregation option (defaults to false) * Add an `enable_aggregation` variable to enable the kube-apiserver aggregation layer for adding extension apiservers to clusters * Aggregation is disabled by default. Typhoon recommends you not enable aggregation. Consider whether less invasive ways to achieve your goals are possible and whether those goals are well-founded * Enabling aggregation and extension apiservers increases the attack surface of a cluster and makes extensions a part of the control plane. Admins must scrutinize and trust any extension apiserver used. * Passing a v1.14 CNCF conformance test requires aggregation be enabled. Having an option for aggregation keeps compliance, but retains the stricter security posture on default clusters	2019-04-07 12:00:38 -07:00
Dalton Hubble	5271e410eb	Update Kubernetes from v1.13.5 to v1.14.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1140	2019-04-07 00:15:59 -07:00
Dalton Hubble	aaa8e0261a	Add Google Cloud worker instances to a target pool * Background: A managed instance group of workers is used in backend services for global load balancing (HTTP/HTTPS Ingress) and output for custom global load balancing use cases * Add worker instances to a target pool load balancing TCP/UDP applications (NodePort or proxied). Output as `worker_target_pool` * Health check for workers with a healthy Ingress controller. Forward rules (regional) to target pools don't support different external and internal ports so choosing nodes with Ingress allows proxying as a workaround * A target pool is a logical grouping only. It doesn't add costs to clusters or worker pools	2019-04-01 21:03:48 -07:00
Dalton Hubble	b3ec5f73e3	Update Calico from v3.6.0 to v3.6.1 * https://docs.projectcalico.org/v3.6/release-notes/	2019-03-31 17:43:43 -07:00
Dalton Hubble	46196af500	Remove Haswell minimum CPU platform requirement * Google Cloud API implements `min_cpu_platform` to mean "use exactly this CPU" * Fix error creating clusters in newer regions lacking Haswell platform (e.g. europe-west2) (#438) * Reverts #405, added in v1.13.4 * Original goal of ignoring old Ivy/Sandy bridge CPUs in older regions will be achieved shortly anyway. Google Cloud is deprecating those CPUs in April 2019 * https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform#how_selecting_a_minimum_cpu_platform_works	2019-03-27 19:51:32 -07:00
Dalton Hubble	4fea526ebf	Update Kubernetes from v1.13.4 to v1.13.5 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1135	2019-03-25 21:43:47 -07:00
Dalton Hubble	1feefbe9c6	Update Calico from v3.5.2 to v3.6.0 * Add calico-ipam CRDs and RBAC permissions * Switch IPAM from host-local to calico-ipam * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into `ipamblocks` (defaults to /26, but set to /24 in Typhoon) * `host-local` subnets the pod CIDR based on the node PodCIDR field (set via kube-controller-manager as /24's) * Create a custom default IPv4 IPPool to ensure the block size is kept at /24 to allow 110 pods per node (Kubernetes default) * Retaining host-local was slightly preferred, but Calico v3.6 is migrating all usage to calico-ipam. The codepath that skipped calico-ipam for KDD was removed * https://docs.projectcalico.org/v3.6/release-notes/	2019-03-19 22:49:56 -07:00
Dalton Hubble	2019177b6b	Fix implicit map assignments to be explicit * Terraform v0.12 will require map assignments be explicit, part of v0.12 readiness	2019-03-12 01:19:54 -07:00
Dalton Hubble	deec512c14	Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin * Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes service IPs and pod IPs * Previously, CoreDNS was configured to resolve in-addr.arpa PTR records for service IPs (but not pod IPs)	2019-03-04 23:03:00 -08:00
Dalton Hubble	f598307998	Update Kubernetes from v1.13.3 to v1.13.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134	2019-02-28 22:47:43 -08:00
Dalton Hubble	73ae5d5649	Update Calico from v3.5.1 to v3.5.2 * https://docs.projectcalico.org/v3.5/releases/	2019-02-25 21:23:13 -08:00
Dalton Hubble	42d7222f3d	Add a readiness probe to CoreDNS * https://github.com/poseidon/terraform-render-bootkube/pull/115	2019-02-23 13:25:23 -08:00
Dalton Hubble	7f8572030d	Upgrade to support terraform-provider-google v2.0+ * Support terraform-provider-google v1.19.0, v1.19.1, v1.20.0 and v2.0+ (and allow for future 2.x.y releases) * Require terraform-provider-google v1.19.0 or newer. v1.19.0 introduced `network_interface` fields `network_ip` and `nat_ip` to deprecate `address` and `assigned_nat_ip`. Those deprecated fields are removed in terraform-provider-google v2.0 * https://github.com/terraform-providers/terraform-provider-google/releases/tag/v2.0.0	2019-02-20 02:33:32 -08:00
Dalton Hubble	4294bd0292	Assign Pod Priority classes to critical cluster and node components * Assign pod priorityClassNames to critical cluster and node components (higher is higher priority) to inform node out-of-resource eviction order and scheduler preemption and scheduling order * Priority Admission Controller has been enabled since Typhoon v1.11.1	2019-02-19 22:21:39 -08:00
Dalton Hubble	ba4c5de052	Set the Google Cloud minimum CPU platform to Intel Haswell * Intel Haswell or better is available in every zone around the world * Neither Kubernetes nor Typhoon have a particular minimum processor family. However, a few Google Cloud zones still default to Sandy/Ivy bridge (scheduled to shift April 2019). Price is only based on machine type so it is beneficial to opt for the next processor family * Intel Haswell is a suitable minimum since it still allows plenty of liberty in choosing any region or machine type * Likely a slight increase to preemption probability in a few zones, but any lower probability on Sandy/Ivy bridge is due to lower desirability as they're phased out * https://cloud.google.com/compute/docs/regions-zones/	2019-02-18 12:55:04 -08:00
Dalton Hubble	584088397c	Update etcd from v3.3.11 to v3.3.12 * https://github.com/etcd-io/etcd/releases/tag/v3.3.12	2019-02-09 11:54:54 -08:00
Dalton Hubble	0200058e0e	Update Calico from v3.5.0 to v3.5.1 * Fix in confd https://github.com/projectcalico/confd/pull/205	2019-02-09 11:49:31 -08:00
Dalton Hubble	ccd96c37da	Update Kubernetes from v1.13.2 to v1.13.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133	2019-02-01 23:26:13 -08:00
Dalton Hubble	244a1a601a	Switch CoreDNS to use the forward plugin instead of proxy * Use the forward plugin to forward to upstream resolvers, instead of the proxy plugin. The forward plugin is reported to be a faster alternative since it can re-use open sockets * https://coredns.io/explugins/forward/ * https://coredns.io/plugins/proxy/ * https://github.com/kubernetes/kubernetes/issues/73254	2019-01-30 22:25:23 -08:00
Dalton Hubble	1ab06f69d7	Update flannel from v0.10.0 to v0.11.0 * https://github.com/coreos/flannel/releases/tag/v0.11.0	2019-01-29 21:51:25 -08:00
Dalton Hubble	e9659a8539	Update Calico from v3.4.0 to v3.5.0 * https://docs.projectcalico.org/v3.5/releases/	2019-01-27 16:34:30 -08:00
Dalton Hubble	f4d3508578	Update CoreDNS from v1.3.0 to v1.3.1 * https://coredns.io/2019/01/13/coredns-1.3.1-release/	2019-01-15 22:50:25 -08:00
Dalton Hubble	7eafa59d8f	Fix instance shutdown automatic worker deletion on clouds * Fix a regression caused by lowering the Kubelet TLS client certificate to system:nodes group (#100) since dropping cluster-admin dropped the Kubelet's ability to delete nodes. * On clouds where workers can scale down (manual terraform apply, AWS spot termination, Azure low priority deletion), worker shutdown runs the delete-node.service to remove a node to prevent NotReady nodes from accumulating * Allow Kubelets to delete cluster nodes via system:nodes group. Kubelets acting with system:node and kubelet-delete ClusterRoles is still an improvement over acting as cluster-admin	2019-01-14 23:27:48 -08:00
Dalton Hubble	b74cc8afd2	Update etcd from v3.3.10 to v3.3.11 * https://github.com/etcd-io/etcd/releases/tag/v3.3.11	2019-01-12 14:17:25 -08:00
Dalton Hubble	4d32b79c6f	Update Kubernetes from v1.13.1 to v1.13.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1132	2019-01-12 00:00:53 -08:00
Dalton Hubble	df4c0ba05d	Use HTTPS liveness probes for kube-scheduler and kube-controller-manager * Disable kube-scheduler and kube-controller-manager HTTP ports	2019-01-09 20:56:50 -08:00
Dalton Hubble	bfe0c74793	Enable the certificates.k8s.io API to issue cluster certificates * System components that require certificates signed by the cluster CA can submit a CSR to the apiserver, have an administrator inspect and approve it, and be issued a certificate * Configure kube-controller-manager to sign Approved CSR's using the cluster CA private key * Admins are responsible for approving or denying CSRs, otherwise, no certificate is issued. Read the Kubernetes docs carefully and verify the entity making the request and the authorization level * https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster	2019-01-06 17:33:37 -08:00
Dalton Hubble	60c70797ec	Use a single format of the admin kubeconfig * Use a single admin kubeconfig for initial bootkube bootstrap and for use by a human admin. Previously, an admin kubeconfig without a named context was used for bootstrap and direct usage with KUBECONFIG=path, while one with a named context was used for `kubectl config use-context` style usage. Confusing. * Provide the admin kubeconfig via `assets/auth/kubeconfig`, `assets/auth/CLUSTER-config`, or output `kubeconfig-admin`	2019-01-05 14:57:18 -08:00
Dalton Hubble	6795a753ea	Update CoreDNS from v1.2.6 to v1.3.0 * https://coredns.io/2018/12/15/coredns-1.3.0-release/	2019-01-05 13:35:03 -08:00
Dalton Hubble	b57273b6f1	Rename internal kube_dns_service_ip to cluster_dns_service_ip * terraform-render-bootkube module deprecated kube_dns_service_ip output in favor of cluster_dns_service_ip * Rename k8s_dns_service_ip to cluster_dns_service_ip for consistency too	2019-01-05 13:32:03 -08:00
Dalton Hubble	812a1adb49	Use a lower-privilege Kubelet kubeconfig in system:nodes * Kubelets can use a lower-privilege TLS client certificate with Org system:nodes and a binding to the system:node ClusterRole * Admin kubeconfig's continue to belong to Org system:masters to provide cluster-admin (available in assets/auth/kubeconfig or as a Terraform output kubeconfig-admin) * Remove bare-metal output variable kubeconfig	2019-01-05 13:08:56 -08:00

1 2 3 4 5

224 Commits