typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 20:14:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	d641a058fe	Update Calico from v3.2.3 to v3.3.0 * https://docs.projectcalico.org/v3.3/releases/	2018-10-23 20:30:30 -07:00
Dalton Hubble	99a6d5478b	Disable Kubelet read-only port 10255 * We can finally disable the Kubelet read-only port 10255! * Journey: https://github.com/poseidon/typhoon/issues/322#issuecomment-431073073	2018-10-18 21:14:14 -07:00
Dalton Hubble	d55bfd5589	Fix CoreDNS AntiAffinity spec to prefer spreading replicas * Pods were still being scheduled at random due to a typo	2018-10-17 22:19:57 -07:00
Dalton Hubble	9b6113a058	Update Kubernetes from v1.11.3 to v1.12.1 * Mount an empty dir for the controller-manager to work around https://github.com/kubernetes/kubernetes/issues/68973 * Update coreos/pod-checkpointer to strip affinity from checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced a default affinity that appears on checkpointed manifests, but it prevented scheduling, and checkpointed pods should not have an affinity since they're run directly by the Kubelet on the local node * https://github.com/kubernetes-incubator/bootkube/issues/1001 * https://github.com/kubernetes/kubernetes/pull/68173	2018-10-16 20:28:13 -07:00
Dalton Hubble	5eb4078d68	Add docker/default seccomp to control plane and addons * Annotate pods, deployments, and daemonsets to start containers with the Docker runtime's default seccomp profile * Overrides Kubernetes default behavior which started containers with seccomp=unconfined * https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container	2018-10-16 20:07:29 -07:00
Dalton Hubble	55bb4dfba6	Raise CoreDNS replica count to 2 or more * Run at least two replicas of CoreDNS to better support rolling updates (previously, kube-dns had a pod nanny) * On multi-master clusters, set the CoreDNS replica count to match the number of masters (e.g. a 3-master cluster previously used replicas:1, now replicas:3) * Add AntiAffinity preferred rule to favor distributing CoreDNS pods across controller nodes	2018-10-13 20:31:29 -07:00
Dalton Hubble	43fe78a2cc	Raise scheduler/controller-manager replicas in multi-master * Continue to ensure scheduler and controller-manager run at least two replicas to support performing kubectl edits on single-master clusters (no change) * For multi-master clusters, set scheduler / controller-manager replica count to the number of masters (e.g. a 3-master cluster previously used replicas:2, now replicas:3)	2018-10-13 16:16:29 -07:00
Dalton Hubble	5a283b6443	Update etcd from v3.3.9 to v3.3.10 * https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v3310-2018-10-10	2018-10-13 13:14:37 -07:00
Dalton Hubble	7653e511be	Update CoreDNS and Calico versions * Update CoreDNS from 1.1.3 to 1.2.2 * Update Calico from v3.2.1 to v3.2.3	2018-10-02 16:07:48 +02:00
Dalton Hubble	ad871dbfa9	Update Kubernetes from v1.11.2 to v1.11.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1113	2018-09-13 18:50:41 -07:00
Dalton Hubble	7eb09237f4	Update Calico from v3.1.3 to v3.2.1 * Add new bird and felix readiness checks * Read MTU from ConfigMap veth_mtu * Add RBAC read for serviceaccounts * Remove invalid description from CRDs	2018-08-25 17:53:11 -07:00
Dalton Hubble	e58b424882	Fix firewall to allow etcd client traffic between controllers * Broaden internal-etcd firewall rule to allow etcd client traffic (2379) from other controller nodes * Previously, kube-apiservers were only able to connect to their node's local etcd peer. While master node outages were tolerated, reaching a healthy peer took longer than necessary in some cases * Reduce time needed to bootstrap a cluster	2018-08-21 23:51:40 -07:00
Dalton Hubble	b8eeafe4f9	Template etcd_servers list to replace null_resource.repeat * Remove the last usage of null_resource.repeat, which has always been an eyesore for creating the etcd server list * Originally, #224 switched to templating the etcd_servers list for all clouds, but had to revert on GCP in #237 * https://github.com/poseidon/typhoon/pull/224 * https://github.com/poseidon/typhoon/pull/237	2018-08-21 22:46:24 -07:00
Dalton Hubble	bdf1e6986e	Fix terraform fmt	2018-08-21 21:59:55 -07:00
Dalton Hubble	da5d2c5321	Remove GCP firewall rule allowing Nginx Ingress health * Nginx Ingress addon no longer uses hostNetwork so Prometheus may scrape port 10254 via the CNI network, rather than via the host address	2018-08-21 21:06:03 -07:00
Dalton Hubble	bceec9fdf5	Sort firewall / security rules and add comments * No functional changes to network firewalls	2018-08-21 20:53:16 -07:00
Dalton Hubble	f7ebdf475d	Update Kubernetes from v1.11.1 to v1.11.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1112	2018-08-07 21:57:25 -07:00
Dalton Hubble	db64ce3312	Update etcd from v3.3.8 to v3.3.9 * https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md#v339-2018-07-24	2018-07-29 11:27:37 -07:00
Dalton Hubble	7c327b8bf4	Update from bootkube v0.12.0 to v0.13.0	2018-07-29 11:20:17 -07:00
Dalton Hubble	d8d524d10b	Update Kubernetes from v1.11.0 to v1.11.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1111	2018-07-20 00:41:27 -07:00
Dalton Hubble	6f958d7577	Replace kube-dns with CoreDNS * Add system:coredns ClusterRole and binding * Annotate CoreDNS for Prometheus metrics scraping * Remove kube-dns deployment, service, & service account * https://github.com/poseidon/terraform-render-bootkube/pull/71 * https://kubernetes.io/blog/2018/06/27/kubernetes-1.11-release-announcement/	2018-07-01 22:55:01 -07:00
Dalton Hubble	fd1de27aef	Remove deprecated ingress_static_ip and controllers_ipv4_public outputs	2018-07-01 20:47:46 -07:00
Dalton Hubble	8464b258d8	Update Kubernetes from v1.10.5 to v1.11.0 * Force apiserver to stop listening on 127.0.0.1:8080 * Remove deprecated Kubelet `--allow-privileged`. Defaults to true. Use `PodSecurityPolicy` if limiting is desired * https://github.com/kubernetes/kubernetes/releases/tag/v1.11.0 * https://github.com/poseidon/terraform-render-bootkube/pull/68	2018-06-27 22:47:35 -07:00
Dalton Hubble	0c4d59db87	Use global HTTP/TCP proxy load balancing for Ingress on GCP * Switch Ingress from regional network load balancers to global HTTP/TCP Proxy load balancing * Reduce cost by ~$19/month per cluster. Google bills the first 5 global and regional forwarding rules separately. Typhoon clusters now use 3 global and 0 regional forwarding rules. * Worker pools no longer include an extraneous load balancer. Remove worker module's `ingress_static_ip` output. * Add `ingress_static_ipv4` output variable * Add `worker_instance_group` output to allow custom global load balancing * Deprecate `controllers_ipv4_public` module output * Deprecate `ingress_static_ip` module output. Use `ingress_static_ipv4`	2018-06-23 14:37:40 -07:00
Dalton Hubble	0227014fa0	Fix terraform formatting	2018-06-22 00:28:36 -07:00
Dalton Hubble	f4d3059b00	Update Kubernetes from v1.10.4 to v1.10.5 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1105	2018-06-21 22:51:39 -07:00
Dalton Hubble	6c5a1964aa	Change kube-apiserver port from 443 to 6443 * Adjust firewall rules, security groups, cloud load balancers, and generated kubeconfigs * Facilitates some future simplifications and cost reductions * Bare-Metal users who exposed kube-apiserver on a WAN via their router or load balancer will need to adjust its configuration. This is uncommon; most apiservers are on LAN and/or behind VPN so no routing infrastructure is configured with the port number	2018-06-19 23:48:51 -07:00
Dalton Hubble	6e64634748	Update etcd from v3.3.7 to v3.3.8 * https://github.com/coreos/etcd/releases/tag/v3.3.8	2018-06-19 21:56:21 -07:00
Dalton Hubble	51906bf398	Update etcd from v3.3.6 to v3.3.7	2018-06-14 22:46:16 -07:00
Dalton Hubble	6676484490	Partially revert b7ed6e7bd35cee39a3f65b47e731938c3006b5cd * Fix change that broke Google Cloud container-linux and fedora-atomic https://github.com/poseidon/typhoon/pull/224	2018-06-06 23:48:37 -07:00
Dalton Hubble	79260c48f6	Update Kubernetes from v1.10.3 to v1.10.4	2018-06-06 23:23:11 -07:00
Dalton Hubble	589c3569b7	Update etcd from v3.3.5 to v3.3.6 * https://github.com/coreos/etcd/releases/tag/v3.3.6	2018-06-06 23:19:30 -07:00
Dalton Hubble	6e968cd152	Update Calico from v3.1.2 to v3.1.3 * https://github.com/projectcalico/calico/releases/tag/v3.1.3 * https://github.com/projectcalico/cni-plugin/releases/tag/v3.1.3	2018-05-30 21:32:12 -07:00
Ben Drucker	6a581ab577	Render etcd_initial_cluster using a template_file	2018-05-30 21:14:49 -07:00
Dalton Hubble	4ea1fde9c5	Update Kubernetes from v1.10.2 to v1.10.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1103 * Update Calico from v3.1.1 to v3.1.2	2018-05-21 21:38:43 -07:00
Dalton Hubble	28d0891729	Annotate nginx-ingress addon for Prometheus auto-discovery * Add Google Cloud firewall rule to allow worker to worker access to health and metrics	2018-05-19 13:13:14 -07:00
William Zhang	2ae126bf68	Fix README link to tutorial	2018-05-19 13:10:22 -07:00
Dalton Hubble	c2b719dc75	Configure Prometheus to scrape Kubelets directly * Use Kubelet bearer token authn/authz to scrape metrics * Drop RBAC permission from nodes/proxy to nodes/metrics * Stop proxying kubelet scrapes through the apiserver, since this required higher privilege (nodes/proxy) and can add load to the apiserver on large clusters	2018-05-14 23:06:50 -07:00
Dalton Hubble	37981f9fb1	Allow bearer token authn/authz to the Kubelet * Require Webhook authorization to the Kubelet * Switch apiserver X509 client cert org to system:masters to grant the apiserver admin and satisfy the authorization requirement. kubectl commands like logs or exec that have the apiserver make requests of a kubelet continue to work as before * https://kubernetes.io/docs/admin/kubelet-authentication-authorization/ * https://github.com/poseidon/typhoon/issues/215	2018-05-13 23:20:42 -07:00
Dalton Hubble	f2ee75ac98	Require Terraform v0.11.x, drop v0.10.x support * Raise minimum Terraform version to v0.11.0 * Terraform v0.11.x has been supported since Typhoon v1.9.2 and Terraform v0.10.x was last released in Nov 2017. I'd like to stop worrying about v0.10.x and remove migration docs as a later followup * Migration docs docs/topics/maintenance.md#terraform-v011x	2018-05-10 02:20:46 -07:00
Dalton Hubble	8b8e364915	Update etcd from v3.3.4 to v3.3.5 * https://github.com/coreos/etcd/releases/tag/v3.3.5	2018-05-10 02:12:53 -07:00
Dalton Hubble	9d4cbb38f6	Rerun terraform fmt	2018-05-01 21:41:22 -07:00
Dalton Hubble	e889430926	Update kube-dns from v1.14.9 to v1.14.10 * https://github.com/kubernetes/kubernetes/pull/62676	2018-04-28 00:43:09 -07:00
Dalton Hubble	32ddfa94e1	Update Kubernetes from v1.10.1 to v1.10.2 * https://github.com/kubernetes/kubernetes/releases/tag/v1.10.2	2018-04-28 00:27:00 -07:00
Dalton Hubble	681450aa0d	Update etcd from v3.3.3 to v3.3.4 * https://github.com/coreos/etcd/releases/tag/v3.3.4	2018-04-27 23:57:26 -07:00
Dalton Hubble	a54f76db2a	Update Calico from v3.0.4 to v3.1.1 * https://github.com/projectcalico/calico/releases/tag/v3.1.1 * https://github.com/projectcalico/calico/releases/tag/v3.1.0	2018-04-21 18:30:36 -07:00
Dalton Hubble	ad2e4311d1	Switch GCP network lb to global TCP proxy lb * Allow multi-controller clusters on Google Cloud * GCP regional network load balancers have a long open bug in which requests originating from a backend instance are routed to the instance itself, regardless of whether the health check passes or not. As a result, only the 0th controller node registers. We've recommended just using single master GCP clusters for a while * https://issuetracker.google.com/issues/67366622 * Workaround issue by switching to a GCP TCP Proxy load balancer. TCP proxy lb routes traffic to a backend service (global) of instance group backends. In our case, spread controllers across 3 zones (all regions have 3+ zones) and organize them in 3 zonal unmanaged instance groups that serve as backends. Allows multi-controller cluster creation * GCP network load balancers only allowed legacy HTTP health checks so kubelet 10255 was checked as an approximation of controller health. Replace with TCP apiserver health checks to detect unhealthy or unresponsive apiservers. * Drawbacks: GCP provision time increases, tailed logs now time out (similar tradeoff in AWS), controllers only span 3 zones instead of the exact number in the region * Workaround in Typhoon has been known and posted for 5 months, but there still appears to be no better alternative. It's probably time to support multi-master and accept the downsides	2018-04-18 00:09:06 -07:00
Dalton Hubble	77c0a4cf2e	Update Kubernetes from v1.10.0 to v1.10.1 * Use kubernetes-incubator/bootkube v0.12.0	2018-04-12 20:57:31 -07:00
Dalton Hubble	5035d56db2	Refactor GCP to remove controller internal module * Remove the controller internal module to align with other platforms and since it's not a supported use case	2018-04-12 19:41:51 -07:00
Dalton Hubble	9bb3de5327	Skip creating unused dirs on worker nodes	2018-04-11 22:23:51 -07:00


153 Commits