Commit Graph

197 Commits

Author SHA1 Message Date
Dalton Hubble 159443bae7 addons: Add better alerting rules to Prometheus manifests
* Adapt the coreos/prometheus-operator alerting rules for Typhoon,
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests
* Add controller manager and scheduler shim services to let
prometheus discover them via service endpoints
* Fix several alert rules to use service endpoint discovery
* A few rules still don't do much, but they default to green
2017-11-10 20:57:47 -08:00
Dalton Hubble 119dc859d3 addons: Update nginx-ingress to 0.9.0-beta.17
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.9.0-beta.17
2017-11-10 20:16:40 -08:00
Dalton Hubble 5f6b0728c5 Update bootkube and terraform-render-bootkube to v0.8.2 2017-11-10 20:01:37 -08:00
Dalton Hubble d774c51297 Update Kubernetes from v1.8.2 to v1.8.3 2017-11-08 23:34:19 -08:00
Dalton Hubble f6a8fb363e Remove deprecated kubelet --require-kubeconfig flag
* https://github.com/kubernetes/kubernetes/pull/40050
2017-11-08 23:34:19 -08:00
Dalton Hubble f570af9418 addons: Update from Prometheus v1.8.2 to v2.0.0 2017-11-08 22:48:23 -08:00
Dalton Hubble 4ec6732b98 Output the Google network name and self_link
* Allow users to add custom firewall rules for unique cases
2017-11-08 00:19:49 -08:00
Dalton Hubble ea1efb536a Remove old firewall rule for bootstrap self-hosted etcd 2017-11-08 00:15:20 -08:00
Dalton Hubble 451fd86470 Improve internal firewall rules on Google Cloud
* Whitelist internal traffic between controllers and workers
* Switch to tag-based firewall policies rather than source IP
2017-11-08 00:15:06 -08:00
Dalton Hubble b1b611b22c Add docs to use one controller on Google Cloud 2017-11-07 19:51:03 -08:00
Dalton Hubble eabf00fbf1 Add missing controller dependency before bootkube start
* Require the controller module to be completed before starting
to remote exec bootkube start, otherwise its possible the controller
nodes were created, but not the network load balancer
2017-11-07 19:12:05 -08:00
Dalton Hubble 8eaa72c1ca addons: Update nginx-ingress to 0.9.0-beta.16
* Image registry changed from gcr.io to quay.io
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.9.0-beta.16
2017-11-06 23:15:15 -08:00
Dalton Hubble 58cf82da56 Promote AWS platform from alpha to beta 2017-11-06 21:38:24 -08:00
Dalton Hubble ccc832f468 Add firewall rule to allow apiserver to proxy other controller kubelets
* Prometheus proxies through the apiserver to scrape kubelets
* In multi-controller setups, an apiserver must be able to scrape
kubelets (10250) on other controllers
2017-11-06 01:03:53 -08:00
Dalton Hubble 90f8d62204 Add firewall rules to allow prometheus to reach node-exporter
* node_exporter service endpoints run on hostNetwork port 9100
* Re-evaluate after https://github.com/kubernetes-incubator/bootkube/pull/711
2017-11-06 01:03:53 -08:00
Dalton Hubble af5c413abf Focus controller ELB on load balancing apiservers
* ELB distributing load across controllers is no longer the mechanism
used to SSH to instances to distribute secrets
* Focus the ELB on load balancing across apiserver and edit the HTTP
health check to an SSL:443 check
2017-11-06 01:03:53 -08:00
Dalton Hubble 168c487484 Remove mention of self-hosted etcd, its deprecated 2017-11-06 01:03:53 -08:00
Dalton Hubble 805dd772a8 Run etcd cluster on-host, across controllers on AWS
* Change controllers ASG to heterogeneous EC2 instances
* Create DNS records for each controller's private IP for etcd
* Change etcd to run on-host, across controllers (etcd-member.service)
* Reduce time to bootstrap a cluster
* Deprecate self-hosted-etcd on the AWS platform
2017-11-06 01:03:53 -08:00
Dalton Hubble c6ec6596d8 Minor cleanup for zones, docs, and outputs
* Spread across all zones, regardless of UP/DOWN state
* Remove unused outputs of private IPs
2017-11-06 00:56:26 -08:00
Dalton Hubble 47a9989927 Fix null_resource ordering constraints
* Ensure etcd TLS assets and kubeconfig are copied before
any attempt is made to run bootkube start
2017-11-06 00:55:44 -08:00
Dalton Hubble 10b977d54a addons: Set kube-state-metrics to have clusterIP None
* kube-state-metrics service exists to facilitate prometheus discovery
2017-11-05 17:54:09 -08:00
Dalton Hubble b7a268fc45 addons: Add prometheus alertmanager flag
* Pass -alertmanager.url to work with a user's in-cluster
alertmanager deployment, if any
2017-11-05 15:50:46 -08:00
Dalton Hubble 279f36effd addons: Add grafana 4.6.1 and extend prometheus docs 2017-11-05 15:23:56 -08:00
Dalton Hubble 77fc14db71 Workaround target pool issue by listing instances as zone/name
* Instances can be listed by zone/name or self_link URL, but the
provider desires they be in zone/name form, which causes a diff
* https://github.com/terraform-providers/terraform-provider-google/issues/46
2017-11-05 14:07:05 -08:00
Dalton Hubble 2b0296d671 Create controller instances across zones in the region
* Change controller instances to automatically span zones in a region
* Remove the `zone` required variable
2017-11-05 13:24:32 -08:00
Dalton Hubble 7b38271212 Run etcd cluster on-host, across controllers on Google Cloud
* Change controllers from a managed group to individual instances
* Create discrete DNS records to each controller's private IP for etcd
* Change etcd to run on-host, across controllers (etcd-member.service)
* Reduce time to bootstrap a cluster
* Deprecate self-hosted-etcd on the Google Cloud platform
2017-11-05 11:03:35 -08:00
Dalton Hubble ae07a21e3d addons: Omit static resource requests/limits for kube-state-metrics
* Allow the addon-resizer to dynamically set resource values
* https://github.com/kubernetes/kube-state-metrics/pull/285
2017-11-04 14:41:04 -07:00
Dalton Hubble 0ab1ae3210 addons: Fix typo in kube-state-metrics strategy 2017-11-04 14:39:56 -07:00
Dalton Hubble 67e3d2b86e docs: GCE network bandwidth is excellent, even btw zones
* Remove performance note that the GCE vs AWS network performance
is not an equal comparison. On both platforms, workers now span the
(availability) zones of a region.
* Testing host-to-host and pod-to-pod network bandwidth between nodes
(now located in different zones) showed no reduction in bandwidth
2017-11-04 14:08:20 -07:00
Dalton Hubble a48dd9ebd8 Require google provider version ~> 1.1
* Require google provider plugin 1.1 or higher which includes fix:
https://github.com/terraform-providers/terraform-provider-google/issues/574
* Remove workaround which statically set the persistent disk name
* Original reasons for workaround in a97df839 or GH #34
2017-11-04 12:59:19 -07:00
Dalton Hubble 26a291aef4 Remove controller_preemptible option on Google Cloud
* Controller preemption is not safe or covered in documentation. Delete
the option, the variable is a holdover from old experiments
* Note, worker_preemeptible is still a great feature that's supported
2017-11-04 12:59:19 -07:00
Dalton Hubble 251a14519f Fix typo in internal template variable name
* ssh_authorized_keys should be ssh_authorized_key to match the user
facing variable which only allows a single SSH authorized key
2017-11-04 12:59:19 -07:00
Dalton Hubble 6300383b43 Change worker managed instance group to span zones in region
* Change Google Cloud module to require the `region` variable
* Workers are created in random zones within the given region
* Tolerate Google Cloud zone failures or capacity issues
* If workers are preempted (if enabled), replacement instances can
be drawn from any zone in the region, which should avoid scheduling
issues that were possible before if a single zone aggressively
preempts instances (presumably due to Google Cloud capacity)
2017-11-04 12:59:19 -07:00
Dalton Hubble e32885c9cd addons: Update prometheus from v1.8.0 to v1.8.2
* https://github.com/prometheus/prometheus/releases/tag/v1.8.2
2017-11-04 11:00:39 -07:00
Dalton Hubble fe8afdbee9 Update Typhoon logo and favicon 2017-11-04 01:20:17 -07:00
Dalton Hubble 878f5a3647 Bump bootkube and terraform-render-bootkube to v0.8.1
* Use the v0.8.1 tagged terraform-render-bootkube module
* Use the v0.8.1 quay.io/coreos/bootkube image to bootstrap
2017-10-28 12:50:37 -07:00
Dalton Hubble 34ec7e9862 Relax pessimistic constraints on 1.0+ providers
* Constrains ~> 1.0 means users can use 1.0.1, 1.1, but not 2.0
* https://www.terraform.io/docs/configuration/terraform.html
2017-10-25 23:27:28 -07:00
Dalton Hubble f6c6e85f84 Require minimum Terraform and plugin versions
* Bump minimum Terraform version to v0.10.4
* Allow minor version updates for 1.0+ plugins
* Fix versions for plugins which are pre-1.0
2017-10-25 23:00:31 -07:00
Dalton Hubble 8582e19077 Expand Nginx Ingress liveness and readiness probes
* Remove dnsPolicy: ClusterFirst
* https://github.com/kubernetes/ingress-nginx/pull/1584
2017-10-25 22:29:20 -07:00
Dalton Hubble 3727c40c6c Update Nginx Ingress defaultbackend from 1.0 to 1.4
* https://github.com/kubernetes/ingress-nginx/pull/1568
2017-10-25 22:16:23 -07:00
Dalton Hubble b608f9c615 addons: Use service endpoints to scrape node-exporter 2017-10-24 22:59:00 -07:00
Dalton Hubble ec1dbb853c addons: Include kube-state-metrics exporter manifests 2017-10-24 22:59:00 -07:00
Dalton Hubble d046d45769 addons: Include Prometheus and node-exporter manifests 2017-10-24 22:58:59 -07:00
Dalton Hubble a73f57fe4e Update CLUO from v0.4.0 to v0.4.1 2017-10-24 22:14:03 -07:00
Dalton Hubble 60bc8957c9 Update Kubernetes from v1.8.1 to v1.8.2
* Kubernetes v1.8.2 fixes a memory leak in the v1.8.1 apiserver
* Switch to using the `gcr.io/google_containers/hyperkube` for the
on-host kubelet and shutdown drains
* Update terraform-render-bootkube manifests generation
  * Update flannel from v0.8.0 to v0.9.0
  * Add `hairpinMode` to flannel CNI config
  * Add `--no-negcache` to kube-dns dnsmasq
2017-10-24 21:44:26 -07:00
Dalton Hubble 8b78c65483 Update Google Cloud Kubernetes from v1.7.7 to v1.8.1 2017-10-20 16:09:11 -07:00
Dalton Hubble f86c00288f Add missing update-agent RBAC role to get pods
* Drain now gets pods, deletes pods, and waits for deletion
2017-10-20 01:21:46 -07:00
Dalton Hubble a57b3cf973 Update CLUO addon to v0.4.0 and RBAC ClusterRole 2017-10-20 00:40:17 -07:00
Dalton Hubble 10c5487ad7 Add docs corrections for versions and log output 2017-10-20 00:39:17 -07:00
Dalton Hubble e4c479554c Update AWS, DO, BM Kubernetes from v1.7.7 to v1.8.1
* Update from bootkube v0.7.0 to v0.8.0
* Leave Google Cloud update to a followup commit
2017-10-19 21:10:04 -07:00