Commit Graph

616 Commits

Author SHA1 Message Date
Dalton Hubble
a54f76db2a Update Calico from v3.0.4 to v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.0
2018-04-21 18:30:36 -07:00
Dalton Hubble
e0d9e9979c Update nginx-ingress from 0.12.0 to 0.13.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.13.0
2018-04-18 21:12:09 -07:00
Dalton Hubble
ad2e4311d1 Switch GCP network lb to global TCP proxy lb
* Allow multi-controller clusters on Google Cloud
* GCP regional network load balancers have a long-standing open
bug in which requests originating from a backend instance
are routed to the instance itself, regardless of whether
the health check passes or not. As a result, only the 0th
controller node registers. We've recommended just using
single-master GCP clusters for a while
* https://issuetracker.google.com/issues/67366622
* Work around the issue by switching to a GCP TCP Proxy load
balancer. TCP proxy lb routes traffic to a backend service
(global) of instance group backends. In our case, spread
controllers across 3 zones (all regions have 3+ zones) and
organize them in 3 zonal unmanaged instance groups that
serve as backends. Allows multi-controller cluster creation
* GCP network load balancers only allowed legacy HTTP health
checks so kubelet 10255 was checked as an approximation of
controller health. Replace with TCP apiserver health checks
to detect unhealthy or unresponsive apiservers.
* Drawbacks: GCP provision time increases, tailed logs now
time out (similar tradeoff in AWS), controllers only span 3
zones instead of the exact number in the region
* Workaround in Typhoon has been known and posted for 5 months,
but there still appears to be no better alternative. It's
probably time to support multi-master and accept the downsides
2018-04-18 00:09:06 -07:00
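The TCP health check mentioned in the commit above amounts to a connect probe: a backend counts as healthy if a TCP connection to its apiserver port can be opened in time. The following is a minimal Python sketch of that idea only. The function name, the timeout default, and the way host and port are supplied are hypothetical, and it does not reproduce how GCP or Typhoon configure the real check.

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; not part of Typhoon or GCP.
    """
    try:
        # create_connection raises OSError (including timeouts and refusals)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Unlike the earlier HTTP check against the kubelet, a probe of this shape fails when the apiserver itself stops accepting connections.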
@luke
490b628e2d Use relative image links to appear in GitHub markdown 2018-04-17 23:40:58 -07:00
Dalton Hubble
23a8156bdf Fix a few typos in comments 2018-04-15 17:21:49 -07:00
Dalton Hubble
9789881243 Update kube-state-metrics from v1.3.0 to v1.3.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.3.1
2018-04-15 17:10:02 -07:00
Dalton Hubble
77c0a4cf2e Update Kubernetes from v1.10.0 to v1.10.1
* Use kubernetes-incubator/bootkube v0.12.0
2018-04-12 20:57:31 -07:00
Dalton Hubble
5035d56db2 Refactor GCP to remove controller internal module
* Remove the controller internal module to align with
other platforms and since it's not a supported use case
2018-04-12 19:41:51 -07:00
Dalton Hubble
9bb3de5327 Skip creating unused dirs on worker nodes 2018-04-11 22:23:51 -07:00
Dalton Hubble
c8eabc2af4 Fix GCP controller_type and worker_type vars 2018-04-11 22:19:58 -07:00
Matt Dorn
2eaf858c5c Update example BGPPeer manifest
Previous example may have been outdated. It resulted in `error: unable to recognize "example.yaml": no matches for /, Kind=bgpPeer`.

See https://docs.projectcalico.org/v3.0/reference/calicoctl/resources/bgppeer.
2018-04-09 23:23:18 -05:00
Dalton Hubble
b8656fd74b Clarify bare-metal SSH instructions 2018-04-08 14:11:05 -07:00
Dalton Hubble
d276fffcda Fix bare-metal multiple apply/ssh on Terraform v0.11.4+
* Terraform v0.11.4 introduced changes to remote-exec
that mean Typhoon bare-metal clusters require multiple
runs of terraform apply to ssh and bootstrap.
* Bare-metal installs PXE boot a live instance to install
to disk and then reboot from disk as controllers/workers.
Terraform remote-exec has no way to "know" to wait until
the reboot has occurred to kick off Kubernetes bootstrap.
Previously Typhoon created a "debug" user during this
install phase to allow an admin to SSH, but remote-exec
would hang, trying to connect as user "core". Terraform
v0.11.4 changes this behavior so remote-exec fails and
a user must re-run terraform apply until succeeding.
* A new way to "trick" remote-exec into waiting for the
reboot into the disk install is to run SSH on a non-standard
port during the disk install. This retains the ability
for an admin to SSH during install (most distros don't have
this) and fixes the issue so only a single run of terraform
apply is needed.
* https://github.com/hashicorp/terraform/pull/17359#issuecomment-376415464
2018-04-08 13:32:31 -07:00
Dalton Hubble
6b08bde479 Use k8s.gcr.io instead of gcr.io/google_containers
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
2018-04-08 12:57:52 -07:00
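The registry change in the commit above is a prefix substitution on image references. The Python sketch below shows one way to express it. The helper name and the sample image names in the usage note are illustrative assumptions, not taken from the Typhoon manifests.

```python
OLD_PREFIX = "gcr.io/google_containers/"
NEW_PREFIX = "k8s.gcr.io/"


def use_k8s_alias(image: str) -> str:
    """Rewrite a gcr.io/google_containers image reference to the k8s.gcr.io alias.

    Hypothetical helper for illustration; other image references are returned unchanged.
    """
    if image.startswith(OLD_PREFIX):
        return NEW_PREFIX + image[len(OLD_PREFIX):]
    return image
```

For example, `use_k8s_alias("gcr.io/google_containers/pause-amd64:3.1")` yields `k8s.gcr.io/pause-amd64:3.1`, while a reference hosted on another registry passes through untouched.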
Dalton Hubble
f4b2396718 Return Prometheus deployment to be a worker workload
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targets
* Related to https://github.com/poseidon/typhoon/pull/175
2018-04-08 12:20:00 -07:00
Dalton Hubble
b76126db93 Update docs builder and material theme 2018-04-08 00:00:03 -07:00
Dalton Hubble
7186aa46da Update kube-state-metrics from v1.2.0 to v1.3.0
* https://github.com/kubernetes/kube-state-metrics/pull/412
* https://github.com/kubernetes/kube-state-metrics/pull/413
2018-04-04 21:04:13 -07:00
Dalton Hubble
18dbaf74ce Update kube-dns from v1.14.8 to v1.14.9
* https://github.com/kubernetes/kubernetes/pull/61908
2018-04-04 21:00:23 -07:00
Dalton Hubble
ce001e9d56 Update etcd from v3.3.2 to v3.3.3
* https://github.com/coreos/etcd/releases/tag/v3.3.3
2018-04-04 20:32:24 -07:00
Dalton Hubble
d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble
642f7ec22f Update CHANGES.md with Kubernetes link 2018-03-30 23:12:38 -07:00
Dalton Hubble
1cc043d1eb Update Kubernetes from v1.9.6 to v1.10.0 2018-03-30 22:14:07 -07:00
Dalton Hubble
f8e9bfb1c0 Add disk_type variable for EBS volume type on AWS
* Change EBS volume type from `standard` ("prior generation")
to `gp2`. Prometheus alerts are tuned for SSDs
* Other platforms have fast enough disks by default
2018-03-29 22:51:54 -07:00
Dalton Hubble
b1e41dcb99 addons: Update from Grafana v4.6.3 to v5.0.4
This reverts commit c59a9c66b1.
2018-03-28 19:45:19 -07:00
Dalton Hubble
de4d90750e Use consistent naming of remote provision steps 2018-03-26 00:29:57 -07:00
Dalton Hubble
7acd4931f6 Remove redundant kubeconfig copy on AWS and GCP
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. It's redundant to use a Terraform remote
file copy step for the kubeconfig.
2018-03-26 00:01:47 -07:00
Dalton Hubble
cfd603bea2 Ensure etcd secrets are only distributed to controller hosts
* Previously, etcd secrets were erroneously distributed to worker
nodes (permissions 500, ownership etcd:etcd).
2018-03-25 23:46:44 -07:00
Dalton Hubble
fdb543e834 Add optional controller_type and worker_type vars on GCP
* Remove optional machine_type variable on Google Cloud
* Use controller_type and worker_type instead
2018-03-25 22:11:18 -07:00
Dalton Hubble
8d3d4220fd Add disk_size variable on Google Cloud 2018-03-25 22:04:14 -07:00
Dalton Hubble
ba9daf439e Remove unmaintained pxe-worker internal module 2018-03-25 21:57:52 -07:00
Dalton Hubble
38adb14bd2 Remove optional variable networking on Digital Ocean
* Calico isn't viable on Digital Ocean because their firewalls
do not support IP-IP protocol. It's not viable to run a cluster
without firewalls just to use Calico.
* Remove the caveat note. Don't allow users to shoot themselves
in the foot
2018-03-25 21:48:51 -07:00
Dalton Hubble
e43cf9f608 Organize and cleanup variable descriptions 2018-03-25 21:44:43 -07:00
Dalton Hubble
455a4af27e Improve cluster definition examples in docs 2018-03-25 20:41:52 -07:00
Dalton Hubble
39876e455f Fix docs to reflect enforced provider versions 2018-03-25 11:34:39 -07:00
Dalton Hubble
da2be86e8c Add v1.9.6 heading to CHANGES.md 2018-03-22 22:01:29 -07:00
Dalton Hubble
65a2751f77 addons: Update heapster from v1.5.1 to v1.5.2
* https://github.com/kubernetes/heapster/releases/tag/v1.5.2
2018-03-21 20:32:01 -07:00
Dalton Hubble
a04ef3919a Update Kubernetes from v1.9.5 to v1.9.6 2018-03-21 20:29:52 -07:00
Dalton Hubble
851bc1a3f8 Update nginx-ingress from 0.11.0 to 0.12.0 2018-03-19 23:17:17 -07:00
Dalton Hubble
758c09fa5c Update Kubernetes from v1.9.4 to v1.9.5 2018-03-19 00:25:44 -07:00
Dalton Hubble
b1cdd361ef Mention controllers node label in changelog 2018-03-19 00:15:56 -07:00
Dalton Hubble
7f7bc960a6 Set default Google Cloud os_image to coreos-stable 2018-03-19 00:08:26 -07:00
Dalton Hubble
29108fd99d Improve changelog with migration links 2018-03-18 23:54:55 -07:00
Dalton Hubble
18d08de898 Add Container Linux Config snippet docs 2018-03-18 23:22:40 -07:00
Dalton Hubble
f3730b2bfa Add Container Linux Config snippets feature
* Introduce the ability to support Container Linux Config
"snippets" for controllers and workers on cloud platforms.
This allows end-users to customize hosts by providing Container
Linux configs that are additively merged into the base configs
defined by Typhoon. Config snippets are validated and merged,
and any errors are shown during `terraform plan`
* Example uses include adding systemd units, network configs,
mounts, files, raid arrays, or other disk provisioning features
provided by Container Linux Configs (which use Ignition at a low level)
* Requires terraform-provider-ct v0.2.1 plugin
2018-03-18 18:28:18 -07:00
Dalton Hubble
88aa9a46e5 Add /var/lib/calico volume mount to Calico DaemonSet 2018-03-18 16:40:38 -07:00
Dalton Hubble
efa90d8b44 Add a new key=value label to controller nodes
* Add a node-role.kubernetes.io/controller="true" node label
to controllers so Prometheus service discovery can filter to
services that only run on controllers (i.e. masters)
* Leave node-role.kubernetes.io/master="" untouched as it's
a Kubernetes convention
2018-03-18 16:39:10 -07:00
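The point of the key=value label in the commit above is that a selector can match on the value "true", whereas the conventional master label carries an empty value. The Python sketch below illustrates that filtering step on plain dictionaries. The node data shape and the function name are assumptions for illustration, and Prometheus itself would do this through relabeling rules rather than code like this.

```python
CONTROLLER_LABEL = "node-role.kubernetes.io/controller"


def controller_nodes(nodes):
    """Return the names of nodes whose controller label is exactly "true".

    `nodes` is a hypothetical list of dicts with "name" and "labels" keys.
    """
    return [
        node["name"]
        for node in nodes
        if node.get("labels", {}).get(CONTROLLER_LABEL) == "true"
    ]
```

A node carrying only `node-role.kubernetes.io/master=""` is not selected by this filter, which matches the commit's choice to leave that label untouched and add a separate one.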
Dalton Hubble
46226a8015 Update Prometheus from 2.2.0 to 2.2.1 2018-03-18 15:56:44 -07:00
Dalton Hubble
270d1ce357 Add links to upstream regressions 2018-03-14 18:56:20 -07:00
Dalton Hubble
ab87b6cea3 Add clarifying links to CHANGES 2018-03-12 21:19:15 -07:00
Dalton Hubble
d621512dd6 Promote AWS platform from beta to stable 2018-03-12 21:15:53 -07:00