Commit Graph

367 Commits

Author SHA1 Message Date
Dalton Hubble
5b27d8d889 Update Kubernetes from v1.12.2 to v1.12.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md/#v1123
2018-11-26 21:06:09 -08:00
Dalton Hubble
840b73f9ba Update pod-checkpointer image to query Kubelet secure API
* Updates pod-checkpointer to prefer the Kubelet secure
API (before falling back to the Kubelet read-only API that
is disabled on Typhoon clusters since
https://github.com/poseidon/typhoon/pull/324)
* Previously, pod-checkpointer checkpointed an initial set
of pods during bootstrapping so recovery from power cycling
clusters was unaffected, but logs were noisy
* https://github.com/kubernetes-incubator/bootkube/pull/1027
* https://github.com/kubernetes-incubator/bootkube/pull/1025
2018-11-26 20:24:32 -08:00
Dalton Hubble
915af3c6cc Fix Calico Felix reporting usage data, require opt-in
* Calico Felix has been reporting anonymous usage data about the
version and cluster size, which violates Typhoon's privacy policy
where analytics should be opt-in only
* Add a variable enable_reporting (default: false) to allow opting
in to reporting usage data to Calico (or future components)
2018-11-20 01:03:00 -08:00
Dalton Hubble
ea3fc6d2a7 Update CoreDNS from v1.2.4 to v1.2.6
* https://coredns.io/2018/11/05/coredns-1.2.6-release/
2018-11-18 16:45:53 -08:00
Dalton Hubble
56e9a82984 Add flannel resource request and mount only /run/flannel 2018-11-11 20:35:21 -08:00
Dalton Hubble
e95b856a22 Enable CoreDNS loop and loadbalance plugins
* loop sends an initial query to detect infinite forwarding
loops in configured upstream DNS servers and fast exit with
an error (its a fatal misconfiguration on the network that
will otherwise cause resolvers to consume memory/CPU until
crashing, masking the problem)
* https://github.com/coredns/coredns/tree/master/plugin/loop
* loadbalance randomizes the ordering of A, AAAA, and MX records
in responses to provide round-robin load balancing (as usual,
clients may still cache responses though)
* https://github.com/coredns/coredns/tree/master/plugin/loadbalance
2018-11-10 17:36:56 -08:00
Dalton Hubble
2b3f61d1bb Update Calico from v3.3.0 to v3.3.1
* Structure Calico and flannel manifests
* Rename kube-flannel mentions to just flannel
2018-11-10 13:37:12 -08:00
Dalton Hubble
8fd2978c31 Update bootkube image version from v0.13.0 to v0.14.0
* https://github.com/kubernetes-incubator/bootkube/releases/tag/v0.14.0
2018-11-06 23:35:11 -08:00
Dalton Hubble
721c847943 Set kube-apiserver kubelet preferred address types
* Prefer InternalIP and ExternalIP over the node's hostname,
to match upstream behavior and kubeadm
* Previously, hostname-override was used to set node names
to internal IP's to work around some cloud providers not
resolving hostnames for instances (e.g. DO droplets)
2018-11-03 22:31:55 -07:00
Dalton Hubble
0e71f7e565 Ignore controller user_data changes to allow plugin updates
* Updating the `terraform-provider-ct` plugin is known to produce
a `user_data` diff in all pre-existing clusters. Applying the
diff to pre-existing cluster destroys controller nodes
* Ignore changes to controller `user_data`. Once all managed
clusters use a release containing this change, it is possible
to update the `terraform-provider-ct` plugin (worker `user_data`
will still be modified)
* Changing the module `ref` for an existing cluster and
re-applying is still NOT supported (although this PR
would protect controllers from being destroyed)
2018-10-28 16:48:12 -07:00
Dalton Hubble
9b405a19b2 Fix minor naming inconsistencies in Ignition and CLC data 2018-10-27 16:24:59 -07:00
Tamer Fahmy
bfa1a679eb Name AWS and DigitalOcean Ignition data sources consistently 2018-10-27 16:14:44 -07:00
Dalton Hubble
f1da0731d8 Update Kubernetes from v1.12.1 to v1.12.2
* Update CoreDNS from v1.2.2 to v1.2.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md#v1122
* https://coredns.io/2018/10/17/coredns-1.2.4-release/
* https://coredns.io/2018/10/16/coredns-1.2.3-release/
2018-10-27 15:47:57 -07:00
Dalton Hubble
d641a058fe Update Calico from v3.2.3 to v3.3.0
* https://docs.projectcalico.org/v3.3/releases/
2018-10-23 20:30:30 -07:00
Dalton Hubble
99a6d5478b Disable Kubelet read-only port 10255
* We can finally disable the Kubelet read-only port 10255!
* Journey: https://github.com/poseidon/typhoon/issues/322#issuecomment-431073073
2018-10-18 21:14:14 -07:00
Dalton Hubble
d55bfd5589 Fix CoreDNS AntiAffinity spec to prefer spreading replicas
* Pods were still being scheduled at random due to a typo
2018-10-17 22:19:57 -07:00
Robert Fairburn
0be4673e44 Add disk_iops variable for AWS
* Setting disk_iops is required for disk_type io1
* https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html#EBSVolumeTypes
2018-10-17 22:18:54 -07:00
Dalton Hubble
9b6113a058 Update Kubernetes from v1.11.3 to v1.12.1
* Mount an empty dir for the controller-manager to work around
https://github.com/kubernetes/kubernetes/issues/68973
* Update coreos/pod-checkpointer to strip affinity from
checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced
a default affinity that appears on checkpointed manifests; but
it prevented scheduling and checkpointed pods should not have an
affinity, they're run directly by the Kubelet on the local node
* https://github.com/kubernetes-incubator/bootkube/issues/1001
* https://github.com/kubernetes/kubernetes/pull/68173
2018-10-16 20:28:13 -07:00
Dalton Hubble
5eb4078d68 Add docker/default seccomp to control plane and addons
* Annotate pods, deployments, and daemonsets to start containers
with the Docker runtime's default seccomp profile
* Overrides Kubernetes default behavior which started containers
with seccomp=unconfined
* https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container
2018-10-16 20:07:29 -07:00
Dalton Hubble
55bb4dfba6 Raise CoreDNS replica count to 2 or more
* Run at least two replicas of CoreDNS to better support
rolling updates (previously, kube-dns had a pod nanny)
* On multi-master clusters, set the CoreDNS replica count
to match the number of masters (e.g. a 3-master cluster
previously used replicas:1, now replicas:3)
* Add AntiAffinity preferred rule to favor distributing
CoreDNS pods across controller nodes nodes
2018-10-13 20:31:29 -07:00
Dalton Hubble
43fe78a2cc Raise scheduler/controller-manager replicas in multi-master
* Continue to ensure scheduler and controller-manager run
at least two replicas to support performing kubectl edits
on single-master clusters (no change)
* For multi-master clusters, set scheduler / controller-manager
replica count to the number of masters (e.g. a 3-master cluster
previously used replicas:2, now replicas:3)
2018-10-13 16:16:29 -07:00
Dalton Hubble
5a283b6443 Update etcd from v3.3.9 to v3.3.10
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v3310-2018-10-10
2018-10-13 13:14:37 -07:00
Dalton Hubble
7653e511be Update CoreDNS and Calico versions
* Update CoreDNS from 1.1.3 to 1.2.2
* Update Calico from v3.2.1 to v3.2.3
2018-10-02 16:07:48 +02:00
Dalton Hubble
ad871dbfa9 Update Kubernetes from v1.11.2 to v1.11.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1113
2018-09-13 18:50:41 -07:00
Dalton Hubble
7eb09237f4 Update Calico from v3.1.3 to v3.2.1
* Add new bird and felix readiness checks
* Read MTU from ConfigMap veth_mtu
* Add RBAC read for serviceaccounts
* Remove invalid description from CRDs
2018-08-25 17:53:11 -07:00
Dalton Hubble
ea365b551a Fix docs mentions of ELBs to NLBs
* Typhoon AWS clusters use an NLB rather than an ELB,
since v1.10.5
* Add a few missing links in CHANGES
2018-08-21 21:40:06 -07:00
Dalton Hubble
bbf2c13eef Remove AWS security rule allowing ICMP packets to nodes
* Deny ICMP packets for consistency across Typhoon clusters on
various clouds and because there isn't much need to allow them
2018-08-21 21:16:16 -07:00
Dalton Hubble
bceec9fdf5 Sort firewall / security rules and add comments
* No functional changes to network firewalls
2018-08-21 20:53:16 -07:00
Dalton Hubble
f7ebdf475d Update Kubernetes from v1.11.1 to v1.11.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1112
2018-08-07 21:57:25 -07:00
Dalton Hubble
edc250d62a Fix Kublet version for Fedora Atomic modules
* Release v1.11.1 erroneously left Fedora Atomic clusters using
the v1.11.0 Kubelet. The rest of the control plane ran v1.11.1
as expected
* Update Kubelet from v1.11.0 to v1.11.1 so Fedora Atomic matches
Container Linux
* Container Linux modules were not affected
2018-07-29 12:13:29 -07:00
Dalton Hubble
db64ce3312 Update etcd from v3.3.8 to v3.3.9
* https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md#v339-2018-07-24
2018-07-29 11:27:37 -07:00
Dalton Hubble
7c327b8bf4 Update from bootkube v0.12.0 to v0.13.0 2018-07-29 11:20:17 -07:00
Dalton Hubble
d8d524d10b Update Kubernetes from v1.11.0 to v1.11.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1111
2018-07-20 00:41:27 -07:00
Dalton Hubble
6f958d7577 Replace kube-dns with CoreDNS
* Add system:coredns ClusterRole and binding
* Annotate CoreDNS for Prometheus metrics scraping
* Remove kube-dns deployment, service, & service account
* https://github.com/poseidon/terraform-render-bootkube/pull/71
* https://kubernetes.io/blog/2018/06/27/kubernetes-1.11-release-announcement/
2018-07-01 22:55:01 -07:00
Dalton Hubble
93de7506ef Update Fedora Atomic from 27 to 28 on AWS 2018-06-30 18:55:18 -07:00
Dalton Hubble
def445a344 Update Fedora Atomic kubelet from v1.10.5 to v1.11.0 2018-06-30 16:45:42 -07:00
Dalton Hubble
8464b258d8 Update Kubernetes from v1.10.5 to v1.11.0
* Force apiserver to stop listening on 127.0.0.1:8080
* Remove deprecated Kubelet `--allow-privileged`. Defaults to
true. Use `PodSecurityPolicy` if limiting is desired
* https://github.com/kubernetes/kubernetes/releases/tag/v1.11.0
* https://github.com/poseidon/terraform-render-bootkube/pull/68
2018-06-27 22:47:35 -07:00
Dalton Hubble
855aec5af3 Clarify AWS module output names and changes 2018-06-23 15:29:13 -07:00
Dalton Hubble
0227014fa0 Fix terraform formatting 2018-06-22 00:28:36 -07:00
Dalton Hubble
fb6f40051f Disable AWS detailed monitoring on worker nodes
* Basic monitoring (free) is sufficient for casual console browsing
* Detailed monitoring (paid) is not leveraged for CloudWatch anyway
* Favor Prometheus for cloud-agnostic metrics, aggregation, and alerting
2018-06-22 00:26:06 -07:00
Dalton Hubble
316f06df06 Combine NLBs to use one NLB per cluster
* Simplify clusters to come with a single NLB
* Listen for apiserver traffic on port 6443 and forward
to controllers (with healthy apiserver)
* Listen for ingress traffic on ports 80/443 and forward
to workers (with healthy ingress controller)
* Reduce cost of default clusters by 1 NLB ($18.14/month)
* Keep using CNAME records to the `ingress_dns_name` NLB and
the nginx-ingress addon for Ingress (up to a few million RPS)
* Users with heavy traffic (many million RPS) can create their
own separate NLB(s) for Ingress and use the new output worker
target groups
* Fix issue where additional worker pools come with an
extraneous network load balancer
2018-06-21 23:46:57 -07:00
Dalton Hubble
f4d3059b00 Update Kubernetes from v1.10.4 to v1.10.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1105
2018-06-21 22:51:39 -07:00
Dalton Hubble
6c5a1964aa Change kube-apiserver port from 443 to 6443
* Adjust firewall rules, security groups, cloud load balancers,
and generated kubeconfig's
* Facilitates some future simplifications and cost reductions
* Bare-Metal users who exposed kube-apiserver on a WAN via their
router or load balancer will need to adjust its configuration.
This is uncommon, most apiserver are on LAN and/or behind VPN
so no routing infrastructure is configured with the port number
2018-06-19 23:48:51 -07:00
Dalton Hubble
6e64634748 Update etcd from v3.3.7 to v3.3.8
* https://github.com/coreos/etcd/releases/tag/v3.3.8
2018-06-19 21:56:21 -07:00
Dalton Hubble
51906bf398 Update etcd from v3.3.6 to v3.3.7 2018-06-14 22:46:16 -07:00
Dalton Hubble
79260c48f6 Update Kubernetes from v1.10.3 to v1.10.4 2018-06-06 23:23:11 -07:00
Dalton Hubble
589c3569b7 Update etcd from v3.3.5 to v3.3.6
* https://github.com/coreos/etcd/releases/tag/v3.3.6
2018-06-06 23:19:30 -07:00
Dalton Hubble
6e968cd152 Update Calico from v3.1.2 to v3.1.3
* https://github.com/projectcalico/calico/releases/tag/v3.1.3
* https://github.com/projectcalico/cni-plugin/releases/tag/v3.1.3
2018-05-30 21:32:12 -07:00
Ben Drucker
6a581ab577 Render etcd_initial_cluster using a template_file 2018-05-30 21:14:49 -07:00
Dalton Hubble
4ea1fde9c5 Update Kubernetes from v1.10.2 to v1.10.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1103
* Update Calico from v3.1.1 to v3.1.2
2018-05-21 21:38:43 -07:00
William Zhang
2ae126bf68 Fix README link to tutorial 2018-05-19 13:10:22 -07:00
Dalton Hubble
983489bb52 Re-run terraform fmt for formatting 2018-05-14 23:38:16 -07:00
Dalton Hubble
c2b719dc75 Configure Prometheus to scrape Kubelets directly
* Use Kubelet bearer token authn/authz to scrape metrics
* Drop RBAC permission from nodes/proxy to nodes/metrics
* Stop proxying kubelet scrapes through the apiserver, since
this required higher privilege (nodes/proxy) and can add
load to the apiserver on large clusters
2018-05-14 23:06:50 -07:00
Dalton Hubble
37981f9fb1 Allow bearer token authn/authz to the Kubelet
* Require Webhook authorization to the Kubelet
* Switch apiserver X509 client cert org to systems:masters
to grant the apiserver admin and satisfy the authorization
requirement. kubectl commands like logs or exec that have
the apiserver make requests of a kubelet continue to work
as before
* https://kubernetes.io/docs/admin/kubelet-authentication-authorization/
* https://github.com/poseidon/typhoon/issues/215
2018-05-13 23:20:42 -07:00
Dalton Hubble
5eb11f5104 Allow Flatcar Linux os_image on AWS, rename os_channel
* Replace os_channel variable with os_image to align naming
across clouds. Users who set this option to stable, beta, or
alpha should now set os_image to coreos-stable, coreos-beta,
or coreos-alpha.
* Default os_image to coreos-stable. This continues to use
the most recent image from the stable channel as always.
* Allow Container Linux derivative Flatcar Linux by setting
os_image to `flatcar-stable`, `flatcar-beta`, `flatcar-alpha`
2018-05-12 11:41:58 -07:00
Dalton Hubble
f2ee75ac98 Require Terraform v0.11.x, drop v0.10.x support
* Raise minimum Terraform version to v0.11.0
* Terraform v0.11.x has been supported since Typhoon v1.9.2
and Terraform v0.10.x was last released in Nov 2017. I'd like
to stop worrying about v0.10.x and remove migration docs as
a later followup
* Migration docs docs/topics/maintenance.md#terraform-v011x
2018-05-10 02:20:46 -07:00
Dalton Hubble
8b8e364915 Update etcd from v3.3.4 to v3.3.5
* https://github.com/coreos/etcd/releases/tag/v3.3.5
2018-05-10 02:12:53 -07:00
Michael Holt
a5916da0e2 Update min AWS provider from v1.11 to v1.13 2018-05-02 15:16:03 -07:00
Dalton Hubble
cc29530ba0 Allow preemptible workers on AWS via spot instances
* Add `worker_price` to allow worker spot instances. Defaults
to empty string for the worker autoscaling group to use regular
on-demand instances.
* Add `spot_price` to internal `workers` module for spot worker
pools
* Note: Unlike GCP `preemptible` workers, spot instances require
you to pick a bid price.
2018-04-29 13:31:17 -07:00
Dalton Hubble
e889430926 Update kube-dns from v1.14.9 to v1.14.10
* https://github.com/kubernetes/kubernetes/pull/62676
2018-04-28 00:43:09 -07:00
Dalton Hubble
32ddfa94e1 Update Kubernetes from v1.10.1 to v1.10.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.10.2
2018-04-28 00:27:00 -07:00
Dalton Hubble
681450aa0d Update etcd from v3.3.3 to v3.3.4
* https://github.com/coreos/etcd/releases/tag/v3.3.4
2018-04-27 23:57:26 -07:00
Dalton Hubble
567e18f015 Fix conflict between Calico and NetworkManager
* Observed frequent kube-scheduler and controller-manager
restarts with Calico as the CNI provider. Root cause was
unclear since control plane was functional and tests of
pod to pod network connectivity passed
* Root cause: Calico sets up cali* and tunl* network interfaces
for containers on hosts. NetworkManager tries to manage these
interfaces. It periodically disconnected veth pairs. Logs did
not surface this issue since its not an error per-se, just Calico
and NetworkManager dueling for control. Kubernetes correctly
restarted pods failing health checks and ensured 2 replicas were
running so the control plane functioned mostly normally. Pod to
pod connecitivity was only affected occassionally. Pain to debug.
* Solution: Configure NetworkManager to ignore the Calico ifaces
per Calico's recommendation. Cloud-init writes files after
NetworkManager starts, so a restart is required on first boot. On
subsequent boots, the file is present so no restart is needed
2018-04-25 21:45:58 -07:00
Dalton Hubble
0a7fab56e2 Load ip_vs kernel module on boot as workaround
* (containerized) kube-proxy warns that it is unable to
load the ip_vs kernel module despite having the correct
mounts. Atomic uses an xz compressed module and modprobe
in the container was not compiled with compression support
* Workaround issue for now by always loading ip_vs on-host
* https://github.com/kubernetes/kubernetes/issues/60
2018-04-25 21:45:58 -07:00
Dalton Hubble
d784b0fca6 Switch to quay.io/poseidon tagged system containers 2018-04-25 18:15:18 -07:00
Dalton Hubble
7198b9016c Update Calico from v3.0.4 to v3.1.1 for Atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble
233ec6dcb0 Update Fedora Atomic AMI to version 27.122
* http://www.projectatomic.io/blog/2018/04/fedora-atomic-20-apr-18/
* Atomic publishes nightly AMIs which sometimes don't boot
or have issues. Until there is a source of reliable AMIs,
pin the best known working AMI
* Rel 66a66f0d18544591ffdbf8fae9df790113c93d72
2018-04-21 18:46:56 -07:00
Dalton Hubble
9b88d4bbfd Use bootkube system container on fedora-atomic
* Use the upstream bootkube image packaged with the
required metadata to be usable as a system container
under systemd
* Run bootkube with runc so no host level components
use Docker any more. Docker is still the runtime
* Remove bootkube script and old systemd unit
2018-04-21 18:46:56 -07:00
Dalton Hubble
3dde4ba8ba Mount host's /etc/os-release in kubelet system containers
* Fix `kubectl describe node` to reflect the host's operating
system
2018-04-21 18:46:56 -07:00
Dalton Hubble
e148552220 Enable kubelet allocatable enforcement and QoS cgroup hierarchy
* Change kubelet system image to use --cgroups-per-qos=true
(default) instead of false
* Change kubelet system image to use --enforce-node-allocatable=pods
instead of an empty string
2018-04-21 18:46:56 -07:00
Dalton Hubble
d8d1468f03 Update kubelet system container image to mount /etc/hosts
* Fix kubelet port-forward on Google Cloud / Fedora Atomic
* Mount the host's /etc/hosts in kubelet system containers
* Problem: kubelet runc system containers on Atomic were not
mounting the host's /etc/hosts, like rkt-fly does on Container
Linux. `kubectl port-forward` calls socat with localhost. DNS
servers on AWS, DO, and in many bare-metal environments resolve
localhost to the caller as a convenience. Google Cloud notably
does not nor is it required to do so and this surfaced the
missing /etc/hosts in runc kubelet namespaces.
2018-04-21 18:46:56 -07:00
Dalton Hubble
24d230505a Add cloud-metadata.service on AWS fedora-atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble
b3cf9508b6 Update Fedora Atomic modules to Kubernetes v1.10.1 2018-04-21 18:46:56 -07:00
Dalton Hubble
5212684472 Temporarily pin Fedora Atomic AMI
* Atomic has published AMI images that shutdown
immediately after being powered on
2018-04-21 18:46:56 -07:00
Dalton Hubble
f990473cde Update control plane manifests and add etcd metrics
* Enable etcd v3.3 metrics to expose metrics for
scraping by Prometheus
* Use k8s.gcr.io instead of gcr.io/google_containers
* Add flexvolume plugin mount to controller manager
* Update kube-dns from v1.14.8 to v1.14.9
2018-04-21 18:46:56 -07:00
Dalton Hubble
8523a086e2 Fix kubelet system container to mount CNI plugins
* Mount /opt/cni/bin in kubelet system container so
CNI plugin binaries can be found. Before, flannel
worked because the kubelet falls back to flannel
plugin baked into the hyperkube (undesired)
* Move the CNI bin install location later, since /opt
changes may be lost between ostree rebases
2018-04-21 18:46:56 -07:00
Dalton Hubble
19bc5aea9e Use kubelet system container on fedora-atomic
* Use the upstream hyperkube image packaged with the
required metadata to be usable as a system container
under systemd
* Fix port-forward since socat is included
2018-04-21 18:46:56 -07:00
Dalton Hubble
8d7cfc1a45 Use etcd system container on fedora-atomic
* Use the upstream etcd image packaged with the required
metadata to be usable as a system container (runc) under
systemd
2018-04-21 18:46:56 -07:00
Dalton Hubble
9969c357da Change AWS Fedora module to fedora-atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble
b80a2eb8a0 Sync fedora-cloud modules with Container Linux
* Update manifests for Kubernetes v1.10.0
* Update etcd from v3.3.2 to v3.3.3
* Add disk_type optional variable on AWS
* Remove redundant kubeconfig copy on AWS
* Distribute etcd secres only to controllers
* Organize module variables and ssh steps
2018-04-21 18:46:56 -07:00
Dalton Hubble
3610da8b71 Add fedora-cloud module for AWS 2018-04-21 18:46:56 -07:00
Dalton Hubble
a54f76db2a Update Calico from v3.0.4 to v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.0
2018-04-21 18:30:36 -07:00
Dalton Hubble
23a8156bdf Fix a few typos in comments 2018-04-15 17:21:49 -07:00
Dalton Hubble
77c0a4cf2e Update Kubernetes from v1.10.0 to v1.10.1
* Use kubernetes-incubator/bootkube v0.12.0
2018-04-12 20:57:31 -07:00
Dalton Hubble
9bb3de5327 Skip creating unused dirs on worker nodes 2018-04-11 22:23:51 -07:00
Dalton Hubble
6b08bde479 Use k8s.gcr.io instead of gcr.io/google_containers
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
2018-04-08 12:57:52 -07:00
Dalton Hubble
f4b2396718 Return Prometheus deployment to be a worker workload
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targes
* Related to https://github.com/poseidon/typhoon/pull/175
2018-04-08 12:20:00 -07:00
Dalton Hubble
18dbaf74ce Update kube-dns from v1.14.8 to v1.14.9
* https://github.com/kubernetes/kubernetes/pull/61908
2018-04-04 21:00:23 -07:00
Dalton Hubble
ce001e9d56 Update etcd from v3.3.2 to v3.3.3
* https://github.com/coreos/etcd/releases/tag/v3.3.3
2018-04-04 20:32:24 -07:00
Dalton Hubble
d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble
1cc043d1eb Update Kubernetes from v1.9.6 to v1.10.0 2018-03-30 22:14:07 -07:00
Dalton Hubble
f8e9bfb1c0 Add disk_type variable for EBS volume type on AWS
* Change EBS volume type from `standard` ("prior generation)
 to `gp2`. Prometheus alerts are tuned for SSDs
* Other platforms have fast enough disks by default
2018-03-29 22:51:54 -07:00
Dalton Hubble
7acd4931f6 Remove redundant kubeconfig copy on AWS and GCP
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. Its redundant to use a Terraform remote
file copy step for the kubeconfig.
2018-03-26 00:01:47 -07:00
Dalton Hubble
e43cf9f608 Organize and cleanup variable descriptions 2018-03-25 21:44:43 -07:00
Dalton Hubble
a04ef3919a Update Kubernetes from v1.9.5 to v1.9.6 2018-03-21 20:29:52 -07:00
Dalton Hubble
758c09fa5c Update Kubernetes from v1.9.4 to v1.9.5 2018-03-19 00:25:44 -07:00
Dalton Hubble
f3730b2bfa Add Container Linux Config snippets feature
* Introduce the ability to support Container Linux Config
"snippets" for controllers and workers on cloud platforms.
This allows end-users to customize hosts by providing Container
Linux configs that are additively merged into the base configs
defined by Typhoon. Config snippets are validated, merged, and
show any errors during `terraform plan`
* Example uses include adding systemd units, network configs,
mounts, files, raid arrays, or other disk provisioning features
provided by Container Linux Configs (using Ignition low-level)
* Requires terraform-provider-ct v0.2.1 plugin
2018-03-18 18:28:18 -07:00
Dalton Hubble
88aa9a46e5 Add /var/lib/calico volume mount to Calico DaemonSet 2018-03-18 16:40:38 -07:00
Dalton Hubble
efa90d8b44 Add a new key=value label to controller nodes
* Add a node-role.kubernetes.io/controller="true" node label
to controllers so Prometheus service discovery can filter to
services that only run on controllers (i.e. masters)
* Leave node-role.kubernetes.io/master="" untouched as its
a Kubernetes convention
2018-03-18 16:39:10 -07:00
Dalton Hubble
21f2cef12f Improve changelog, README, and index page 2018-03-12 20:58:02 -07:00
Dalton Hubble
931e311786 Update Kubernetes from v1.9.3 to v1.9.4 2018-03-12 18:07:50 -07:00
Dalton Hubble
8e7e6b9f7f Normalize Terraform configs with terraform fmt 2018-03-11 14:46:05 -07:00
Dalton Hubble
35f3b1b28c Enable AWS NLB cross-zone load balancing
* https://github.com/terraform-providers/terraform-provider-aws/pull/3537
* https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/
2018-03-10 23:25:18 -08:00
Dalton Hubble
9fb1e1a0e2 Update etcd from v3.3.1 to v3.3.2
* https://github.com/coreos/etcd/releases/tag/v3.3.2
2018-03-10 13:44:35 -08:00
Dalton Hubble
b61d6373c5 Add ignore_changes for AWS worker image_id 2018-03-10 13:16:05 -08:00
Dalton Hubble
c112ee3829 Rename cluster_name to name in internal module
* Ensure consistency between AWS and GCP platforms
2018-03-03 17:52:01 -08:00
Dalton Hubble
da6aafe816 Revert "Add module version requirements to internal workers modules"
* This reverts commit cce4537487.
* Provider passing to child modules is complex and the behavior
changed between Terraform v0.10 and v0.11. We're continuing to
allow both versions so this change should be reverted. For the
time being, those using our internal Terraform modules will have
to be aware of the minimum version for AWS and GCP providers,
there is no good way to do enforcement.
2018-03-03 16:56:34 -08:00
Dalton Hubble
cce4537487 Add module version requirements to internal workers modules 2018-03-03 14:39:25 -08:00
Dalton Hubble
73126eb7f8 Add support for worker pools on AWS
* Allow groups of workers to be defined and joined to
a cluster (i.e. worker pools)
* Move worker resources into a Terraform submodule
* Output variables needed for passing to worker pools
* Add usage docs for AWS worker pools (advanced)
2018-02-27 18:31:42 -08:00
Dalton Hubble
98985e5acd Remove unused etcd_service_ip template variable
* etcd_service_ip dates back to deprecated self-hosted etcd
2018-02-26 22:20:20 -08:00
Dalton Hubble
486fdb6968 Simplify CLC kubeconfig templating on AWS and GCP
* Template terraform-render-bootkube's multi-line kubeconfig
output using the right indentation
* Add `kubeconfig` variable to google-cloud controllers and
workers Terraform submodules
* Remove `kubeconfig_*` variables from google-cloud controllers
and workers Terraform submodules
2018-02-26 12:49:01 -08:00
Dalton Hubble
a44cf0edbd Update Calico from v3.0.2 to v3.0.3
* https://github.com/projectcalico/calico/releases/tag/v3.0.3
2018-02-26 12:48:19 -08:00
Dalton Hubble
13f3745093 Add kubelet --volume-plugin-dir flag
* Set Kubelet search path for flexvolume plugins
to /var/lib/kubelet/volumeplugins
* Add support for flexvolume plugins on AWS, GCE, and DO
* See 9548572d98 which added flexvolume support for bare-metal
2018-02-22 22:11:45 -08:00
Dalton Hubble
c4914c326b Update bootkube and terraform-render-bootkube to v0.11.0 2018-02-22 21:53:26 -08:00
Paul Saunders
ceb5555222 Switch apiserver from ELB to a network load balancer 2018-02-22 16:10:31 -08:00
Dalton Hubble
22fa051002 Switch Ingress ELB to a network load balancer
* Require terraform-provider-aws 1.7 or higher
2018-02-20 17:34:38 -08:00
Stephen Augustus
c8313751d7 Ignore lifecycle changes to the AWS controller ami 2018-02-15 19:48:39 -08:00
Dalton Hubble
195d902ab6 Upgrade etcd from v3.2.15 to v3.3.1 2018-02-15 19:29:46 -08:00
Dalton Hubble
c19a68b59b Update bootkube control-plane manifests
* Remove PersistentVolumeLabel admission controller flag
* Switch Deployments and DaemonSets to apps/v1
* Minor update to pod-checkpointer image version
2018-02-15 11:06:35 -08:00
Dalton Hubble
82a616c70b Fix terraform config formatting 2018-02-10 15:18:27 -08:00
Dalton Hubble
a41691b222 Update Kubernetes from v1.9.2 to v1.9.3
* Add flannel service account and limited RBAC cluster role
* Change DaemonSets to tolerate NoSchedule and NoExecute taints
* Remove deprecated apiserver --etcd-quorum-read flag
* Update Calico from v3.0.1 to v3.0.2
* Add Calico GlobalNetworkSet CRD
* https://github.com/poseidon/terraform-render-bootkube/pull/44
2018-02-10 13:37:07 -08:00
Dalton Hubble
2fa1840c30 Update flannel from v0.9.0 to v0.10.0
* https://github.com/coreos/flannel/releases/tag/v0.10.0
2018-01-28 23:09:21 -08:00
Dalton Hubble
8e0b8d7e40 Upgrade Calico from 2.6.6 to 3.0.1 2018-01-28 11:47:23 -08:00
Dalton Hubble
3e6e4ea339 Update etcd from 3.2.14 to 3.2.15
* https://github.com/coreos/etcd/releases/tag/v3.2.15
2018-01-23 23:50:04 -08:00
Dalton Hubble
868265988b Update bootkube and terraform-render-bootkube to v0.10.0 2018-01-19 23:10:45 -08:00
Dalton Hubble
6adffcb778 Update Kubernetes from v1.9.1 to v1.9.2 2018-01-19 08:40:09 -08:00
Dalton Hubble
d8db296932 Update kube-dns and use separate service account
* Update kube-dns from v1.14.7 to v1.14.8
* Use a separate kube-dns service account
* https://github.com/kubernetes/kubernetes/pull/57918
2018-01-12 10:29:30 -08:00
Dalton Hubble
388ac08492 Update etcd from 3.2.13 to 3.2.14
* https://github.com/coreos/etcd/releases/tag/v3.2.14
2018-01-12 07:20:55 -08:00
Dalton Hubble
fc455c8624 Remove old mention of ACIs in bootkube.service description 2018-01-06 16:20:34 -08:00
Dalton Hubble
51a5f64024 Enable portmap plugin alongside Calico to fix hostPort
* https://github.com/poseidon/terraform-render-bootkube/pull/36
2018-01-06 14:01:18 -08:00
Dalton Hubble
e1f2125f02 Update etcd from 3.2.0 to 3.2.13
* https://github.com/coreos/etcd/releases/tag/v3.2.13
2018-01-06 14:01:18 -08:00
Dalton Hubble
9329b775f6 Update Kubernetes from v1.8.6 to v1.9.1 2018-01-06 14:01:16 -08:00
Dalton Hubble
fbdd946601 Update Kubernetes from v1.8.5 to v1.8.6 2017-12-21 11:20:37 -08:00
Barak Michener
e79088baa0 Add optional cluster_domain_suffix variable
* Allow kube-dns to respond to DNS queries with a custom
suffix, instead of the default 'cluster.local'
* Useful when multiple clusters exist on the same local
network and wish to query services on one another
2017-12-15 01:45:52 -08:00
Dalton Hubble
495e33e213 Update bootkube and terraform-render-bootkube to v0.9.1 2017-12-15 01:45:02 -08:00
Dalton Hubble
63f5a26a72 Eliminate steps to move self-hosted etcd assets
* bootkube/assets/experimental/* assets corresponded to self-hosted
etcd manifests, which are no longer an option in Typhoon
2017-12-13 01:06:56 -08:00
Lars Fenneberg
eea79e895d Fix manifest consolidation in bootkube start wrapper
* Fix manifest existence test in /opt/bootkube/bootkube-start
to also work with more than one directory
2017-12-12 23:08:22 -08:00
Dalton Hubble
165396d6aa Update Kubernetes from v1.8.4 to v1.8.5 2017-12-09 21:28:31 -08:00
Vincent Palmer
ce49a93d5d Fix issue with etcd-member failing to resolve peers
* When restarting masters, `etcd-member.service` may fail to lookup peers if
/etc/resolv.conf hasn't been populated yet. Require the wait-for-dns.service.
2017-12-09 20:12:49 -08:00
Dalton Hubble
5f5eec1175 Update bootkube and terraform-render-bootkube to v0.9.0 2017-12-01 22:27:48 -08:00
Dalton Hubble
5308fde3d3 Add Kubernetes certification badge 2017-11-29 19:26:49 -08:00
Dalton Hubble
6483f613c5 Update Kubernetes from v1.8.3 to v1.8.4 2017-11-28 21:52:11 -08:00
Dalton Hubble
56c6bf431a Update terraform-render-bootkube for Kubernetes v1.8.4
* Update hyperkube from v1.8.3 to v1.8.4
* Remove flock from bootstrap-apiserver and kube-apiserver
* Remove unused critical-pod annotations in manifests
* Use service accounts for kube-proxy and pod-checkpointer
* Update Calico from v2.6.1 to v2.6.3
* Update flannel from v0.9.0 to v0.9.1
* Remove Calico termination grace period to prevent calico
from getting stuck for extended periods
* https://github.com/poseidon/terraform-render-bootkube/pull/29
2017-11-28 21:42:26 -08:00
Dalton Hubble
5f6b0728c5 Update bootkube and terraform-render-bootkube to v0.8.2 2017-11-10 20:01:37 -08:00
Dalton Hubble
d774c51297 Update Kubernetes from v1.8.2 to v1.8.3 2017-11-08 23:34:19 -08:00
Dalton Hubble
f6a8fb363e Remove deprecated kubelet --require-kubeconfig flag
* https://github.com/kubernetes/kubernetes/pull/40050
2017-11-08 23:34:19 -08:00
Dalton Hubble
ea1efb536a Remove old firewall rule for bootstrap self-hosted etcd 2017-11-08 00:15:20 -08:00
Dalton Hubble
ccc832f468 Add firewall rule to allow apiserver to proxy other controller kubelets
* Prometheus proxies through the apiserver to scrape kubelets
* In multi-controller setups, an apiserver must be able to scrape
kubelets (10250) on other controllers
2017-11-06 01:03:53 -08:00
Dalton Hubble
90f8d62204 Add firewall rules to allow prometheus to reach node-exporter
* node_exporter service endpoints run on hostNetwork port 9100
* Re-evaluate after https://github.com/kubernetes-incubator/bootkube/pull/711
2017-11-06 01:03:53 -08:00
Dalton Hubble
af5c413abf Focus controller ELB on load balancing apiservers
* ELB distributing load across controllers is no longer the mechanism
used to SSH to instances to distribute secrets
* Focus the ELB on load balancing across apiserver and edit the HTTP
health check to an SSL:443 check
2017-11-06 01:03:53 -08:00
Dalton Hubble
168c487484 Remove mention of self-hosted etcd, its deprecated 2017-11-06 01:03:53 -08:00
Dalton Hubble
805dd772a8 Run etcd cluster on-host, across controllers on AWS
* Change controllers ASG to heterogeneous EC2 instances
* Create DNS records for each controller's private IP for etcd
* Change etcd to run on-host, across controllers (etcd-member.service)
* Reduce time to bootstrap a cluster
* Deprecate self-hosted-etcd on the AWS platform
2017-11-06 01:03:53 -08:00
Dalton Hubble
878f5a3647 Bump bootkube and terraform-render-bootkube to v0.8.1
* Use the v0.8.1 tagged terraform-render-bootkube module
* Use the v0.8.1 quay.io/coreos/bootkube image to bootstrap
2017-10-28 12:50:37 -07:00
Dalton Hubble
34ec7e9862 Relax pessimistic constraints on 1.0+ providers
* Constrains ~> 1.0 means users can use 1.0.1, 1.1, but not 2.0
* https://www.terraform.io/docs/configuration/terraform.html
2017-10-25 23:27:28 -07:00
Dalton Hubble
f6c6e85f84 Require minimum Terraform and plugin versions
* Bump minimum Terraform version to v0.10.4
* Allow minor version updates for 1.0+ plugins
* Fix versions for plugins which are pre-1.0
2017-10-25 23:00:31 -07:00
Dalton Hubble
60bc8957c9 Update Kubernetes from v1.8.1 to v1.8.2
* Kubernetes v1.8.2 fixes a memory leak in the v1.8.1 apiserver
* Switch to using the `gcr.io/google_containers/hyperkube` for the
on-host kubelet and shutdown drains
* Update terraform-render-bootkube manifests generation
  * Update flannel from v0.8.0 to v0.9.0
  * Add `hairpinMode` to flannel CNI config
  * Add `--no-negcache` to kube-dns dnsmasq
2017-10-24 21:44:26 -07:00
Dalton Hubble
e4c479554c Update AWS, DO, BM Kubernetes from v1.7.7 to v1.8.1
* Update from bootkube v0.7.0 to v0.8.0
* Leave Google Cloud update to a followup commit
2017-10-19 21:10:04 -07:00
Dalton Hubble
43dc44623f Fix the terraform fmt of configs 2017-10-16 01:32:25 -07:00
Dalton Hubble
1bc25c1036 Update Kubernetes from v1.7.5 to v1.7.7
* Update from bootkube v0.6.2 to v0.7.0
* Use renamed terraform-render-bootkube. Renamed from
bootkube-terraform to meet Terraform Module requirements
2017-10-03 21:03:15 -07:00
Dalton Hubble
2d5a4ae1ef Update kube-dns image to address dnsmasq vulnerability
* https://security.googleblog.com/2017/10/behind-masq-yet-more-dns-and-dhcp.html
2017-10-02 10:27:10 -07:00
Dalton Hubble
dd883988bd Update from Calico v2.5.1 to v2.6.1
* Network policy improvements
* Update cni sidecar image from v1.10.0 to v1.11.0
* Lower log level in Calico CNI config from debug to info
2017-09-30 16:16:40 -07:00
Dalton Hubble
e0d8917573 Add LICENSE to top-level of each module 2017-09-28 20:41:19 -07:00
Dalton Hubble
b20233e05d aws: Add Ingress ELB DNS name output as ingress_dns_name
* Expose the Ingress ELB DNS name so application DNS records can
be defined in Terraform to resolve to the Ingress ELB
2017-09-28 00:46:17 -07:00
Dalton Hubble
77e387cf83 Add top-level README.md with module overview 2017-09-27 22:09:52 -07:00
Dalton Hubble
1b5caef4c1 Add Wants=rpc-statd.service to Kubelet
* Mounting NFS exports as volumes from some NFS servers fails because
the kubelet isn't starting rpc-statd as expected. Describing pods
that are stuck creating shows rpc.statd is required for remote locking
* Starting rpc-statd.service resolves the issue and all NFS mounts
seem to be working.
* Recommended approach https://github.com/coreos/bugs/issues/2074
2017-09-24 18:23:55 -07:00
Dalton Hubble
74d8b9dabe *: Update bootkube-terraform sha hash to corresponding named tag
* bootkube-terraform v0.6.2 dbfb11c6eafa08f839eac2834ca1aca35dafe965
2017-09-23 14:10:42 -07:00
Dalton Hubble
d8e4ac172a Add dghubble/pegasus AWS Kubernetes Terraform module 2017-09-17 21:40:33 -07:00