Commit Graph

301 Commits

Author SHA1 Message Date
Dalton Hubble 2019177b6b Fix implicit map assignments to be explicit
* Terraform v0.12 will require map assignments be explicit,
part of v0.12 readiness
2019-03-12 01:19:54 -07:00
Dalton Hubble fe96da27d7 Add support for terraform-provider-aws v2.0+
* Allow terraform-provider-aws >= v1.13, but < 3.0. No change
to the minimum version, but allow using v2.x.y releases
* Verify compatability with terraform-provider-aws v2.1.0
2019-03-09 12:06:44 -08:00
Dalton Hubble deec512c14 Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin
* Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes
service IPs and pod IPs
* Previously, CoreDNS was configured to resolve in-addr.arpa PTR
records for service IPs (but not pod IPs)
2019-03-04 23:03:00 -08:00
Dalton Hubble f598307998 Update Kubernetes from v1.13.3 to v1.13.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134
2019-02-28 22:47:43 -08:00
Dalton Hubble 73ae5d5649 Update Calico from v3.5.1 to v3.5.2
* https://docs.projectcalico.org/v3.5/releases/
2019-02-25 21:23:13 -08:00
Dalton Hubble 42d7222f3d Add a readiness probe to CoreDNS
* https://github.com/poseidon/terraform-render-bootkube/pull/115
2019-02-23 13:25:23 -08:00
Dalton Hubble 4294bd0292 Assign Pod Priority classes to critical cluster and node components
* Assign pod priorityClassNames to critical cluster and node
components (higher is higher priority) to inform node out-of-resource
eviction order and scheduler preemption and scheduling order
* Priority Admission Controller has been enabled since Typhoon
v1.11.1
2019-02-19 22:21:39 -08:00
Dalton Hubble 584088397c Update etcd from v3.3.11 to v3.3.12
* https://github.com/etcd-io/etcd/releases/tag/v3.3.12
2019-02-09 11:54:54 -08:00
Dalton Hubble 0200058e0e Update Calico from v3.5.0 to v3.5.1
* Fix in confd https://github.com/projectcalico/confd/pull/205
2019-02-09 11:49:31 -08:00
Dalton Hubble ccd96c37da Update Kubernetes from v1.13.2 to v1.13.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133
2019-02-01 23:26:13 -08:00
Dalton Hubble 244a1a601a Switch CoreDNS to use the forward plugin instead of proxy
* Use the forward plugin to forward to upstream resolvers, instead
of the proxy plugin. The forward plugin is reported to be a faster
alternative since it can re-use open sockets
* https://coredns.io/explugins/forward/
* https://coredns.io/plugins/proxy/
* https://github.com/kubernetes/kubernetes/issues/73254
2019-01-30 22:25:23 -08:00
Dalton Hubble 1ab06f69d7 Update flannel from v0.10.0 to v0.11.0
* https://github.com/coreos/flannel/releases/tag/v0.11.0
2019-01-29 21:51:25 -08:00
Dalton Hubble e9659a8539 Update Calico from v3.4.0 to v3.5.0
* https://docs.projectcalico.org/v3.5/releases/
2019-01-27 16:34:30 -08:00
Dalton Hubble f4d3508578 Update CoreDNS from v1.3.0 to v1.3.1
* https://coredns.io/2019/01/13/coredns-1.3.1-release/
2019-01-15 22:50:25 -08:00
Dalton Hubble 7eafa59d8f Fix instance shutdown automatic worker deletion on clouds
* Fix a regression caused by lowering the Kubelet TLS client
certificate to system:nodes group (#100) since dropping
cluster-admin dropped the Kubelet's ability to delete nodes.
* On clouds where workers can scale down (manual terraform apply,
AWS spot termination, Azure low priority deletion), worker shutdown
runs the delete-node.service to remove a node to prevent NotReady
nodes from accumulating
* Allow Kubelets to delete cluster nodes via system:nodes group. Kubelets
acting with system:node and kubelet-delete ClusterRoles is still an
improvement over acting as cluster-admin
2019-01-14 23:27:48 -08:00
Dalton Hubble 679079b242 Add AWS ingress_zone_id output with NLB DNS name's Route53 zone id
* DNS zones served by AWS Route53 may use AWS's special alias records
(other DNS providers would use a CNAME) to resolve the ingress NLB.
Alias records require the NLB DNS name's DNS zone id (not the cluster
`dns_zone_id`)
2019-01-13 16:45:52 -08:00
Dalton Hubble b74cc8afd2 Update etcd from v3.3.10 to v3.3.11
* https://github.com/etcd-io/etcd/releases/tag/v3.3.11
2019-01-12 14:17:25 -08:00
Dalton Hubble 1d66ad33f7 Change AWS worker modules' default type from t2.small to t3.small
* Worker instance types weren't updated in #365
2019-01-12 00:07:48 -08:00
Dalton Hubble 4d32b79c6f Update Kubernetes from v1.13.1 to v1.13.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1132
2019-01-12 00:00:53 -08:00
Dalton Hubble df4c0ba05d Use HTTPS liveness probes for kube-scheduler and kube-controller-manager
* Disable kube-scheduler and kube-controller-manager HTTP ports
2019-01-09 20:56:50 -08:00
Dalton Hubble bfe0c74793 Enable the certificates.k8s.io API to issue cluster certificates
* System components that require certificates signed by the cluster
CA can submit a CSR to the apiserver, have an administrator inspect
and approve it, and be issued a certificate
* Configure kube-controller-manager to sign Approved CSR's using the
cluster CA private key
* Admins are responsible for approving or denying CSRs, otherwise,
no certificate is issued. Read the Kubernetes docs carefully and
verify the entity making the request and the authorization level
* https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster
2019-01-06 17:33:37 -08:00
Dalton Hubble 60c70797ec Use a single format of the admin kubeconfig
* Use a single admin kubeconfig for initial bootkube bootstrap
and for use by a human admin. Previously, an admin kubeconfig
without a named context was used for bootstrap and direct usage
with KUBECONFIG=path, while one with a named context was used
for `kubectl config use-context` style usage. Confusing.
* Provide the admin kubeconfig via `assets/auth/kubeconfig`,
`assets/auth/CLUSTER-config`, or output `kubeconfig-admin`
2019-01-05 14:57:18 -08:00
Dalton Hubble 6795a753ea Update CoreDNS from v1.2.6 to v1.3.0
* https://coredns.io/2018/12/15/coredns-1.3.0-release/
2019-01-05 13:35:03 -08:00
Dalton Hubble b57273b6f1 Rename internal kube_dns_service_ip to cluster_dns_service_ip
* terraform-render-bootkube module deprecated kube_dns_service_ip
output in favor of cluster_dns_service_ip
* Rename k8s_dns_service_ip to cluster_dns_service_ip for
consistency too
2019-01-05 13:32:03 -08:00
Dalton Hubble 812a1adb49 Use a lower-privilege Kubelet kubeconfig in system:nodes
* Kubelets can use a lower-privilege TLS client certificate with
Org system:nodes and a binding to the system:node ClusterRole
* Admin kubeconfig's continue to belong to Org system:masters to
provide cluster-admin (available in assets/auth/kubeconfig or as
a Terraform output kubeconfig-admin)
* Remove bare-metal output variable kubeconfig
2019-01-05 13:08:56 -08:00
Dalton Hubble 66e1365cc4 Add ServiceAccounts for kube-apiserver and kube-scheduler
* Add ServiceAccounts and ClusterRoleBindings for kube-apiserver
and kube-scheduler
* Remove the ClusterRoleBinding for the kube-system default ServiceAccount
* Rename the CA certificate CommonName for consistency with upstream
2019-01-01 20:16:14 -08:00
Dalton Hubble f2f4deb8bb Change AWS default type from t2.small to t3.small
* T3 is the next generation general purpose burstable
instance type. Compared with t2.small, the t3.small is
cheaper, has 2 vCPU (instead of 1) and provides 5 Gbps
of pod-to-pod bandwidth (instead of 1 Gbps)
2018-12-18 12:38:35 -08:00
Dalton Hubble bcb200186d Add admin kubeconfig as a Terraform output
* May be used to write a local file
2018-12-15 22:52:28 -08:00
Dalton Hubble 479d498024 Update Calico from v3.3.2 to v3.4.0
* https://docs.projectcalico.org/v3.4/releases/
2018-12-15 18:05:16 -08:00
Dalton Hubble 018c5edc25 Update Kubernetes from v1.13.0 to v1.13.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1131
2018-12-15 11:44:57 -08:00
Dalton Hubble ff6ab571f3 Update Calico from v3.3.1 to v3.3.2
* https://docs.projectcalico.org/v3.3/releases/
2018-12-06 22:56:55 -08:00
Dalton Hubble d31f444fcd Update Kubernetes from v1.12.3 to v1.13.0 2018-12-03 20:44:32 -08:00
Dalton Hubble 76d993cdae Add experimental kube-router CNI provider
* Add kube-router for pod networking and NetworkPolicy
as an experiment
* Experiments are not documented or supported in any way,
and may be removed without notice. They have known issues
and aren't enabled without special options.
2018-12-03 19:52:28 -08:00
Dalton Hubble 64b4c10418 Improve features and modules list docs
* Remove bullet about isolating workloads on workers, its
now common practice and new users will assume it
* List advanced features available in each module
* Fix erroneous Kubernetes version listing for Google Cloud
Fedora Atomic
2018-11-26 22:58:00 -08:00
Dalton Hubble 5b27d8d889 Update Kubernetes from v1.12.2 to v1.12.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md/#v1123
2018-11-26 21:06:09 -08:00
Dalton Hubble 840b73f9ba Update pod-checkpointer image to query Kubelet secure API
* Updates pod-checkpointer to prefer the Kubelet secure
API (before falling back to the Kubelet read-only API that
is disabled on Typhoon clusters since
https://github.com/poseidon/typhoon/pull/324)
* Previously, pod-checkpointer checkpointed an initial set
of pods during bootstrapping so recovery from power cycling
clusters was unaffected, but logs were noisy
* https://github.com/kubernetes-incubator/bootkube/pull/1027
* https://github.com/kubernetes-incubator/bootkube/pull/1025
2018-11-26 20:24:32 -08:00
Dalton Hubble 915af3c6cc Fix Calico Felix reporting usage data, require opt-in
* Calico Felix has been reporting anonymous usage data about the
version and cluster size, which violates Typhoon's privacy policy
where analytics should be opt-in only
* Add a variable enable_reporting (default: false) to allow opting
in to reporting usage data to Calico (or future components)
2018-11-20 01:03:00 -08:00
Dalton Hubble ea3fc6d2a7 Update CoreDNS from v1.2.4 to v1.2.6
* https://coredns.io/2018/11/05/coredns-1.2.6-release/
2018-11-18 16:45:53 -08:00
Dalton Hubble 56e9a82984 Add flannel resource request and mount only /run/flannel 2018-11-11 20:35:21 -08:00
Dalton Hubble e95b856a22 Enable CoreDNS loop and loadbalance plugins
* loop sends an initial query to detect infinite forwarding
loops in configured upstream DNS servers and fast exit with
an error (its a fatal misconfiguration on the network that
will otherwise cause resolvers to consume memory/CPU until
crashing, masking the problem)
* https://github.com/coredns/coredns/tree/master/plugin/loop
* loadbalance randomizes the ordering of A, AAAA, and MX records
in responses to provide round-robin load balancing (as usual,
clients may still cache responses though)
* https://github.com/coredns/coredns/tree/master/plugin/loadbalance
2018-11-10 17:36:56 -08:00
Dalton Hubble 2b3f61d1bb Update Calico from v3.3.0 to v3.3.1
* Structure Calico and flannel manifests
* Rename kube-flannel mentions to just flannel
2018-11-10 13:37:12 -08:00
Dalton Hubble 8fd2978c31 Update bootkube image version from v0.13.0 to v0.14.0
* https://github.com/kubernetes-incubator/bootkube/releases/tag/v0.14.0
2018-11-06 23:35:11 -08:00
Dalton Hubble 721c847943 Set kube-apiserver kubelet preferred address types
* Prefer InternalIP and ExternalIP over the node's hostname,
to match upstream behavior and kubeadm
* Previously, hostname-override was used to set node names
to internal IP's to work around some cloud providers not
resolving hostnames for instances (e.g. DO droplets)
2018-11-03 22:31:55 -07:00
Dalton Hubble 0e71f7e565 Ignore controller user_data changes to allow plugin updates
* Updating the `terraform-provider-ct` plugin is known to produce
a `user_data` diff in all pre-existing clusters. Applying the
diff to pre-existing cluster destroys controller nodes
* Ignore changes to controller `user_data`. Once all managed
clusters use a release containing this change, it is possible
to update the `terraform-provider-ct` plugin (worker `user_data`
will still be modified)
* Changing the module `ref` for an existing cluster and
re-applying is still NOT supported (although this PR
would protect controllers from being destroyed)
2018-10-28 16:48:12 -07:00
Dalton Hubble 9b405a19b2 Fix minor naming inconsistencies in Ignition and CLC data 2018-10-27 16:24:59 -07:00
Tamer Fahmy bfa1a679eb Name AWS and DigitalOcean Ignition data sources consistently 2018-10-27 16:14:44 -07:00
Dalton Hubble f1da0731d8 Update Kubernetes from v1.12.1 to v1.12.2
* Update CoreDNS from v1.2.2 to v1.2.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md#v1122
* https://coredns.io/2018/10/17/coredns-1.2.4-release/
* https://coredns.io/2018/10/16/coredns-1.2.3-release/
2018-10-27 15:47:57 -07:00
Dalton Hubble d641a058fe Update Calico from v3.2.3 to v3.3.0
* https://docs.projectcalico.org/v3.3/releases/
2018-10-23 20:30:30 -07:00
Dalton Hubble 99a6d5478b Disable Kubelet read-only port 10255
* We can finally disable the Kubelet read-only port 10255!
* Journey: https://github.com/poseidon/typhoon/issues/322#issuecomment-431073073
2018-10-18 21:14:14 -07:00
Dalton Hubble d55bfd5589 Fix CoreDNS AntiAffinity spec to prefer spreading replicas
* Pods were still being scheduled at random due to a typo
2018-10-17 22:19:57 -07:00
Robert Fairburn 0be4673e44 Add disk_iops variable for AWS
* Setting disk_iops is required for disk_type io1
* https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html#EBSVolumeTypes
2018-10-17 22:18:54 -07:00
Dalton Hubble 9b6113a058 Update Kubernetes from v1.11.3 to v1.12.1
* Mount an empty dir for the controller-manager to work around
https://github.com/kubernetes/kubernetes/issues/68973
* Update coreos/pod-checkpointer to strip affinity from
checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced
a default affinity that appears on checkpointed manifests; but
it prevented scheduling and checkpointed pods should not have an
affinity, they're run directly by the Kubelet on the local node
* https://github.com/kubernetes-incubator/bootkube/issues/1001
* https://github.com/kubernetes/kubernetes/pull/68173
2018-10-16 20:28:13 -07:00
Dalton Hubble 5eb4078d68 Add docker/default seccomp to control plane and addons
* Annotate pods, deployments, and daemonsets to start containers
with the Docker runtime's default seccomp profile
* Overrides Kubernetes default behavior which started containers
with seccomp=unconfined
* https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container
2018-10-16 20:07:29 -07:00
Dalton Hubble 55bb4dfba6 Raise CoreDNS replica count to 2 or more
* Run at least two replicas of CoreDNS to better support
rolling updates (previously, kube-dns had a pod nanny)
* On multi-master clusters, set the CoreDNS replica count
to match the number of masters (e.g. a 3-master cluster
previously used replicas:1, now replicas:3)
* Add AntiAffinity preferred rule to favor distributing
CoreDNS pods across controller nodes nodes
2018-10-13 20:31:29 -07:00
Dalton Hubble 43fe78a2cc Raise scheduler/controller-manager replicas in multi-master
* Continue to ensure scheduler and controller-manager run
at least two replicas to support performing kubectl edits
on single-master clusters (no change)
* For multi-master clusters, set scheduler / controller-manager
replica count to the number of masters (e.g. a 3-master cluster
previously used replicas:2, now replicas:3)
2018-10-13 16:16:29 -07:00
Dalton Hubble 5a283b6443 Update etcd from v3.3.9 to v3.3.10
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v3310-2018-10-10
2018-10-13 13:14:37 -07:00
Dalton Hubble 7653e511be Update CoreDNS and Calico versions
* Update CoreDNS from 1.1.3 to 1.2.2
* Update Calico from v3.2.1 to v3.2.3
2018-10-02 16:07:48 +02:00
Dalton Hubble ad871dbfa9 Update Kubernetes from v1.11.2 to v1.11.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1113
2018-09-13 18:50:41 -07:00
Dalton Hubble 7eb09237f4 Update Calico from v3.1.3 to v3.2.1
* Add new bird and felix readiness checks
* Read MTU from ConfigMap veth_mtu
* Add RBAC read for serviceaccounts
* Remove invalid description from CRDs
2018-08-25 17:53:11 -07:00
Dalton Hubble ea365b551a Fix docs mentions of ELBs to NLBs
* Typhoon AWS clusters use an NLB rather than an ELB,
since v1.10.5
* Add a few missing links in CHANGES
2018-08-21 21:40:06 -07:00
Dalton Hubble bbf2c13eef Remove AWS security rule allowing ICMP packets to nodes
* Deny ICMP packets for consistency across Typhoon clusters on
various clouds and because there isn't much need to allow them
2018-08-21 21:16:16 -07:00
Dalton Hubble bceec9fdf5 Sort firewall / security rules and add comments
* No functional changes to network firewalls
2018-08-21 20:53:16 -07:00
Dalton Hubble f7ebdf475d Update Kubernetes from v1.11.1 to v1.11.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1112
2018-08-07 21:57:25 -07:00
Dalton Hubble edc250d62a Fix Kublet version for Fedora Atomic modules
* Release v1.11.1 erroneously left Fedora Atomic clusters using
the v1.11.0 Kubelet. The rest of the control plane ran v1.11.1
as expected
* Update Kubelet from v1.11.0 to v1.11.1 so Fedora Atomic matches
Container Linux
* Container Linux modules were not affected
2018-07-29 12:13:29 -07:00
Dalton Hubble db64ce3312 Update etcd from v3.3.8 to v3.3.9
* https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md#v339-2018-07-24
2018-07-29 11:27:37 -07:00
Dalton Hubble 7c327b8bf4 Update from bootkube v0.12.0 to v0.13.0 2018-07-29 11:20:17 -07:00
Dalton Hubble d8d524d10b Update Kubernetes from v1.11.0 to v1.11.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1111
2018-07-20 00:41:27 -07:00
Dalton Hubble 6f958d7577 Replace kube-dns with CoreDNS
* Add system:coredns ClusterRole and binding
* Annotate CoreDNS for Prometheus metrics scraping
* Remove kube-dns deployment, service, & service account
* https://github.com/poseidon/terraform-render-bootkube/pull/71
* https://kubernetes.io/blog/2018/06/27/kubernetes-1.11-release-announcement/
2018-07-01 22:55:01 -07:00
Dalton Hubble 93de7506ef Update Fedora Atomic from 27 to 28 on AWS 2018-06-30 18:55:18 -07:00
Dalton Hubble def445a344 Update Fedora Atomic kubelet from v1.10.5 to v1.11.0 2018-06-30 16:45:42 -07:00
Dalton Hubble 8464b258d8 Update Kubernetes from v1.10.5 to v1.11.0
* Force apiserver to stop listening on 127.0.0.1:8080
* Remove deprecated Kubelet `--allow-privileged`. Defaults to
true. Use `PodSecurityPolicy` if limiting is desired
* https://github.com/kubernetes/kubernetes/releases/tag/v1.11.0
* https://github.com/poseidon/terraform-render-bootkube/pull/68
2018-06-27 22:47:35 -07:00
Dalton Hubble 855aec5af3 Clarify AWS module output names and changes 2018-06-23 15:29:13 -07:00
Dalton Hubble 0227014fa0 Fix terraform formatting 2018-06-22 00:28:36 -07:00
Dalton Hubble fb6f40051f Disable AWS detailed monitoring on worker nodes
* Basic monitoring (free) is sufficient for casual console browsing
* Detailed monitoring (paid) is not leveraged for CloudWatch anyway
* Favor Prometheus for cloud-agnostic metrics, aggregation, and alerting
2018-06-22 00:26:06 -07:00
Dalton Hubble 316f06df06 Combine NLBs to use one NLB per cluster
* Simplify clusters to come with a single NLB
* Listen for apiserver traffic on port 6443 and forward
to controllers (with healthy apiserver)
* Listen for ingress traffic on ports 80/443 and forward
to workers (with healthy ingress controller)
* Reduce cost of default clusters by 1 NLB ($18.14/month)
* Keep using CNAME records to the `ingress_dns_name` NLB and
the nginx-ingress addon for Ingress (up to a few million RPS)
* Users with heavy traffic (many million RPS) can create their
own separate NLB(s) for Ingress and use the new output worker
target groups
* Fix issue where additional worker pools come with an
extraneous network load balancer
2018-06-21 23:46:57 -07:00
Dalton Hubble f4d3059b00 Update Kubernetes from v1.10.4 to v1.10.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1105
2018-06-21 22:51:39 -07:00
Dalton Hubble 6c5a1964aa Change kube-apiserver port from 443 to 6443
* Adjust firewall rules, security groups, cloud load balancers,
and generated kubeconfig's
* Facilitates some future simplifications and cost reductions
* Bare-Metal users who exposed kube-apiserver on a WAN via their
router or load balancer will need to adjust its configuration.
This is uncommon, most apiserver are on LAN and/or behind VPN
so no routing infrastructure is configured with the port number
2018-06-19 23:48:51 -07:00
Dalton Hubble 6e64634748 Update etcd from v3.3.7 to v3.3.8
* https://github.com/coreos/etcd/releases/tag/v3.3.8
2018-06-19 21:56:21 -07:00
Dalton Hubble 51906bf398 Update etcd from v3.3.6 to v3.3.7 2018-06-14 22:46:16 -07:00
Dalton Hubble 79260c48f6 Update Kubernetes from v1.10.3 to v1.10.4 2018-06-06 23:23:11 -07:00
Dalton Hubble 589c3569b7 Update etcd from v3.3.5 to v3.3.6
* https://github.com/coreos/etcd/releases/tag/v3.3.6
2018-06-06 23:19:30 -07:00
Dalton Hubble 6e968cd152 Update Calico from v3.1.2 to v3.1.3
* https://github.com/projectcalico/calico/releases/tag/v3.1.3
* https://github.com/projectcalico/cni-plugin/releases/tag/v3.1.3
2018-05-30 21:32:12 -07:00
Ben Drucker 6a581ab577 Render etcd_initial_cluster using a template_file 2018-05-30 21:14:49 -07:00
Dalton Hubble 4ea1fde9c5 Update Kubernetes from v1.10.2 to v1.10.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1103
* Update Calico from v3.1.1 to v3.1.2
2018-05-21 21:38:43 -07:00
William Zhang 2ae126bf68 Fix README link to tutorial 2018-05-19 13:10:22 -07:00
Dalton Hubble 983489bb52 Re-run terraform fmt for formatting 2018-05-14 23:38:16 -07:00
Dalton Hubble c2b719dc75 Configure Prometheus to scrape Kubelets directly
* Use Kubelet bearer token authn/authz to scrape metrics
* Drop RBAC permission from nodes/proxy to nodes/metrics
* Stop proxying kubelet scrapes through the apiserver, since
this required higher privilege (nodes/proxy) and can add
load to the apiserver on large clusters
2018-05-14 23:06:50 -07:00
Dalton Hubble 37981f9fb1 Allow bearer token authn/authz to the Kubelet
* Require Webhook authorization to the Kubelet
* Switch apiserver X509 client cert org to systems:masters
to grant the apiserver admin and satisfy the authorization
requirement. kubectl commands like logs or exec that have
the apiserver make requests of a kubelet continue to work
as before
* https://kubernetes.io/docs/admin/kubelet-authentication-authorization/
* https://github.com/poseidon/typhoon/issues/215
2018-05-13 23:20:42 -07:00
Dalton Hubble 5eb11f5104 Allow Flatcar Linux os_image on AWS, rename os_channel
* Replace os_channel variable with os_image to align naming
across clouds. Users who set this option to stable, beta, or
alpha should now set os_image to coreos-stable, coreos-beta,
or coreos-alpha.
* Default os_image to coreos-stable. This continues to use
the most recent image from the stable channel as always.
* Allow Container Linux derivative Flatcar Linux by setting
os_image to `flatcar-stable`, `flatcar-beta`, `flatcar-alpha`
2018-05-12 11:41:58 -07:00
Dalton Hubble f2ee75ac98 Require Terraform v0.11.x, drop v0.10.x support
* Raise minimum Terraform version to v0.11.0
* Terraform v0.11.x has been supported since Typhoon v1.9.2
and Terraform v0.10.x was last released in Nov 2017. I'd like
to stop worrying about v0.10.x and remove migration docs as
a later followup
* Migration docs docs/topics/maintenance.md#terraform-v011x
2018-05-10 02:20:46 -07:00
Dalton Hubble 8b8e364915 Update etcd from v3.3.4 to v3.3.5
* https://github.com/coreos/etcd/releases/tag/v3.3.5
2018-05-10 02:12:53 -07:00
Michael Holt a5916da0e2 Update min AWS provider from v1.11 to v1.13 2018-05-02 15:16:03 -07:00
Dalton Hubble cc29530ba0 Allow preemptible workers on AWS via spot instances
* Add `worker_price` to allow worker spot instances. Defaults
to empty string for the worker autoscaling group to use regular
on-demand instances.
* Add `spot_price` to internal `workers` module for spot worker
pools
* Note: Unlike GCP `preemptible` workers, spot instances require
you to pick a bid price.
2018-04-29 13:31:17 -07:00
Dalton Hubble e889430926 Update kube-dns from v1.14.9 to v1.14.10
* https://github.com/kubernetes/kubernetes/pull/62676
2018-04-28 00:43:09 -07:00
Dalton Hubble 32ddfa94e1 Update Kubernetes from v1.10.1 to v1.10.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.10.2
2018-04-28 00:27:00 -07:00
Dalton Hubble 681450aa0d Update etcd from v3.3.3 to v3.3.4
* https://github.com/coreos/etcd/releases/tag/v3.3.4
2018-04-27 23:57:26 -07:00
Dalton Hubble 567e18f015 Fix conflict between Calico and NetworkManager
* Observed frequent kube-scheduler and controller-manager
restarts with Calico as the CNI provider. Root cause was
unclear since control plane was functional and tests of
pod to pod network connectivity passed
* Root cause: Calico sets up cali* and tunl* network interfaces
for containers on hosts. NetworkManager tries to manage these
interfaces. It periodically disconnected veth pairs. Logs did
not surface this issue since its not an error per-se, just Calico
and NetworkManager dueling for control. Kubernetes correctly
restarted pods failing health checks and ensured 2 replicas were
running so the control plane functioned mostly normally. Pod to
pod connecitivity was only affected occassionally. Pain to debug.
* Solution: Configure NetworkManager to ignore the Calico ifaces
per Calico's recommendation. Cloud-init writes files after
NetworkManager starts, so a restart is required on first boot. On
subsequent boots, the file is present so no restart is needed
2018-04-25 21:45:58 -07:00
Dalton Hubble 0a7fab56e2 Load ip_vs kernel module on boot as workaround
* (containerized) kube-proxy warns that it is unable to
load the ip_vs kernel module despite having the correct
mounts. Atomic uses an xz compressed module and modprobe
in the container was not compiled with compression support
* Workaround issue for now by always loading ip_vs on-host
* https://github.com/kubernetes/kubernetes/issues/60
2018-04-25 21:45:58 -07:00
Dalton Hubble d784b0fca6 Switch to quay.io/poseidon tagged system containers 2018-04-25 18:15:18 -07:00
Dalton Hubble 7198b9016c Update Calico from v3.0.4 to v3.1.1 for Atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble 233ec6dcb0 Update Fedora Atomic AMI to version 27.122
* http://www.projectatomic.io/blog/2018/04/fedora-atomic-20-apr-18/
* Atomic publishes nightly AMIs which sometimes don't boot
or have issues. Until there is a source of reliable AMIs,
pin the best known working AMI
* Rel 66a66f0d18544591ffdbf8fae9df790113c93d72
2018-04-21 18:46:56 -07:00
Dalton Hubble 9b88d4bbfd Use bootkube system container on fedora-atomic
* Use the upstream bootkube image packaged with the
required metadata to be usable as a system container
under systemd
* Run bootkube with runc so no host level components
use Docker any more. Docker is still the runtime
* Remove bootkube script and old systemd unit
2018-04-21 18:46:56 -07:00
Dalton Hubble 3dde4ba8ba Mount host's /etc/os-release in kubelet system containers
* Fix `kubectl describe node` to reflect the host's operating
system
2018-04-21 18:46:56 -07:00
Dalton Hubble e148552220 Enable kubelet allocatable enforcement and QoS cgroup hierarchy
* Change kubelet system image to use --cgroups-per-qos=true
(default) instead of false
* Change kubelet system image to use --enforce-node-allocatable=pods
instead of an empty string
2018-04-21 18:46:56 -07:00
Dalton Hubble d8d1468f03 Update kubelet system container image to mount /etc/hosts
* Fix kubelet port-forward on Google Cloud / Fedora Atomic
* Mount the host's /etc/hosts in kubelet system containers
* Problem: kubelet runc system containers on Atomic were not
mounting the host's /etc/hosts, like rkt-fly does on Container
Linux. `kubectl port-forward` calls socat with localhost. DNS
servers on AWS, DO, and in many bare-metal environments resolve
localhost to the caller as a convenience. Google Cloud notably
does not nor is it required to do so and this surfaced the
missing /etc/hosts in runc kubelet namespaces.
2018-04-21 18:46:56 -07:00
Dalton Hubble 24d230505a Add cloud-metadata.service on AWS fedora-atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble b3cf9508b6 Update Fedora Atomic modules to Kubernetes v1.10.1 2018-04-21 18:46:56 -07:00
Dalton Hubble 5212684472 Temporarily pin Fedora Atomic AMI
* Atomic has published AMI images that shutdown
immediately after being powered on
2018-04-21 18:46:56 -07:00
Dalton Hubble f990473cde Update control plane manifests and add etcd metrics
* Enable etcd v3.3 metrics to expose metrics for
scraping by Prometheus
* Use k8s.gcr.io instead of gcr.io/google_containers
* Add flexvolume plugin mount to controller manager
* Update kube-dns from v1.14.8 to v1.14.9
2018-04-21 18:46:56 -07:00
Dalton Hubble 8523a086e2 Fix kubelet system container to mount CNI plugins
* Mount /opt/cni/bin in kubelet system container so
CNI plugin binaries can be found. Before, flannel
worked because the kubelet falls back to flannel
plugin baked into the hyperkube (undesired)
* Move the CNI bin install location later, since /opt
changes may be lost between ostree rebases
2018-04-21 18:46:56 -07:00
Dalton Hubble 19bc5aea9e Use kubelet system container on fedora-atomic
* Use the upstream hyperkube image packaged with the
required metadata to be usable as a system container
under systemd
* Fix port-forward since socat is included
2018-04-21 18:46:56 -07:00
Dalton Hubble 8d7cfc1a45 Use etcd system container on fedora-atomic
* Use the upstream etcd image packaged with the required
metadata to be usable as a system container (runc) under
systemd
2018-04-21 18:46:56 -07:00
Dalton Hubble 9969c357da Change AWS Fedora module to fedora-atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble b80a2eb8a0 Sync fedora-cloud modules with Container Linux
* Update manifests for Kubernetes v1.10.0
* Update etcd from v3.3.2 to v3.3.3
* Add disk_type optional variable on AWS
* Remove redundant kubeconfig copy on AWS
* Distribute etcd secres only to controllers
* Organize module variables and ssh steps
2018-04-21 18:46:56 -07:00
Dalton Hubble 3610da8b71 Add fedora-cloud module for AWS 2018-04-21 18:46:56 -07:00
Dalton Hubble a54f76db2a Update Calico from v3.0.4 to v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.0
2018-04-21 18:30:36 -07:00
Dalton Hubble 23a8156bdf Fix a few typos in comments 2018-04-15 17:21:49 -07:00
Dalton Hubble 77c0a4cf2e Update Kubernetes from v1.10.0 to v1.10.1
* Use kubernetes-incubator/bootkube v0.12.0
2018-04-12 20:57:31 -07:00
Dalton Hubble 9bb3de5327 Skip creating unused dirs on worker nodes 2018-04-11 22:23:51 -07:00
Dalton Hubble 6b08bde479 Use k8s.gcr.io instead of gcr.io/google_containers
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
2018-04-08 12:57:52 -07:00
Dalton Hubble f4b2396718 Return Prometheus deployment to be a worker workload
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targes
* Related to https://github.com/poseidon/typhoon/pull/175
2018-04-08 12:20:00 -07:00
Dalton Hubble 18dbaf74ce Update kube-dns from v1.14.8 to v1.14.9
* https://github.com/kubernetes/kubernetes/pull/61908
2018-04-04 21:00:23 -07:00
Dalton Hubble ce001e9d56 Update etcd from v3.3.2 to v3.3.3
* https://github.com/coreos/etcd/releases/tag/v3.3.3
2018-04-04 20:32:24 -07:00
Dalton Hubble d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble 1cc043d1eb Update Kubernetes from v1.9.6 to v1.10.0 2018-03-30 22:14:07 -07:00
Dalton Hubble f8e9bfb1c0 Add disk_type variable for EBS volume type on AWS
* Change EBS volume type from `standard` ("prior generation)
 to `gp2`. Prometheus alerts are tuned for SSDs
* Other platforms have fast enough disks by default
2018-03-29 22:51:54 -07:00
Dalton Hubble 7acd4931f6 Remove redundant kubeconfig copy on AWS and GCP
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. Its redundant to use a Terraform remote
file copy step for the kubeconfig.
2018-03-26 00:01:47 -07:00
Dalton Hubble e43cf9f608 Organize and cleanup variable descriptions 2018-03-25 21:44:43 -07:00
Dalton Hubble a04ef3919a Update Kubernetes from v1.9.5 to v1.9.6 2018-03-21 20:29:52 -07:00
Dalton Hubble 758c09fa5c Update Kubernetes from v1.9.4 to v1.9.5 2018-03-19 00:25:44 -07:00
Dalton Hubble f3730b2bfa Add Container Linux Config snippets feature
* Introduce the ability to support Container Linux Config
"snippets" for controllers and workers on cloud platforms.
This allows end-users to customize hosts by providing Container
Linux configs that are additively merged into the base configs
defined by Typhoon. Config snippets are validated, merged, and
show any errors during `terraform plan`
* Example uses include adding systemd units, network configs,
mounts, files, raid arrays, or other disk provisioning features
provided by Container Linux Configs (using Ignition low-level)
* Requires terraform-provider-ct v0.2.1 plugin
2018-03-18 18:28:18 -07:00
Dalton Hubble 88aa9a46e5 Add /var/lib/calico volume mount to Calico DaemonSet 2018-03-18 16:40:38 -07:00
Dalton Hubble efa90d8b44 Add a new key=value label to controller nodes
* Add a node-role.kubernetes.io/controller="true" node label
to controllers so Prometheus service discovery can filter to
services that only run on controllers (i.e. masters)
* Leave node-role.kubernetes.io/master="" untouched as its
a Kubernetes convention
2018-03-18 16:39:10 -07:00
Dalton Hubble 21f2cef12f Improve changelog, README, and index page 2018-03-12 20:58:02 -07:00
Dalton Hubble 931e311786 Update Kubernetes from v1.9.3 to v1.9.4 2018-03-12 18:07:50 -07:00
Dalton Hubble 8e7e6b9f7f Normalize Terraform configs with terraform fmt 2018-03-11 14:46:05 -07:00
Dalton Hubble 35f3b1b28c Enable AWS NLB cross-zone load balancing
* https://github.com/terraform-providers/terraform-provider-aws/pull/3537
* https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/
2018-03-10 23:25:18 -08:00
Dalton Hubble 9fb1e1a0e2 Update etcd from v3.3.1 to v3.3.2
* https://github.com/coreos/etcd/releases/tag/v3.3.2
2018-03-10 13:44:35 -08:00
Dalton Hubble b61d6373c5 Add ignore_changes for AWS worker image_id 2018-03-10 13:16:05 -08:00
Dalton Hubble c112ee3829 Rename cluster_name to name in internal module
* Ensure consistency between AWS and GCP platforms
2018-03-03 17:52:01 -08:00
Dalton Hubble da6aafe816 Revert "Add module version requirements to internal workers modules"
* This reverts commit cce4537487.
* Provider passing to child modules is complex and the behavior
changed between Terraform v0.10 and v0.11. We're continuing to
allow both versions so this change should be reverted. For the
time being, those using our internal Terraform modules will have
to be aware of the minimum version for AWS and GCP providers,
there is no good way to do enforcement.
2018-03-03 16:56:34 -08:00
Dalton Hubble cce4537487 Add module version requirements to internal workers modules 2018-03-03 14:39:25 -08:00
Dalton Hubble 73126eb7f8 Add support for worker pools on AWS
* Allow groups of workers to be defined and joined to
a cluster (i.e. worker pools)
* Move worker resources into a Terraform submodule
* Output variables needed for passing to worker pools
* Add usage docs for AWS worker pools (advanced)
2018-02-27 18:31:42 -08:00
Dalton Hubble 98985e5acd Remove unused etcd_service_ip template variable
* etcd_service_ip dates back to deprecated self-hosted etcd
2018-02-26 22:20:20 -08:00
Dalton Hubble 486fdb6968 Simplify CLC kubeconfig templating on AWS and GCP
* Template terraform-render-bootkube's multi-line kubeconfig
output using the right indentation
* Add `kubeconfig` variable to google-cloud controllers and
workers Terraform submodules
* Remove `kubeconfig_*` variables from google-cloud controllers
and workers Terraform submodules
2018-02-26 12:49:01 -08:00
Dalton Hubble a44cf0edbd Update Calico from v3.0.2 to v3.0.3
* https://github.com/projectcalico/calico/releases/tag/v3.0.3
2018-02-26 12:48:19 -08:00
Dalton Hubble 13f3745093 Add kubelet --volume-plugin-dir flag
* Set Kubelet search path for flexvolume plugins
to /var/lib/kubelet/volumeplugins
* Add support for flexvolume plugins on AWS, GCE, and DO
* See 9548572d98 which added flexvolume support for bare-metal
2018-02-22 22:11:45 -08:00
Dalton Hubble c4914c326b Update bootkube and terraform-render-bootkube to v0.11.0 2018-02-22 21:53:26 -08:00
Paul Saunders ceb5555222 Switch apiserver from ELB to a network load balancer 2018-02-22 16:10:31 -08:00
Dalton Hubble 22fa051002 Switch Ingress ELB to a network load balancer
* Require terraform-provider-aws 1.7 or higher
2018-02-20 17:34:38 -08:00