Commit Graph

14 Commits

Author SHA1 Message Date
Dalton Hubble
e9c8520359 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability
    (health)
  * Optional TCP 4240 between nodes for endpoint reachability (health)

Known Issues:

* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256

Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
2020-06-21 20:41:53 -07:00
Dalton Hubble
43e05b9131 Enable kube-proxy metrics and allow Prometheus scrapes
* Configure kube-proxy --metrics-bind-address=0.0.0.0 (default
127.0.0.1) to serve metrics on 0.0.0.0:10249
* Add firewall rules to allow Prometheus (resides on a worker) to
scrape kube-proxy service endpoints on controllers or workers
* Add a clusterIP: None service for kube-proxy endpoint discovery
2020-01-06 21:11:18 -08:00
Dalton Hubble
c933bdfc26 Migrate Container Linux AWS to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
Dalton Hubble
2ba0181dbe Migrate AWS module Terraform v0.11 to v0.12
* Replace v0.11 bracket type hints with Terraform v0.12 list expressions
* Use expression syntax instead of interpolated strings, where suggested
* Update AWS tutorial and worker pools documentation
* Define Terraform and plugin version requirements in versions.tf
  * Require aws ~> 2.7 to support Terraform v0.12
  * Require ct ~> 0.3.2 to support Terraform v0.12
2019-06-06 09:45:59 -07:00
Dalton Hubble
af18296bc5 Change flannel port from 8472 to 4789
* Change flannel port from the kernel default 8472 to the
IANA assigned VXLAN port 4789
* Update firewall rules or security groups for VXLAN
* Why now? Calico now offers its own VXLAN backend so
standardizing on the IANA port will simplify config
* https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan
2019-05-06 21:58:10 -07:00
Dalton Hubble
99a6d5478b Disable Kubelet read-only port 10255
* We can finally disable the Kubelet read-only port 10255!
* Journey: https://github.com/poseidon/typhoon/issues/322#issuecomment-431073073
2018-10-18 21:14:14 -07:00
Dalton Hubble
bbf2c13eef Remove AWS security rule allowing ICMP packets to nodes
* Deny ICMP packets for consistency across Typhoon clusters on
various clouds and because there isn't much need to allow them
2018-08-21 21:16:16 -07:00
Dalton Hubble
bceec9fdf5 Sort firewall / security rules and add comments
* No functional changes to network firewalls
2018-08-21 20:53:16 -07:00
Dalton Hubble
6c5a1964aa Change kube-apiserver port from 443 to 6443
* Adjust firewall rules, security groups, cloud load balancers,
and generated kubeconfig's
* Facilitates some future simplifications and cost reductions
* Bare-Metal users who exposed kube-apiserver on a WAN via their
router or load balancer will need to adjust its configuration.
This is uncommon, most apiserver are on LAN and/or behind VPN
so no routing infrastructure is configured with the port number
2018-06-19 23:48:51 -07:00
Dalton Hubble
983489bb52 Re-run terraform fmt for formatting 2018-05-14 23:38:16 -07:00
Dalton Hubble
c2b719dc75 Configure Prometheus to scrape Kubelets directly
* Use Kubelet bearer token authn/authz to scrape metrics
* Drop RBAC permission from nodes/proxy to nodes/metrics
* Stop proxying kubelet scrapes through the apiserver, since
this required higher privilege (nodes/proxy) and can add
load to the apiserver on large clusters
2018-05-14 23:06:50 -07:00
Dalton Hubble
f4b2396718 Return Prometheus deployment to be a worker workload
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targes
* Related to https://github.com/poseidon/typhoon/pull/175
2018-04-08 12:20:00 -07:00
Dalton Hubble
d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble
73126eb7f8 Add support for worker pools on AWS
* Allow groups of workers to be defined and joined to
a cluster (i.e. worker pools)
* Move worker resources into a Terraform submodule
* Output variables needed for passing to worker pools
* Add usage docs for AWS worker pools (advanced)
2018-02-27 18:31:42 -08:00