Commit Graph

1693 Commits

Author SHA1 Message Date
Dalton Hubble
808b8a948f
aws: Switch EC2 instances to use resource-based hostnames
* Use EC2 resource-based hostnames instead of IP-based hostnames. The Amazon
DNS server can resolve A and AAAA queries to IPv4 and IPv6 node addresses
* For example, nodes used to be named like `ip-10-11-12-13.us-east-1.compute.internal`
but going forward use the instance id `i-0123456789abcdef.us-east-1.compute.internal`
* Tag controller node EBS volumes with a name based on the controller node name
2024-08-22 20:02:53 -07:00
Dalton Hubble
effa13c141
Fix flannel-cni container image
* Close #1496
2024-08-22 19:26:19 -07:00
dghubble-renovate[bot]
b8645f3ec2 Bump mkdocs-material from 9.5.31 to v9.5.32 2024-08-22 10:36:50 -07:00
Dalton Hubble
10be34daa2
Update Kubernetes from v1.30.4 to v1.31.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1310
2024-08-17 08:32:35 -07:00
dghubble-renovate[bot]
1cb49e1267 Bump quay.io/cilium/cilium image from v1.16.0 to v1.16.1 2024-08-16 08:31:11 -07:00
dghubble-renovate[bot]
d79f94f4f5 Bump quay.io/cilium/operator-generic image from v1.16.0 to v1.16.1 2024-08-16 08:31:01 -07:00
Dalton Hubble
320d76c934
Update Kubernetes from v1.30.3 to v1.30.4
* Update Cilium from v1.16.0 to v1.16.1
2024-08-16 08:27:07 -07:00
Dalton Hubble
2daa23be50
Update default Cilium and CoreDNS components
* Update the CoreDNS and Cilium versons used by default when
folks aren't managing the components themselves
2024-08-05 08:47:06 -07:00
Dalton Hubble
6e2daded02
Remove some seldom used variables and set reasonable
* Set reasonable values and remove some variable clutter
* enable_reporting is only used with Calico and we can just default
to false, I doubt anyone uses Calico and cares much about reporting
metrics to upstream Calico
2024-08-02 20:45:37 -07:00
Dalton Hubble
83f1bd2373
Update ARM64 cluster and hybrid cluster docs
* Typhoon now supports arbitrary combinations of controller, worker,
and worker pool architectures so we can drop the specific details of
full-cluster vs hybrid cluster. Just pick the architecture for each
group of nodes accordingly.
* However, if a custom node taint is set, continue to configure the
cluster's daemonsets accordingly with `daemonset_tolerations`
2024-08-02 20:34:23 -07:00
dghubble-renovate[bot]
67e5ecf6f2 Bump mkdocs-material from 9.5.30 to v9.5.31 2024-08-02 16:46:36 -07:00
Dalton Hubble
0120b9f38d
Remove the cluster_domain_suffix variable
* Drop support for `cluster_domain_suffix` customization and
always use `cluster.local`. Many components in the Kubernetes
ecosystem assume this default suffix and its very rare to be
setting a special value here these days
* Cleanup a few variables that are seldom used
2024-08-02 15:05:25 -07:00
Dalton Hubble
af27661432 Configure controller and worker node architecture separately
* On platforms that support ARM64 instances, configure controller
and worker node host architectures separately
* For example, you can run arm64 controllers and amd64 workers
* Add `controller_arch` and `worker_arch` variables
* Remove `arch` variable
2024-08-02 15:04:57 -07:00
Dalton Hubble
516786d7bb
google: Configure controller and worker disk sizes
* Add `controller_disk_size` and `worker_disk_size` variables
* Remove `disk_size` variable
2024-08-02 13:07:41 -07:00
Dalton Hubble
1104b4bf28 AWS: Add CPU pricing mode and controller/worker disk variables
* Add `controller_disk_type`, `controller_disk_size`, and `controller_disk_iops`
variables
* Add `worker_disk_type`, `worker_disk_size`, and `worker_disk_iops` variables
and fix propagation to worker nodes
* Remove `disk_type`, `disk_size`, and `disk_iops` variables
* Add `controller_cpu_credits` and `worker_cpu_credits` variables to set CPU
pricing mode for burstable instance types
2024-07-31 15:02:28 -07:00
dghubble-renovate[bot]
39b5079bc3 Bump registry.k8s.io/coredns/coredns image from v1.11.1 to v1.11.3 2024-07-31 13:28:30 -07:00
dghubble-renovate[bot]
858d665d9b Bump quay.io/cilium/cilium image from v1.15.7 to v1.16.0 2024-07-28 11:59:07 -07:00
dghubble-renovate[bot]
8cea37cdd9 Bump quay.io/cilium/operator-generic image from v1.15.7 to v1.16.0 2024-07-28 11:58:58 -07:00
dghubble-renovate[bot]
4251ca937a Bump pymdown-extensions from 10.8.1 to v10.9 2024-07-28 11:58:50 -07:00
dghubble-renovate[bot]
329987187b Bump mkdocs-material from 9.5.29 to v9.5.30 2024-07-26 09:37:41 -07:00
Dalton Hubble
d046026511
Fix incorrect terraform-render-bootstrap SHA 2024-07-25 21:41:54 -07:00
Dalton Hubble
0669d44026
Update Kubernetes from v1.30.2 to v1.30.3
* Update builtin Cilium manifests from v1.15.6 to v1.15.7
* Update builtin flannel manifests from v0.25.4 to v0.25.5
2024-07-20 11:04:32 -07:00
Dalton Hubble
672bbad10b
Generate Azure Virtual Network IPv6 ULA space at random
* Private IPv6 address space should be assigned randomly within
an organization per https://datatracker.ietf.org/doc/html/rfc4193
2024-07-20 11:01:50 -07:00
dghubble-renovate[bot]
be0e516974 Bump mkdocs-material from 9.5.28 to v9.5.29 2024-07-20 10:44:04 -07:00
dghubble-renovate[bot]
6a61afcd3b Bump docker.io/flannel/flannel image from v0.25.4 to v0.25.5 2024-07-20 10:36:12 -07:00
dghubble-renovate[bot]
ca1f897b35 Bump quay.io/cilium/cilium image from v1.15.6 to v1.15.7 2024-07-14 13:42:35 -07:00
dghubble-renovate[bot]
d4514db00c Bump quay.io/cilium/operator-generic image from v1.15.6 to v1.15.7 2024-07-14 13:42:26 -07:00
Dalton Hubble
0d10d180f8
Change worker node pools from uniform to flexible orchestration mode
* Use flexible orchestration mode. Azure has started to recommend this
mode because it allows interacting with VMSS instances like regular VMs
via the CLI or via the Azure Portal
* Add options to allow workers nodes to use ephemeral local disks
  * Add `controller_disk_type` and `controller_disk_size` variables
  * Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables
2024-07-14 11:58:15 -07:00
Dalton Hubble
a4fab61066
Remove an IPv4 address from Azure clusters
* Consolidate load balancer frontend IPs to just the minimal IPv4
and IPv6 addresses that are needed per load balancer. apiserver and
ingress use separate ports, so there is not a true need for a separate
public IPv4 address just for apiserver
* Some might prefer a separate IP just because it slightly hides the
apiserver, but these are public hosted endpoints that can be discovered
* Reduce the cost of an Azure cluster since IPv4 public IPs are billed
($3.60/mo/cluster)
2024-07-10 22:29:43 -07:00
Dalton Hubble
24b7f31c55
Rename Azure cluster region variable to location
* Rename the region variable to location to align with Azure
platform conventions, where resources are created within an
Azure location, which are themselves part of broader geographical
regions
2024-07-09 07:56:58 -07:00
Dalton Hubble
48d4973957
Add IPv6 support for Typhoon Azure clusters
* Define a dual-stack virtual network with both IPv4 and IPv6 private
address space. Change `host_cidr` variable (string) to a `network_cidr`
variable (object) with "ipv4" and "ipv6" fields that list CIDR strings.
* Define dual-stack controller and worker subnets. Disable Azure
default outbound access (a deprecated fallback mechanism)
* Enable dual-stack load balancing to Kubernetes Ingress by adding
a public IPv6 frontend IP and LB rule to the load balancer.
* Enable worker outbound IPv6 connectivity through load balancer
SNAT by adding an IPv6 frontend IP and outbound rule
* Configure controller nodes with a public IPv6 address to provide
direct outbound IPv6 connectivity
* Add an IPv6 worker backend pool. Azure requires separate IPv4 and
IPv6 backend pools, though the health probe can be shared
* Extend network security group rules for IPv6 source/destinations

Checklist:

Access to controller and worker nodes via IPv6 addresses:

  * SSH access to controller nodes via public IPv6 address
  * SSH access to worker nodes via (private) IPv6 address (via
    controller)

Outbound IPv6 connectivity from controller and worker nodes:

```
nc -6 -zv ipv6.google.com 80
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Connected to [2607:f8b0:4001:c16::66]:80.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
```

Serve Ingress traffic via IPv4 or IPv6 just requires setting
up A and AAAA records and running the ingress controller with
`hostNetwork: true` since, hostPort only forwards IPv4 traffic
2024-07-09 07:55:00 -07:00
dghubble-renovate[bot]
3483ed8bd5 Bump mkdocs-material from 9.5.27 to v9.5.28 2024-07-03 22:23:23 -07:00
Dalton Hubble
931d6d18de
Update Kubernetes from v1.30.1 to v1.30.2
* Update CoreDNS from v1.9.4 to v1.11.1
* Update Cilium from v1.15.5 to v1.15.6
* Update flannel from v0.25.1 to v0.25.4
2024-06-17 08:20:03 -07:00
dghubble-renovate[bot]
da99a01f43 Bump mkdocs-material from 9.5.26 to v9.5.27 2024-06-16 16:57:27 -07:00
dghubble-renovate[bot]
5090e60fe0 Bump quay.io/cilium/operator-generic image from v1.15.5 to v1.15.6 2024-06-15 08:01:29 -07:00
dghubble-renovate[bot]
158a681a8b Bump quay.io/cilium/cilium image from v1.15.5 to v1.15.6 2024-06-15 08:00:23 -07:00
dghubble-renovate[bot]
8fd2c95cec Bump docker.io/flannel/flannel image from v0.25.3 to v0.25.4 2024-06-15 07:55:44 -07:00
dghubble-renovate[bot]
9be5250a71 Bump mkdocs-material from 9.5.25 to v9.5.26 2024-06-09 15:58:05 -07:00
dghubble-renovate[bot]
d6e4f49cd9 Bump docker.io/flannel/flannel image from v0.25.2 to v0.25.3 2024-05-31 17:13:30 -07:00
dghubble-renovate[bot]
2d020a2ce3 Bump mkdocs-material from 9.5.24 to v9.5.25 2024-05-27 07:43:40 -07:00
dghubble-renovate[bot]
e942ae9f4a Bump docker.io/flannel/flannel image from v0.25.1 to v0.25.2 2024-05-26 12:45:03 -07:00
dghubble-renovate[bot]
fa8f3d81b4 Bump mkdocs-material from 9.5.23 to v9.5.24 2024-05-26 12:23:13 -07:00
Dalton Hubble
c48b04ea88 Update docs to mention components 2024-05-19 17:10:47 -07:00
Dalton Hubble
7b8a51070f Add Terraform modules for CoreDNS, Cilium, and flannel
* With the new component system, these components can be managed
independent from the cluster and rolled or edited in advanced
ways
2024-05-19 17:00:10 -07:00
Dalton Hubble
533ace7011 Update Cilium from v1.15.4 to v1.15.5
* https://github.com/cilium/cilium/releases/tag/v1.15.5
2024-05-19 16:38:08 -07:00
Dalton Hubble
b3c384fbc0 Introduce the component system for managing pre-installed addons
* Previously: Typhoon provisions clusters with kube-system components
like CoreDNS, kube-proxy, and a chosen CNI provider (among flannel,
Calico, or Cilium) pre-installed. This is convenient since clusters
come with "batteries included". But it also means upgrading these
components is generally done in lock-step, by upgrading to a new
Typhoon / Kubernetes release
* It can be valuable to manage these components with a separate
plan/apply process or through automations and deploy systems. For
example, this allows managing CoreDNS separately from the cluster's
lifecycle.
* These "components" will continue to be pre-installed by default,
but a new `components` variable allows them to be disabled and
managed as "addons", components you apply after cluster creation
and manage on a rolling basis. For some of these, we may provide
Terraform modules to aide in managing these components.

```
module "cluster" {
  # defaults
  components = {
    enable = true
    coredns = {
      enable = true
    }
    kube_proxy = {
      enable = true
    }
    # Only the CNI set in var.networking will be installed
    flannel = {
      enable = true
    }
    calico = {
      enable = true
    }
    cilium = {
      enable = true
    }
  }
}
```

An earlier variable `install_container_networking = true/false` has
been removed, since it can now be achieved with this more extensible
and general components mechanism by setting the chosen networking
provider enable field to false.
2024-05-19 16:33:57 -07:00
Dalton Hubble
563feacd29 Update Kubernetes from v1.30.0 to v1.30.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1301
2024-05-15 21:59:00 -07:00
dghubble-renovate[bot]
178d1e6eb1 Bump mkdocs-material from 9.5.22 to v9.5.23 2024-05-15 20:52:03 -07:00
Dalton Hubble
3f34e047f1 azure: Add controller security group and subnet outputs
* Output the network security group name and address prefixes
for controller nodes, to allow adding custom network security
rules that apply specifically to controller nodes
2024-05-14 21:34:31 -07:00
Dalton Hubble
cc80ec9b98 Add firewall and security rules for Cilium/Hubble metrics
* Add firewall or security riles to allow node-to-node traffic
on ports 9962-9965 for Cilium and Hubble metrics. Cilium runs
with host network, so these require cloud firewall changes
2024-05-13 21:27:38 -07:00