Adjust Google Cloud worker health checks to use kube-proxy healthz

* Change the workers managed instance group to health check nodes via an HTTP probe of the kube-proxy port 10256 /healthz endpoint
* Advantages: kube-proxy is a lower value target than Kubelet (in case there were bugs in firewalls), it's more representative than health checking Kubelet (Kubelet must run AND the kube-proxy DaemonSet must be healthy), and it's already used by kube-proxy liveness probes (better discoverability via kubectl or alerts on pods crashlooping)
* Another motivator is that GKE clusters also use kube-proxy port 10256 checks to assess node health
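
For illustration, an HTTP health check against the kube-proxy healthz endpoint might look roughly like the sketch below. This is not taken from the changed files: the resource label `worker`, the name suffix, and the interval, timeout, and threshold values are all assumptions chosen for the example. Only port 10256 and the /healthz path come from the description above.

```hcl
# Hypothetical sketch: resource label, name, and timing values are
# illustrative assumptions, not the values used by this commit.
resource "google_compute_health_check" "worker" {
  name        = "${var.cluster_name}-worker-health"
  description = "Health check for worker nodes via kube-proxy"

  # Assumed timings; tune these to how quickly unhealthy nodes should be replaced.
  timeout_sec         = 20
  check_interval_sec  = 30
  healthy_threshold   = 1
  unhealthy_threshold = 6

  # HTTP probe of the kube-proxy healthz endpoint (port and path from the commit message).
  http_health_check {
    port         = 10256
    request_path = "/healthz"
  }
}
```

A managed instance group would then reference a check like this from its `auto_healing_policies` block, and the firewall rule must allow Google's health check source ranges to reach tcp/10256 on the workers.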
2025-09-18 06:19:43 +02:00 · 2022-08-17 20:50:52 -07:00
parent 760b4cd5ee
commit e87d5aabc3
5 changed files with 27 additions and 25 deletions
--- a/google-cloud/flatcar-linux/kubernetes/network.tf
+++ b/google-cloud/flatcar-linux/kubernetes/network.tf
@@ -196,13 +196,13 @@ resource "google_compute_firewall" "allow-ingress" {
  target_tags   = ["${var.cluster_name}-worker"]
 }

-resource "google_compute_firewall" "google-kubelet-health-checks" {
-  name    = "${var.cluster_name}-kubelet-health"
+resource "google_compute_firewall" "google-worker-health-checks" {
+  name    = "${var.cluster_name}-worker-health"
  network = google_compute_network.network.name

  allow {
    protocol = "tcp"
-    ports    = [10250]
+    ports    = [10256]
  }

  # https://cloud.google.com/compute/docs/instance-groups/autohealing-instances-in-migs