Add module for Fedora CoreOS on Google Cloud

* Add Typhoon Fedora CoreOS on Google Cloud as alpha
* Add docs on uploading the Fedora CoreOS GCP gzipped tarball to
Google Cloud storage to create a boot disk image
This commit is contained in:
Dalton Hubble 2020-02-01 15:10:40 -08:00
parent b19ba16afa
commit 8cc303c9ac
25 changed files with 1682 additions and 3 deletions

View File

@ -4,6 +4,10 @@ Notable changes between versions.
## Latest ## Latest
#### Google Cloud
* Add Terraform module for Fedora CoreOS ([#632](https://github.com/poseidon/typhoon/pull/632))
#### Addons #### Addons
* Update nginx-ingress from v0.27.1 to v0.28.0 * Update nginx-ingress from v0.27.1 to v0.28.0

View File

@ -35,6 +35,7 @@ Typhoon is available for [Fedora CoreOS](https://getfedora.org/coreos/) in alpha
|---------------|------------------|------------------|--------| |---------------|------------------|------------------|--------|
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | alpha | | AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](aws/fedora-coreos/kubernetes) | alpha |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | alpha | | Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](bare-metal/fedora-coreos/kubernetes) | alpha |
| Google Cloud | Fedora CoreOS | [google-cloud/fedora-coreos/kubernetes](google-cloud/fedora-coreos/kubernetes) | alpha |
## Documentation ## Documentation

View File

@ -0,0 +1,254 @@
# Google Cloud
!!! danger
Typhoon for Fedora CoreOS is an alpha. Please report Fedora CoreOS bugs to [Fedora](https://github.com/coreos/fedora-coreos-tracker/issues) and Typhoon issues to Typhoon.
In this tutorial, we'll create a Kubernetes v1.17.2 cluster on Google Compute Engine with Fedora CoreOS.
We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.
Controller hosts are provisioned to run an `etcd-member` peer and a `kubelet` service. Worker hosts run a `kubelet` service. Controller nodes run `kube-apiserver`, `kube-scheduler`, `kube-controller-manager`, and `coredns`, while `kube-proxy` and `calico` (or `flannel`) run on every node. A generated `kubeconfig` provides `kubectl` access to the cluster.
## Requirements
* Google Cloud Account and Service Account
* Google Cloud DNS Zone (registered Domain Name or delegated subdomain)
* Terraform v0.12.6+ and [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) installed locally
## Terraform Setup
Install [Terraform](https://www.terraform.io/downloads.html) v0.12.6+ on your system.
```sh
$ terraform version
Terraform v0.12.16
```
Add the [terraform-provider-ct](https://github.com/poseidon/terraform-provider-ct) plugin binary for your system to `~/.terraform.d/plugins/`, noting the final name.
```sh
wget https://github.com/poseidon/terraform-provider-ct/releases/download/v0.4.0/terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
tar xzf terraform-provider-ct-v0.4.0-linux-amd64.tar.gz
mv terraform-provider-ct-v0.4.0-linux-amd64/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.4.0
```
Read [concepts](/architecture/concepts/) to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. `infra`).
```
cd infra/clusters
```
## Provider
Login to your Google Console [API Manager](https://console.cloud.google.com/apis/dashboard) and select a project, or [signup](https://cloud.google.com/free/) if you don't have an account.
Select "Credentials" and create a service account key. Choose the "Compute Engine Admin" and "DNS Administrator" roles and save the JSON private key to a file that can be referenced in configs.
```sh
mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json
```
Configure the Google Cloud provider to use your service account key, project-id, and region in a `providers.tf` file.
```tf
provider "google" {
version = "3.4.0"
project = "project-id"
region = "us-central1"
credentials = file("~/.config/google-cloud/terraform.json")
}
provider "ct" {
version = "0.4.0"
}
```
Additional configuration options are described in the `google` provider [docs](https://www.terraform.io/docs/providers/google/index.html).
!!! tip
Regions are listed in [docs](https://cloud.google.com/compute/docs/regions-zones/regions-zones) or with `gcloud compute regions list`. A project may contain multiple clusters across different regions.
## Fedora CoreOS Images
Fedora CoreOS publishes images for Google Cloud, but does not yet upload them. Google Cloud allows [custom boot images](https://cloud.google.com/compute/docs/images/import-existing-image) to be uploaded to a bucket and imported into your project.
[Download](https://getfedora.org/coreos/download/) the Fedora CoreOS GCP gzipped tarball. Then upload the file to a GCS storage bucket.
```
gsutil list
gsutil cp fedora-coreos-31.20200113.3.1-gcp.x86_64.tar.gz gs://BUCKET_NAME
```
Create a Google Compute Engine image from the bucket file.
```
gcloud compute images create fedora-coreos-31-20200113-3-1 --source-uri gs://BUCKET/fedora-coreos-31.20200113.3.1-gcp.x86_64.tar.gz
```
## Cluster
Define a Kubernetes cluster using the module `google-cloud/fedora-coreos/kubernetes`.
```tf
module "yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=development-sha"
# Google Cloud
cluster_name = "yavin"
region = "us-central1"
dns_zone = "example.com"
dns_zone_name = "example-zone"
# temporary
os_image = "fedora-coreos-31-20200113-3-1"
# configuration
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# optional
worker_count = 2
}
```
Reference the [variables docs](#variables) or the [variables.tf](https://github.com/poseidon/typhoon/blob/master/google-cloud/container-linux/kubernetes/variables.tf) source.
## ssh-agent
Initial bootstrapping requires `bootstrap.service` be started on one controller node. Terraform uses `ssh-agent` to automate this step. Add your SSH private key to `ssh-agent`.
```sh
ssh-add ~/.ssh/id_rsa
ssh-add -L
```
## Apply
Initialize the config directory if this is the first use with Terraform.
```sh
terraform init
```
Plan the resources to be created.
```sh
$ terraform plan
Plan: 64 to add, 0 to change, 0 to destroy.
```
Apply the changes to create the cluster.
```sh
$ terraform apply
module.yavin.null_resource.bootstrap: Still creating... (10s elapsed)
...
module.yavin.null_resource.bootstrap: Still creating... (5m30s elapsed)
module.yavin.null_resource.bootstrap: Still creating... (5m40s elapsed)
module.yavin.null_resource.bootstrap: Creation complete (ID: 5768638456220583358)
Apply complete! Resources: 62 added, 0 changed, 0 destroyed.
```
In 4-8 minutes, the Kubernetes cluster will be ready.
## Verify
[Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on your system. Obtain the generated cluster `kubeconfig` from module outputs (e.g. write to a local file).
```
resource "local_file" "kubeconfig-yavin" {
content = module.yavin.kubeconfig-admin
filename = "/home/user/.kube/configs/yavin-config"
}
```
List nodes in the cluster.
```
$ export KUBECONFIG=/home/user/.kube/configs/yavin-config
$ kubectl get nodes
NAME ROLES STATUS AGE VERSION
yavin-controller-0.c.example-com.internal <none> Ready 6m v1.17.2
yavin-worker-jrbf.c.example-com.internal <none> Ready 5m v1.17.2
yavin-worker-mzdm.c.example-com.internal <none> Ready 5m v1.17.2
```
List the pods.
```
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-1cs8z 2/2 Running 0 6m
kube-system calico-node-d1l5b 2/2 Running 0 6m
kube-system calico-node-sp9ps 2/2 Running 0 6m
kube-system coredns-1187388186-dkh3o 1/1 Running 0 6m
kube-system coredns-1187388186-zj5dl 1/1 Running 0 6m
kube-system kube-apiserver-controller-0 1/1 Running 0 6m
kube-system kube-controller-manager-controller-0 1/1 Running 0 6m
kube-system kube-proxy-117v6 1/1 Running 0 6m
kube-system kube-proxy-9886n 1/1 Running 0 6m
kube-system kube-proxy-njn47 1/1 Running 0 6m
kube-system kube-scheduler-controller-0 1/1 Running 0 6m
```
## Going Further
Learn about [maintenance](/topics/maintenance/) and [addons](/addons/overview/).
## Variables
Check the [variables.tf](https://github.com/poseidon/typhoon/blob/master/google-cloud/container-linux/kubernetes/variables.tf) source.
### Required
| Name | Description | Example |
|:-----|:------------|:--------|
| cluster_name | Unique cluster name (prepended to dns_zone) | "yavin" |
| region | Google Cloud region | "us-central1" |
| dns_zone | Google Cloud DNS zone | "google-cloud.example.com" |
| dns_zone_name | Google Cloud DNS zone name | "example-zone" |
| ssh_authorized_key | SSH public key for user 'core' | "ssh-rsa AAAAB3NZ..." |
Check the list of valid [regions](https://cloud.google.com/compute/docs/regions-zones/regions-zones) and list Fedora CoreOS [images](https://cloud.google.com/compute/docs/images) with `gcloud compute images list | grep fedora-coreos`.
#### DNS Zone
Clusters create a DNS A record `${cluster_name}.${dns_zone}` to resolve a TCP proxy load balancer backed by controller instances. This FQDN is used by workers and `kubectl` to access the apiserver(s). In this example, the cluster's apiserver would be accessible at `yavin.google-cloud.example.com`.
You'll need a registered domain name or delegated subdomain on Google Cloud DNS. You can set this up once and create many clusters with unique names.
```tf
resource "google_dns_managed_zone" "zone-for-clusters" {
dns_name = "google-cloud.example.com."
name = "example-zone"
description = "Production DNS zone"
}
```
!!! tip ""
If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Google Cloud (e.g. google-cloud.mydomain.com) and [update nameservers](https://cloud.google.com/dns/update-name-servers).
### Optional
| Name | Description | Default | Example |
|:-----|:------------|:--------|:--------|
| asset_dir | Absolute path to a directory where generated assets should be placed (contains secrets) | "" (disabled) | "/home/user/.secrets/clusters/yavin" |
| controller_count | Number of controllers (i.e. masters) | 1 | 3 |
| worker_count | Number of workers | 1 | 3 |
| controller_type | Machine type for controllers | "n1-standard-1" | See below |
| worker_type | Machine type for workers | "n1-standard-1" | See below |
| os_image | Fedora CoreOS image for compute instances | "" | "fedora-coreos-31-20200113-3-1" |
| disk_size | Size of the disk in GB | 40 | 100 |
| worker_preemptible | If enabled, Compute Engine will terminate workers randomly within 24 hours | false | true |
| controller_snippets | Controller Fedora CoreOS Config snippets | [] | UNSUPPORTED |
| worker_snippets | Worker Fedora CoreOS Config snippets | [] | UNSUPPORTED |
| networking | Choice of networking provider | "calico" | "calico" or "flannel" |
| pod_cidr | CIDR IPv4 range to assign to Kubernetes pods | "10.2.0.0/16" | "10.22.0.0/16" |
| service_cidr | CIDR IPv4 range to assign to Kubernetes services | "10.3.0.0/16" | "10.3.0.0/24" |
| worker_node_labels | List of initial worker node labels | [] | ["worker-pool=default"] |
Check the list of valid [machine types](https://cloud.google.com/compute/docs/machine-types).
#### Preemption
Add `worker_preemptible = "true"` to allow worker nodes to be [preempted](https://cloud.google.com/compute/docs/instances/preemptible) at random, but pay [significantly](https://cloud.google.com/compute/pricing) less. Clusters tolerate stopping instances fairly well (reschedules pods, but cannot drain) and preemption provides a nice reward for running fault-tolerant cluster systems.`

View File

@ -35,6 +35,7 @@ Typhoon is available for [Fedora CoreOS](https://getfedora.org/coreos/) in alpha
|---------------|------------------|------------------|--------| |---------------|------------------|------------------|--------|
| AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](fedora-coreos/aws.md) | alpha | | AWS | Fedora CoreOS | [aws/fedora-coreos/kubernetes](fedora-coreos/aws.md) | alpha |
| Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](fedora-coreos/bare-metal.md) | alpha | | Bare-Metal | Fedora CoreOS | [bare-metal/fedora-coreos/kubernetes](fedora-coreos/bare-metal.md) | alpha |
| Google Cloud | Fedora CoreOS | [google-cloud/fedora-coreos/kubernetes](google-cloud/fedora-coreos/kubernetes) | alpha |
## Documentation ## Documentation

View File

@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2020 Typhoon Authors
Copyright (c) 2020 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,23 @@
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
* Minimal, stable base Kubernetes distribution
* Declarative infrastructure and configuration
* Free (freedom and cost) and privacy-respecting
* Practical for labs, datacenters, and clouds
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.17.2 (upstream)
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs
Please see the [official docs](https://typhoon.psdn.io) and the Google Cloud [tutorial](https://typhoon.psdn.io/cl/google-cloud/).

View File

@ -0,0 +1,93 @@
# TCP Proxy load balancer DNS record
resource "google_dns_record_set" "apiserver" {
# DNS Zone name where record should be created
managed_zone = var.dns_zone_name
# DNS record
name = format("%s.%s.", var.cluster_name, var.dns_zone)
type = "A"
ttl = 300
# IPv4 address of apiserver TCP Proxy load balancer
rrdatas = [google_compute_global_address.apiserver-ipv4.address]
}
# Static IPv4 address for the TCP Proxy Load Balancer
resource "google_compute_global_address" "apiserver-ipv4" {
name = "${var.cluster_name}-apiserver-ip"
ip_version = "IPV4"
}
# Forward IPv4 TCP traffic to the TCP proxy load balancer
resource "google_compute_global_forwarding_rule" "apiserver" {
name = "${var.cluster_name}-apiserver"
ip_address = google_compute_global_address.apiserver-ipv4.address
ip_protocol = "TCP"
port_range = "443"
target = google_compute_target_tcp_proxy.apiserver.self_link
}
# Global TCP Proxy Load Balancer for apiservers
resource "google_compute_target_tcp_proxy" "apiserver" {
name = "${var.cluster_name}-apiserver"
description = "Distribute TCP load across ${var.cluster_name} controllers"
backend_service = google_compute_backend_service.apiserver.self_link
}
# Global backend service backed by unmanaged instance groups
resource "google_compute_backend_service" "apiserver" {
name = "${var.cluster_name}-apiserver"
description = "${var.cluster_name} apiserver service"
protocol = "TCP"
port_name = "apiserver"
session_affinity = "NONE"
timeout_sec = "300"
# controller(s) spread across zonal instance groups
dynamic "backend" {
for_each = google_compute_instance_group.controllers
content {
group = backend.value.self_link
}
}
health_checks = [google_compute_health_check.apiserver.self_link]
}
# Instance group of heterogeneous (unmanged) controller instances
resource "google_compute_instance_group" "controllers" {
count = min(var.controller_count, length(local.zones))
name = format("%s-controllers-%s", var.cluster_name, element(local.zones, count.index))
zone = element(local.zones, count.index)
named_port {
name = "apiserver"
port = "6443"
}
# add instances in the zone into the instance group
instances = matchkeys(
google_compute_instance.controllers.*.self_link,
google_compute_instance.controllers.*.zone,
[element(local.zones, count.index)],
)
}
# TCP health check for apiserver
resource "google_compute_health_check" "apiserver" {
name = "${var.cluster_name}-apiserver-tcp-health"
description = "TCP health check for kube-apiserver"
timeout_sec = 5
check_interval_sec = 5
healthy_threshold = 1
unhealthy_threshold = 3
tcp_health_check {
port = "6443"
}
}

View File

@ -0,0 +1,22 @@
# Kubernetes assets (kubeconfig, manifests)
module "bootstrap" {
source = "git::https://github.com/poseidon/terraform-render-bootstrap.git?ref=05297b94a936c356851e180e4963034e0047e1c0"
cluster_name = var.cluster_name
api_servers = [format("%s.%s", var.cluster_name, var.dns_zone)]
etcd_servers = google_dns_record_set.etcds.*.name
asset_dir = var.asset_dir
networking = var.networking
network_mtu = 1440
pod_cidr = var.pod_cidr
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
enable_reporting = var.enable_reporting
enable_aggregation = var.enable_aggregation
trusted_certs_dir = "/etc/pki/tls/certs"
// temporary
external_apiserver_port = 443
}

View File

@ -0,0 +1,103 @@
# Discrete DNS records for each controller's private IPv4 for etcd usage
resource "google_dns_record_set" "etcds" {
count = var.controller_count
# DNS Zone name where record should be created
managed_zone = var.dns_zone_name
# DNS record
name = format("%s-etcd%d.%s.", var.cluster_name, count.index, var.dns_zone)
type = "A"
ttl = 300
# private IPv4 address for etcd
rrdatas = [google_compute_instance.controllers.*.network_interface.0.network_ip[count.index]]
}
# Zones in the region
data "google_compute_zones" "all" {
region = var.region
}
locals {
zones = data.google_compute_zones.all.names
controllers_ipv4_public = google_compute_instance.controllers.*.network_interface.0.access_config.0.nat_ip
}
# Controller instances
resource "google_compute_instance" "controllers" {
count = var.controller_count
name = "${var.cluster_name}-controller-${count.index}"
# use a zone in the region and wrap around (e.g. controllers > zones)
zone = element(local.zones, count.index)
machine_type = var.controller_type
metadata = {
user-data = data.ct_config.controller-ignitions.*.rendered[count.index]
}
boot_disk {
auto_delete = true
initialize_params {
image = var.os_image
size = var.disk_size
}
}
network_interface {
network = google_compute_network.network.name
# Ephemeral external IP
access_config {
}
}
can_ip_forward = true
tags = ["${var.cluster_name}-controller"]
lifecycle {
ignore_changes = [metadata]
}
}
# Controller Ignition configs
data "ct_config" "controller-ignitions" {
count = var.controller_count
content = data.template_file.controller-configs.*.rendered[count.index]
strict = true
snippets = var.controller_snippets
}
# Controller Fedora CoreOS configs
data "template_file" "controller-configs" {
count = var.controller_count
template = file("${path.module}/fcc/controller.yaml")
vars = {
# Cannot use cyclic dependencies on controllers or their DNS records
etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = join(",", data.template_file.etcds.*.rendered)
kubeconfig = indent(10, module.bootstrap.kubeconfig-kubelet)
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
}
}
data "template_file" "etcds" {
count = var.controller_count
template = "etcd$${index}=https://$${cluster_name}-etcd$${index}.$${dns_zone}:2380"
vars = {
index = count.index
cluster_name = var.cluster_name
dns_zone = var.dns_zone
}
}

View File

@ -0,0 +1,216 @@
---
variant: fcos
version: 1.0.0
systemd:
units:
- name: etcd-member.service
enabled: true
contents: |
[Unit]
Description=etcd (System Container)
Documentation=https://github.com/coreos/etcd
Wants=network-online.target network.target
After=network-online.target
[Service]
# https://github.com/opencontainers/runc/pull/1807
# Type=notify
# NotifyAccess=exec
Type=exec
Restart=on-failure
RestartSec=10s
TimeoutStartSec=0
LimitNOFILE=40000
ExecStartPre=/bin/mkdir -p /var/lib/etcd
ExecStartPre=-/usr/bin/podman rm etcd
#--volume $${NOTIFY_SOCKET}:/run/systemd/notify \
ExecStart=/usr/bin/podman run --name etcd \
--env-file /etc/etcd/etcd.env \
--network host \
--volume /var/lib/etcd:/var/lib/etcd:rw,Z \
--volume /etc/ssl/etcd:/etc/ssl/certs:ro,Z \
quay.io/coreos/etcd:v3.4.3
ExecStop=/usr/bin/podman stop etcd
[Install]
WantedBy=multi-user.target
- name: docker.service
enabled: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet via Hyperkube (System Container)
Wants=rpc-statd.service
[Service]
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/podman rm kubelet
ExecStart=/usr/bin/podman run --name kubelet \
--privileged \
--pid host \
--network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \
--volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.17.2 kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=systemd \
--cgroups-per-qos=true \
--enforce-node-allocatable=pods \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--healthz-port=0 \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/master \
--node-labels=node.kubernetes.io/controller="true" \
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootstrap.service
contents: |
[Unit]
Description=Kubernetes control plane
ConditionPathExists=!/opt/bootstrap/bootstrap.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootstrap
ExecStartPre=-/usr/bin/bash -c 'set -x && [ -n "$(ls /opt/bootstrap/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootstrap/assets/manifests-*/* /opt/bootstrap/assets/manifests && rm -rf /opt/bootstrap/assets/manifests-*'
ExecStart=/usr/bin/podman run --name bootstrap \
--network host \
--volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \
--volume /opt/bootstrap/assets:/assets:ro,Z \
--volume /opt/bootstrap/apply:/apply:ro,Z \
--entrypoint=/apply \
k8s.gcr.io/hyperkube:v1.17.2
ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
ExecStartPost=-/usr/bin/podman stop bootstrap
storage:
directories:
- path: /etc/kubernetes
- path: /opt/bootstrap
files:
- path: /etc/kubernetes/kubeconfig
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /opt/bootstrap/layout
mode: 0544
contents:
inline: |
#!/bin/bash -e
mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
awk '/#####/ {filename=$2; next} {print > filename}' assets
mkdir -p /etc/ssl/etcd/etcd
mkdir -p /etc/kubernetes/bootstrap-secrets
mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
mv tls/etcd/etcd-client* /etc/kubernetes/bootstrap-secrets/
chown -R etcd:etcd /etc/ssl/etcd
chmod -R 500 /etc/ssl/etcd
mv auth/kubeconfig /etc/kubernetes/bootstrap-secrets/
mv tls/k8s/* /etc/kubernetes/bootstrap-secrets/
sudo mkdir -p /etc/kubernetes/manifests
sudo mv static-manifests/* /etc/kubernetes/manifests/
sudo mkdir -p /opt/bootstrap/assets
sudo mv manifests /opt/bootstrap/assets/manifests
sudo mv manifests-networking /opt/bootstrap/assets/manifests-networking
rm -rf assets auth static-manifests tls
- path: /opt/bootstrap/apply
mode: 0544
contents:
inline: |
#!/bin/bash -e
export KUBECONFIG=/etc/kubernetes/secrets/kubeconfig
until kubectl version; do
echo "Waiting for static pod control plane"
sleep 5
done
until kubectl apply -f /assets/manifests -R; do
echo "Retry applying manifests"
sleep 5
done
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/systemd/system.conf.d/accounting.conf
contents:
inline: |
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
- path: /etc/etcd/etcd.env
mode: 0644
contents:
inline: |
# TODO: Use a systemd dropin once podman v1.4.5 is avail.
NOTIFY_SOCKET=/run/systemd/notify
ETCD_NAME=${etcd_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
ETCD_STRICT_RECONFIG_CHECK=true
ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
passwd:
users:
- name: core
ssh_authorized_keys:
- ${ssh_authorized_key}

View File

@ -0,0 +1,123 @@
# Static IPv4 address for Ingress Load Balancing
resource "google_compute_global_address" "ingress-ipv4" {
name = "${var.cluster_name}-ingress-ipv4"
ip_version = "IPV4"
}
# Static IPv6 address for Ingress Load Balancing
resource "google_compute_global_address" "ingress-ipv6" {
name = "${var.cluster_name}-ingress-ipv6"
ip_version = "IPV6"
}
# Forward IPv4 TCP traffic to the HTTP proxy load balancer
# Google Cloud does not allow TCP proxies for port 80. Must use HTTP proxy.
resource "google_compute_global_forwarding_rule" "ingress-http-ipv4" {
name = "${var.cluster_name}-ingress-http-ipv4"
ip_address = google_compute_global_address.ingress-ipv4.address
ip_protocol = "TCP"
port_range = "80"
target = google_compute_target_http_proxy.ingress-http.self_link
}
# Forward IPv4 TCP traffic to the TCP proxy load balancer
resource "google_compute_global_forwarding_rule" "ingress-https-ipv4" {
name = "${var.cluster_name}-ingress-https-ipv4"
ip_address = google_compute_global_address.ingress-ipv4.address
ip_protocol = "TCP"
port_range = "443"
target = google_compute_target_tcp_proxy.ingress-https.self_link
}
# Forward IPv6 TCP traffic to the HTTP proxy load balancer
# Google Cloud does not allow TCP proxies for port 80. Must use HTTP proxy.
resource "google_compute_global_forwarding_rule" "ingress-http-ipv6" {
name = "${var.cluster_name}-ingress-http-ipv6"
ip_address = google_compute_global_address.ingress-ipv6.address
ip_protocol = "TCP"
port_range = "80"
target = google_compute_target_http_proxy.ingress-http.self_link
}
# Forward IPv6 TCP traffic to the TCP proxy load balancer
resource "google_compute_global_forwarding_rule" "ingress-https-ipv6" {
name = "${var.cluster_name}-ingress-https-ipv6"
ip_address = google_compute_global_address.ingress-ipv6.address
ip_protocol = "TCP"
port_range = "443"
target = google_compute_target_tcp_proxy.ingress-https.self_link
}
# HTTP proxy load balancer for ingress controllers
resource "google_compute_target_http_proxy" "ingress-http" {
name = "${var.cluster_name}-ingress-http"
description = "Distribute HTTP load across ${var.cluster_name} workers"
url_map = google_compute_url_map.ingress-http.self_link
}
# TCP proxy load balancer for ingress controllers
resource "google_compute_target_tcp_proxy" "ingress-https" {
name = "${var.cluster_name}-ingress-https"
description = "Distribute HTTPS load across ${var.cluster_name} workers"
backend_service = google_compute_backend_service.ingress-https.self_link
}
# HTTP URL Map (required)
resource "google_compute_url_map" "ingress-http" {
name = "${var.cluster_name}-ingress-http"
# Do not add host/path rules for applications here. Use Ingress resources.
default_service = google_compute_backend_service.ingress-http.self_link
}
# Backend service backed by managed instance group of workers
resource "google_compute_backend_service" "ingress-http" {
name = "${var.cluster_name}-ingress-http"
description = "${var.cluster_name} ingress service"
protocol = "HTTP"
port_name = "http"
session_affinity = "NONE"
timeout_sec = "60"
backend {
group = module.workers.instance_group
}
health_checks = [google_compute_health_check.ingress.self_link]
}
# Backend service backed by managed instance group of workers
resource "google_compute_backend_service" "ingress-https" {
name = "${var.cluster_name}-ingress-https"
description = "${var.cluster_name} ingress service"
protocol = "TCP"
port_name = "https"
session_affinity = "NONE"
timeout_sec = "60"
backend {
group = module.workers.instance_group
}
health_checks = [google_compute_health_check.ingress.self_link]
}
# Ingress HTTP Health Check
resource "google_compute_health_check" "ingress" {
name = "${var.cluster_name}-ingress-health"
description = "Health check for Ingress controller"
timeout_sec = 5
check_interval_sec = 5
healthy_threshold = 2
unhealthy_threshold = 4
http_health_check {
port = 10254
request_path = "/healthz"
}
}

View File

@ -0,0 +1,193 @@
resource "google_compute_network" "network" {
name = var.cluster_name
description = "Network for the ${var.cluster_name} cluster"
auto_create_subnetworks = true
timeouts {
delete = "6m"
}
}
resource "google_compute_firewall" "allow-ssh" {
name = "${var.cluster_name}-allow-ssh"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [22]
}
source_ranges = ["0.0.0.0/0"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
resource "google_compute_firewall" "internal-etcd" {
name = "${var.cluster_name}-internal-etcd"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [2379, 2380]
}
source_tags = ["${var.cluster_name}-controller"]
target_tags = ["${var.cluster_name}-controller"]
}
# Allow Prometheus to scrape etcd metrics
resource "google_compute_firewall" "internal-etcd-metrics" {
name = "${var.cluster_name}-internal-etcd-metrics"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [2381]
}
source_tags = ["${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller"]
}
# Allow Prometheus to scrape kube-scheduler and kube-controller-manager metrics
resource "google_compute_firewall" "internal-kube-metrics" {
name = "${var.cluster_name}-internal-kube-metrics"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [10251, 10252]
}
source_tags = ["${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller"]
}
resource "google_compute_firewall" "allow-apiserver" {
name = "${var.cluster_name}-allow-apiserver"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [6443]
}
source_ranges = ["0.0.0.0/0"]
target_tags = ["${var.cluster_name}-controller"]
}
# BGP and IPIP
# https://docs.projectcalico.org/latest/reference/public-cloud/gce
resource "google_compute_firewall" "internal-bgp" {
count = var.networking != "flannel" ? 1 : 0
name = "${var.cluster_name}-internal-bgp"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = ["179"]
}
allow {
protocol = "ipip"
}
source_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
# flannel VXLAN
resource "google_compute_firewall" "internal-vxlan" {
count = var.networking == "flannel" ? 1 : 0
name = "${var.cluster_name}-internal-vxlan"
network = google_compute_network.network.name
allow {
protocol = "udp"
ports = [4789]
}
source_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
# Allow Prometheus to scrape node-exporter daemonset
resource "google_compute_firewall" "internal-node-exporter" {
name = "${var.cluster_name}-internal-node-exporter"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [9100]
}
source_tags = ["${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
# Allow Prometheus to scrape kube-proxy metrics
resource "google_compute_firewall" "internal-kube-proxy" {
name = "${var.cluster_name}-internal-kube-proxy"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [10249]
}
source_tags = ["${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
# Allow apiserver to access kubelets for exec, log, port-forward
resource "google_compute_firewall" "internal-kubelet" {
name = "${var.cluster_name}-internal-kubelet"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [10250]
}
# allow Prometheus to scrape kubelet metrics too
source_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
# Workers
resource "google_compute_firewall" "allow-ingress" {
name = "${var.cluster_name}-allow-ingress"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [80, 443]
}
source_ranges = ["0.0.0.0/0"]
target_tags = ["${var.cluster_name}-worker"]
}
resource "google_compute_firewall" "google-ingress-health-checks" {
name = "${var.cluster_name}-ingress-health"
network = google_compute_network.network.name
allow {
protocol = "tcp"
ports = [10254]
}
# https://cloud.google.com/load-balancing/docs/health-check-concepts#method
source_ranges = [
"35.191.0.0/16",
"130.211.0.0/22",
"35.191.0.0/16",
"209.85.152.0/22",
"209.85.204.0/22",
]
target_tags = ["${var.cluster_name}-worker"]
}

View File

@ -0,0 +1,44 @@
output "kubeconfig-admin" {
value = module.bootstrap.kubeconfig-admin
}
# Outputs for Kubernetes Ingress
output "ingress_static_ipv4" {
description = "Global IPv4 address for proxy load balancing to the nearest Ingress controller"
value = google_compute_global_address.ingress-ipv4.address
}
output "ingress_static_ipv6" {
description = "Global IPv6 address for proxy load balancing to the nearest Ingress controller"
value = google_compute_global_address.ingress-ipv6.address
}
# Outputs for worker pools
output "network_name" {
value = google_compute_network.network.name
}
output "kubeconfig" {
value = module.bootstrap.kubeconfig-kubelet
}
# Outputs for custom firewalling
output "network_self_link" {
value = google_compute_network.network.self_link
}
# Outputs for custom load balancing
output "worker_instance_group" {
description = "Worker managed instance group full URL"
value = module.workers.instance_group
}
output "worker_target_pool" {
description = "Worker target pool self link"
value = module.workers.target_pool
}

View File

@ -0,0 +1,58 @@
locals {
# format assets for distribution
assets_bundle = [
# header with the unpack location
for key, value in module.bootstrap.assets_dist :
format("##### %s\n%s", key, value)
]
}
# Secure copy assets to controllers.
resource "null_resource" "copy-controller-secrets" {
count = var.controller_count
depends_on = [
module.bootstrap,
]
connection {
type = "ssh"
host = local.controllers_ipv4_public[count.index]
user = "core"
timeout = "15m"
}
provisioner "file" {
content = join("\n", local.assets_bundle)
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo /opt/bootstrap/layout",
]
}
}
# Connect to a controller to perform one-time cluster bootstrap.
resource "null_resource" "bootstrap" {
depends_on = [
null_resource.copy-controller-secrets,
module.workers,
google_dns_record_set.apiserver,
]
connection {
type = "ssh"
host = local.controllers_ipv4_public[0]
user = "core"
timeout = "15m"
}
provisioner "remote-exec" {
inline = [
"sudo systemctl start bootstrap",
]
}
}

View File

@ -0,0 +1,138 @@
variable "cluster_name" {
type = string
description = "Unique cluster name (prepended to dns_zone)"
}
# Google Cloud
variable "region" {
type = string
description = "Google Cloud Region (e.g. us-central1, see `gcloud compute regions list`)"
}
variable "dns_zone" {
type = string
description = "Google Cloud DNS Zone (e.g. google-cloud.example.com)"
}
variable "dns_zone_name" {
type = string
description = "Google Cloud DNS Zone name (e.g. example-zone)"
}
# instances
variable "controller_count" {
type = number
description = "Number of controllers (i.e. masters)"
default = 1
}
variable "worker_count" {
type = number
description = "Number of workers"
default = 1
}
variable "controller_type" {
type = string
description = "Machine type for controllers (see `gcloud compute machine-types list`)"
default = "n1-standard-1"
}
variable "worker_type" {
type = string
description = "Machine type for controllers (see `gcloud compute machine-types list`)"
default = "n1-standard-1"
}
variable "os_image" {
type = string
description = "Fedora CoreOS image for compute instances (e.g. fedora-coreos)"
}
variable "disk_size" {
type = number
description = "Size of the disk in GB"
default = 40
}
variable "worker_preemptible" {
type = bool
description = "If enabled, Compute Engine will terminate workers randomly within 24 hours"
default = false
}
variable "controller_snippets" {
type = list(string)
description = "Controller Fedora CoreOS Config snippets"
default = []
}
variable "worker_snippets" {
type = list(string)
description = "Worker Fedora CoreOS Config snippets"
default = []
}
# configuration
variable "ssh_authorized_key" {
type = string
description = "SSH public key for user 'core'"
}
variable "asset_dir" {
type = string
description = "Absolute path to a directory where generated assets should be placed (contains secrets)"
default = ""
}
variable "networking" {
type = string
description = "Choice of networking provider (flannel or calico)"
default = "calico"
}
variable "pod_cidr" {
type = string
description = "CIDR IPv4 range to assign Kubernetes pods"
default = "10.2.0.0/16"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
default = "10.3.0.0/16"
}
variable "enable_reporting" {
type = bool
description = "Enable usage or analytics reporting to upstreams (Calico)"
default = false
}
variable "enable_aggregation" {
type = bool
description = "Enable the Kubernetes Aggregation Layer (defaults to false)"
default = false
}
variable "worker_node_labels" {
type = list(string)
description = "List of initial worker node labels"
default = []
}
# unofficial, undocumented, unsupported
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local"
}

View File

@ -0,0 +1,11 @@
# Terraform version and plugin versions
terraform {
required_version = "~> 0.12.6"
required_providers {
google = ">= 2.19, < 4.0"
ct = "~> 0.3"
template = "~> 2.1"
null = "~> 2.1"
}
}

View File

@ -0,0 +1,23 @@
module "workers" {
source = "./workers"
name = var.cluster_name
cluster_name = var.cluster_name
# GCE
region = var.region
network = google_compute_network.network.name
worker_count = var.worker_count
machine_type = var.worker_type
os_image = var.os_image
disk_size = var.disk_size
preemptible = var.worker_preemptible
# configuration
kubeconfig = module.bootstrap.kubeconfig-kubelet
ssh_authorized_key = var.ssh_authorized_key
service_cidr = var.service_cidr
cluster_domain_suffix = var.cluster_domain_suffix
snippets = var.worker_snippets
node_labels = var.worker_node_labels
}

View File

@ -0,0 +1,110 @@
---
variant: fcos
version: 1.0.0
systemd:
units:
- name: docker.service
enabled: true
- name: wait-for-dns.service
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
enabled: true
contents: |
[Unit]
Description=Kubelet via Hyperkube (System Container)
Wants=rpc-statd.service
[Service]
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /var/lib/calico
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/podman rm kubelet
ExecStart=/usr/bin/podman run --name kubelet \
--privileged \
--pid host \
--network host \
--volume /etc/kubernetes:/etc/kubernetes:ro,z \
--volume /usr/lib/os-release:/etc/os-release:ro \
--volume /etc/ssl/certs:/etc/ssl/certs:ro \
--volume /lib/modules:/lib/modules:ro \
--volume /run:/run \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--volume /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd \
--volume /etc/pki/tls/certs:/usr/share/ca-certificates:ro \
--volume /var/lib/calico:/var/lib/calico \
--volume /var/lib/docker:/var/lib/docker \
--volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
--volume /var/log:/var/log \
--volume /var/run/lock:/var/run/lock:z \
--volume /opt/cni/bin:/opt/cni/bin:z \
k8s.gcr.io/hyperkube:v1.17.2 kubelet \
--anonymous-auth=false \
--authentication-token-webhook \
--authorization-mode=Webhook \
--cgroup-driver=systemd \
--cgroups-per-qos=true \
--enforce-node-allocatable=pods \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${cluster_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--healthz-port=0 \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node.kubernetes.io/node \
%{ for label in split(",", node_labels) }
--node-labels=${label} \
%{ endfor ~}
--pod-manifest-path=/etc/kubernetes/manifests \
--read-only-port=0 \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/podman stop kubelet
Delegate=yes
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
storage:
directories:
- path: /etc/kubernetes
files:
- path: /etc/kubernetes/kubeconfig
mode: 0644
contents:
inline: |
${kubeconfig}
- path: /etc/sysctl.d/reverse-path-filter.conf
contents:
inline: |
net.ipv4.conf.all.rp_filter=1
- path: /etc/sysctl.d/max-user-watches.conf
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /etc/systemd/system.conf.d/accounting.conf
contents:
inline: |
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
passwd:
users:
- name: core
ssh_authorized_keys:
- ${ssh_authorized_key}

View File

@ -0,0 +1,14 @@
# Outputs for global load balancing
output "instance_group" {
description = "Worker managed instance group full URL"
value = google_compute_region_instance_group_manager.workers.instance_group
}
# Outputs for regional load balancing
output "target_pool" {
description = "Worker target pool self link"
value = google_compute_target_pool.workers.self_link
}

View File

@ -0,0 +1,23 @@
# Target pool for TCP/UDP load balancing
resource "google_compute_target_pool" "workers" {
name = "${var.name}-worker-pool"
region = var.region
session_affinity = "NONE"
health_checks = [
google_compute_http_health_check.workers.name,
]
}
# HTTP Health Check (for TCP/UDP load balancing)
# Forward rules (regional) to target pools don't support different external
# and internal ports. Health check for nodes with Ingress controllers that
# may support proxying or otherwise satisfy the check.
resource "google_compute_http_health_check" "workers" {
name = "${var.name}-target-pool-health"
description = "Health check for the worker target pool"
port = 10254
request_path = "/healthz"
}

View File

@ -0,0 +1,106 @@
variable "name" {
type = string
description = "Unique name for the worker pool"
}
variable "cluster_name" {
type = string
description = "Must be set to `cluster_name of cluster`"
}
# Google Cloud
variable "region" {
type = string
description = "Must be set to `region` of cluster"
}
variable "network" {
type = string
description = "Must be set to `network_name` output by cluster"
}
# instances
variable "worker_count" {
type = number
description = "Number of worker compute instances the instance group should manage"
default = 1
}
variable "machine_type" {
type = string
description = "Machine type for compute instances (e.g. gcloud compute machine-types list)"
default = "n1-standard-1"
}
variable "os_image" {
type = string
description = "Fedora CoreOS image for compute instanges (e.g. gcloud compute images list)"
}
variable "disk_size" {
type = number
description = "Size of the disk in GB"
default = 40
}
variable "preemptible" {
type = bool
description = "If enabled, Compute Engine will terminate instances randomly within 24 hours"
default = false
}
variable "snippets" {
type = list(string)
description = "Fedora CoreOS Config snippets"
default = []
}
# configuration
variable "kubeconfig" {
type = string
description = "Must be set to `kubeconfig` output by cluster"
}
variable "ssh_authorized_key" {
type = string
description = "SSH public key for user 'core'"
}
variable "service_cidr" {
type = string
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for coredns.
EOD
default = "10.3.0.0/16"
}
variable "node_labels" {
type = list(string)
description = "List of initial node labels"
default = []
}
# unofficial, undocumented, unsupported, temporary
variable "cluster_domain_suffix" {
type = string
description = "Queries for domains with the suffix will be answered by coredns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
default = "cluster.local"
}
variable "accelerator_type" {
type = string
default = ""
description = "Google Compute Engine accelerator type (e.g. nvidia-tesla-k80, see gcloud compute accelerator-types list)"
}
variable "accelerator_count" {
type = string
default = "0"
description = "Number of compute engine accelerators"
}

View File

@ -0,0 +1,4 @@
terraform {
required_version = ">= 0.12"
}

View File

@ -0,0 +1,91 @@
# Managed instance group of workers
resource "google_compute_region_instance_group_manager" "workers" {
name = "${var.name}-worker-group"
description = "Compute instance group of ${var.name} workers"
# instance name prefix for instances in the group
base_instance_name = "${var.name}-worker"
region = var.region
version {
name = "default"
instance_template = google_compute_instance_template.worker.self_link
}
target_size = var.worker_count
target_pools = [google_compute_target_pool.workers.self_link]
named_port {
name = "http"
port = "80"
}
named_port {
name = "https"
port = "443"
}
}
# Worker instance template
resource "google_compute_instance_template" "worker" {
name_prefix = "${var.name}-worker-"
description = "Worker Instance template"
machine_type = var.machine_type
metadata = {
user-data = data.ct_config.worker-ignition.rendered
}
scheduling {
automatic_restart = var.preemptible ? false : true
preemptible = var.preemptible
}
disk {
auto_delete = true
boot = true
source_image = var.os_image
disk_size_gb = var.disk_size
}
network_interface {
network = var.network
# Ephemeral external IP
access_config {
}
}
can_ip_forward = true
tags = ["worker", "${var.cluster_name}-worker", "${var.name}-worker"]
guest_accelerator {
count = var.accelerator_count
type = var.accelerator_type
}
lifecycle {
# To update an Instance Template, Terraform should replace the existing resource
create_before_destroy = true
}
}
# Worker Ignition config
data "ct_config" "worker-ignition" {
content = data.template_file.worker-config.rendered
strict = true
snippets = var.snippets
}
# Worker Fedora CoreOS config
data "template_file" "worker-config" {
template = file("${path.module}/fcc/worker.yaml")
vars = {
kubeconfig = indent(10, var.kubeconfig)
ssh_authorized_key = var.ssh_authorized_key
cluster_dns_service_ip = cidrhost(var.service_cidr, 10)
cluster_domain_suffix = var.cluster_domain_suffix
node_labels = join(",", var.node_labels)
}
}

View File

@ -62,6 +62,7 @@ nav:
- 'Fedora CoreOS': - 'Fedora CoreOS':
- 'AWS': 'fedora-coreos/aws.md' - 'AWS': 'fedora-coreos/aws.md'
- 'Bare-Metal': 'fedora-coreos/bare-metal.md' - 'Bare-Metal': 'fedora-coreos/bare-metal.md'
- 'Google Cloud': 'fedora-coreos/google-cloud.md'
- 'Topics': - 'Topics':
- 'Maintenance': 'topics/maintenance.md' - 'Maintenance': 'topics/maintenance.md'
- 'Hardware': 'topics/hardware.md' - 'Hardware': 'topics/hardware.md'