typhoon/azure/flatcar-linux/kubernetes/butane/controller.yaml

variant: flatcar
version: 1.0.0
systemd:
  units:
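    # etcd runs as a container on the host network; TLS materials and data are
    # bind-mounted from the host and configured via /etc/etcd/etcd.env.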
    - name: etcd-member.service
      enabled: true
      contents: |
        [Unit]
        Description=etcd (System Container)
        Documentation=https://github.com/etcd-io/etcd
        Requires=docker.service
        After=docker.service
        [Service]
        Environment=ETCD_IMAGE=quay.io/coreos/etcd:v3.5.13
        ExecStartPre=/usr/bin/docker run -d \
          --name etcd \
          --network host \
          --env-file /etc/etcd/etcd.env \
          --user 232:232 \
          --volume /etc/ssl/etcd:/etc/ssl/certs:ro \
          --volume /var/lib/etcd:/var/lib/etcd:rw \
          $${ETCD_IMAGE}
        ExecStart=docker logs -f etcd
        ExecStop=docker stop etcd
        ExecStopPost=docker rm etcd
        Restart=always
        RestartSec=10s
        TimeoutStartSec=0
        LimitNOFILE=40000
        [Install]
        WantedBy=multi-user.target
    - name: docker.service
      enabled: true
    - name: locksmithd.service
      mask: true
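    # Hold kubelet.service and etcd-member.service until /etc/resolv.conf has at
    # least one non-comment entry, so name resolution works before they start.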
    - name: wait-for-dns.service
      enabled: true
      contents: |
        [Unit]
        Description=Wait for DNS entries
        Wants=systemd-resolved.service
        Before=kubelet.service
        [Service]
        Type=oneshot
        RemainAfterExit=true
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
        RequiredBy=etcd-member.service
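    # Kubelet runs as a privileged container with host mounts; it extracts the
    # cluster CA from the kubeconfig and uses TLS bootstrapping for its own
    # client credentials (--bootstrap-kubeconfig / --kubeconfig below).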
    - name: kubelet.service
      enabled: true
      contents: |
        [Unit]
        Description=Kubelet (System Container)
        Requires=docker.service
        After=docker.service
        Wants=rpc-statd.service
        [Service]
        Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.30.0
        ExecStartPre=/bin/mkdir -p /etc/cni/net.d
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/bin/mkdir -p /var/lib/calico
        ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=/usr/bin/docker run -d \
          --name kubelet \
          --privileged \
          --pid host \
          --network host \
          -v /etc/cni/net.d:/etc/cni/net.d:ro \
          -v /etc/kubernetes:/etc/kubernetes:ro \
          -v /etc/machine-id:/etc/machine-id:ro \
          -v /usr/lib/os-release:/etc/os-release:ro \
          -v /lib/modules:/lib/modules:ro \
          -v /run:/run \
          -v /sys/fs/cgroup:/sys/fs/cgroup \
          -v /var/lib/calico:/var/lib/calico:ro \
          -v /var/lib/containerd:/var/lib/containerd \
          -v /var/lib/kubelet:/var/lib/kubelet:rshared \
          -v /var/log:/var/log \
          -v /opt/cni/bin:/opt/cni/bin \
          $${KUBELET_IMAGE} \
          --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
          --config=/etc/kubernetes/kubelet.yaml \
          --container-runtime-endpoint=unix:///run/containerd/containerd.sock \
          --kubeconfig=/var/lib/kubelet/kubeconfig \
          --node-labels=node.kubernetes.io/controller="true" \
          --register-with-taints=node-role.kubernetes.io/controller=:NoSchedule
        ExecStart=docker logs -f kubelet
        ExecStop=docker stop kubelet
        ExecStopPost=docker rm kubelet
        Restart=always
        RestartSec=10
        [Install]
        WantedBy=multi-user.target
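    # One-shot bootstrap of the control plane: runs /opt/bootstrap/apply inside the
    # kubelet image, then records completion. Not enabled here; it is expected to be
    # started once externally (e.g. by the provisioning workflow).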
    - name: bootstrap.service
      contents: |
        [Unit]
        Description=Kubernetes control plane
        Wants=docker.service
        After=docker.service
        ConditionPathExists=!/opt/bootstrap/bootstrap.done
        [Service]
        Type=oneshot
        RemainAfterExit=true
        WorkingDirectory=/opt/bootstrap
        Environment=KUBELET_IMAGE=quay.io/poseidon/kubelet:v1.30.0
        ExecStart=/usr/bin/docker run \
          -v /etc/kubernetes/pki:/etc/kubernetes/pki:ro \
          -v /opt/bootstrap/assets:/assets:ro \
          -v /opt/bootstrap/apply:/apply:ro \
          --entrypoint=/apply \
          $${KUBELET_IMAGE}
        ExecStartPost=/bin/touch /opt/bootstrap/bootstrap.done
        [Install]
        WantedBy=multi-user.target
storage:
  directories:
    - path: /var/lib/etcd
      mode: 0700
      overwrite: true
  files:
    - path: /etc/kubernetes/kubeconfig
      mode: 0644
      contents:
        inline: |
          ${kubeconfig}
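    # Kubelet configuration: anonymous auth disabled, webhook authentication and
    # authorization, systemd cgroup driver, and graceful node shutdown (45s).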
    - path: /etc/kubernetes/kubelet.yaml
      mode: 0644
      contents:
        inline: |
          apiVersion: kubelet.config.k8s.io/v1beta1
          kind: KubeletConfiguration
          authentication:
            anonymous:
              enabled: false
            webhook:
              enabled: true
            x509:
              clientCAFile: /etc/kubernetes/ca.crt
          authorization:
            mode: Webhook
          cgroupDriver: systemd
          clusterDNS:
            - ${cluster_dns_service_ip}
          clusterDomain: ${cluster_domain_suffix}
          healthzPort: 0
          rotateCertificates: true
          shutdownGracePeriod: 45s
          shutdownGracePeriodCriticalPods: 30s
          staticPodPath: /etc/kubernetes/manifests
          readOnlyPort: 0
          resolvConf: /run/systemd/resolve/resolv.conf
          volumePluginDir: /var/lib/kubelet/volumeplugins
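    # Unpacks the transferred assets bundle into etcd TLS, Kubernetes PKI, static
    # pod manifests, and addon manifests for the bootstrap apply step.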
    - path: /opt/bootstrap/layout
      mode: 0544
      contents:
        inline: |
          #!/bin/bash -e
          mkdir -p -- auth tls/etcd tls/k8s static-manifests manifests/coredns manifests-networking
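          # The assets bundle is a stream of files delimited by "##### <relative path>"
          # header lines; awk below switches to a new output file at each header and
          # writes the following lines into it.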
          awk '/#####/ {filename=$2; next} {print > filename}' assets
          mkdir -p /etc/ssl/etcd/etcd
          mkdir -p /etc/kubernetes/pki
          mv tls/etcd/{peer*,server*} /etc/ssl/etcd/etcd/
          mv tls/etcd/etcd-client* /etc/kubernetes/pki/
          chown -R etcd:etcd /etc/ssl/etcd
          chmod -R 500 /etc/ssl/etcd
          chmod -R 700 /var/lib/etcd
          mv auth/* /etc/kubernetes/pki/
          mv tls/k8s/* /etc/kubernetes/pki/
          mkdir -p /etc/kubernetes/manifests
          mv static-manifests/* /etc/kubernetes/manifests/
          mkdir -p /opt/bootstrap/assets
          mv manifests /opt/bootstrap/assets/manifests
          mv manifests-networking/* /opt/bootstrap/assets/manifests/ 2>/dev/null || true
          rm -rf assets auth static-manifests tls manifests-networking manifests
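    # Run inside the kubelet container by bootstrap.service (--entrypoint=/apply):
    # wait until the static pod control plane answers, then apply the manifests,
    # retrying until they are accepted.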
    - path: /opt/bootstrap/apply
      mode: 0544
      contents:
        inline: |
          #!/bin/bash -e
          export KUBECONFIG=/etc/kubernetes/pki/admin.conf
          until kubectl version; do
            echo "Waiting for static pod control plane"
            sleep 5
          done
          until kubectl apply -f /assets/manifests -R; do
            echo "Retry applying manifests"
            sleep 5
          done
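    # Let the Kubelet's graceful node shutdown (shutdownGracePeriod above) delay
    # logind-initiated shutdowns long enough to terminate pods.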
    - path: /etc/systemd/logind.conf.d/inhibitors.conf
      contents:
        inline: |
          [Login]
          InhibitDelayMaxSec=45s
    - path: /etc/sysctl.d/max-user-watches.conf
      mode: 0644
      contents:
        inline: |
          fs.inotify.max_user_watches=16184
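    # Environment for etcd-member.service (passed via --env-file). The TLS paths are
    # container paths: the host's /etc/ssl/etcd is mounted at /etc/ssl/certs.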
    - path: /etc/etcd/etcd.env
      mode: 0644
      contents:
        inline: |
          ETCD_NAME=${etcd_name}
          ETCD_DATA_DIR=/var/lib/etcd
          ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379
          ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380
          ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
          ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
          ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
          ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}
          ETCD_STRICT_RECONFIG_CHECK=true
          ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt
          ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt
          ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key
          ETCD_CLIENT_CERT_AUTH=true
          ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt
          ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
          ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
          ETCD_PEER_CLIENT_CERT_AUTH=true
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "${ssh_authorized_key}"