565 Commits

Author SHA1 Message Date
Dalton Hubble d7061020ba Update Kubernetes from v1.16.2 to v1.16.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163
2019-11-13 13:05:15 -08:00
Dalton Hubble 2c163503f1 Update etcd from v3.4.2 to v3.4.3
* etcd v3.4.3 builds with Go v1.12.12 instead of v1.12.9
and adds a few minor metrics fixes
* https://github.com/etcd-io/etcd/compare/v3.4.2...v3.4.3
2019-11-07 11:41:01 -08:00
Dalton Hubble 0034a15711 Update Calico from v3.10.0 to v3.10.1
* https://docs.projectcalico.org/v3.10/release-notes/
2019-11-07 11:38:32 -08:00
Dalton Hubble 4775e9d0f7 Upgrade Calico v3.9.2 to v3.10.0
* Allow advertising Kubernetes service ClusterIPs to BGPPeer
routers via a BGPConfiguration
* Improve EdgeRouter docs about routes and BGP
* https://docs.projectcalico.org/v3.10/release-notes/
* https://docs.projectcalico.org/v3.10/networking/advertise-service-ips
2019-10-27 14:13:41 -07:00
Dalton Hubble d418045929 Switch kube-proxy from iptables mode to ipvs mode
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found #321
* Since then, major blockers seem to have been addressed
2019-10-27 00:37:41 -07:00
Dalton Hubble 24fc440d83 Update Kubernetes from v1.16.1 to v1.16.2
* Update Calico from v3.9.1 to v3.9.2
2019-10-15 22:42:52 -07:00
Dalton Hubble a6702573a2 Update etcd from v3.4.1 to v3.4.2
* https://github.com/etcd-io/etcd/releases/tag/v3.4.2
2019-10-15 00:06:15 -07:00
Dalton Hubble d874bdd17d Update bootstrap module control plane manifests and type constraints
* Remove unneeded control plane flags that correspond to defaults
* Adopt Terraform v0.12 type constraints in bootstrap module
2019-10-06 21:09:30 -07:00
Dalton Hubble 5b9dab6659 Introduce list of detail objects for bare-metal machines
* Define bare-metal `controllers` and `workers` as a complex type
list(object({name=string, mac=string, domain=string})) to allow
clusters with many machines to be defined more cleanly
* Remove `controller_names` list variable
* Remove `controller_macs` list variable
* Remove `controller_domains` list variable
* Remove `worker_names` list variable
* Remove `worker_macs` list variable
* Remove `worker_domains` list variable
2019-10-06 20:22:45 -07:00
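The consolidation in the commit above can be sketched in Python rather than Terraform: three parallel lists collapse into one list of per-machine records. The function name `to_machine_objects` and the sample values are hypothetical and only illustrate the shape of the data.

```python
from typing import Dict, List


def to_machine_objects(
    names: List[str], macs: List[str], domains: List[str]
) -> List[Dict[str, str]]:
    """Zip three parallel lists into one list of {name, mac, domain} records."""
    if not (len(names) == len(macs) == len(domains)):
        # Parallel lists can silently drift apart; one record per machine cannot.
        raise ValueError("names, macs, and domains must have the same length")
    return [
        {"name": n, "mac": m, "domain": d}
        for n, m, d in zip(names, macs, domains)
    ]


# Hypothetical sample values, not taken from any real cluster.
controllers = to_machine_objects(
    ["node1"], ["52:54:00:a1:9c:ae"], ["node1.example.com"]
)
```

Keeping each machine's name, MAC address, and domain in one record means a machine is added or removed by editing a single entry instead of three lists.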
Dalton Hubble 15c4b793c3 Use new Fedora CoreOS kernel/initrd/raw asset names
* Fedora CoreOS changed the kernel, initramfs, and raw
image asset download paths and names in 30.20191002.0
2019-10-06 17:31:21 -07:00
Dalton Hubble 36ed53924f Add stricter types for bare-metal modules
* Review variables available in bare-metal kubernetes modules
for Container Linux and Fedora CoreOS
* Deprecate cluster_domain_suffix variable
* Remove deprecated container_linux_oem variable
2019-10-06 17:18:50 -07:00
Dalton Hubble 1c5ed84fc2 Update Kubernetes from v1.16.0 to v1.16.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161
2019-10-02 21:31:55 -07:00
Dalton Hubble a6de245d8a Rename bootkube.tf to bootstrap.tf
* Typhoon no longer uses the bootkube project
2019-09-29 11:30:49 -07:00
Dalton Hubble 96afa6a531 Update Calico from v3.8.2 to v3.9.1
* https://docs.projectcalico.org/v3.9/release-notes/
2019-09-29 11:22:53 -07:00
Dalton Hubble 3e34fb075b Update etcd from v3.4.0 to v3.4.1
* https://github.com/etcd-io/etcd/releases/tag/v3.4.1
2019-09-28 15:09:57 -07:00
Dalton Hubble 8703f2c3c5 Fix missing comma separator on bare-metal and DO
* Introduced in bare-metal and DigitalOcean in #544
while addressing possible ordering race, but after
the v1.16 upgrade validation
2019-09-23 11:05:26 -07:00
Dalton Hubble 5b06e0e869 Organize and cleanup Kubelet ExecStartPre
* Sort Kubelet ExecStartPre mkdir commands
* Remove unused inactive-manifests and checkpoint-secrets
directories (were used by bootkube self-hosting)
2019-09-19 00:38:34 -07:00
Dalton Hubble b951aca66f Create /etc/kubernetes/manifests before asset copy
* Fix issue (present since bootkube->bootstrap switch) where
controller asset copy could fail if /etc/kubernetes/manifests
wasn't created in time on platforms using path activation for
the Kubelet (observed on DigitalOcean, also possible on
bare-metal)
2019-09-19 00:30:53 -07:00
Dalton Hubble 9da3725738 Update Kubernetes from v1.15.3 to v1.16.0
* Drop `node-role.kubernetes.io/master` and
`node-role.kubernetes.io/node` node labels
* Kubelet (v1.16) now rejects the node labels used
in the kubectl get nodes ROLES output
* https://github.com/kubernetes/kubernetes/issues/75457
2019-09-18 22:53:06 -07:00
Dalton Hubble fd12f3612b Rename CA organization from bootkube to typhoon
* Rename the organization in generated CA certificates from
bootkube to typhoon. Avoid confusion with the bootkube project
* https://github.com/poseidon/terraform-render-bootstrap/pull/149
2019-09-14 16:56:53 -07:00
Dalton Hubble 96b646cf6d Rename bootkube modules to bootstrap
* Rename render module from bootkube to bootstrap. Avoid
confusion with the kubernetes-incubator/bootkube tool since
it is no longer used
* Use the poseidon/terraform-render-bootstrap Terraform module
(formerly poseidon/terraform-render-bootkube)
* https://github.com/poseidon/terraform-render-bootkube/pull/149
2019-09-14 16:24:32 -07:00
Dalton Hubble b15c60fa2f Update CHANGES for control plane static pod switch
* Remove old references to bootkube / self-hosted
2019-09-09 22:48:48 -07:00
Dalton Hubble db947537d1 Migrate GCP, DO, Azure to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
Dalton Hubble 21632c6674 Migrate Container Linux bare-metal to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
Dalton Hubble 74780fb09f Migrate Fedora CoreOS bare-metal to static pod control plane
* Run a kube-apiserver, kube-scheduler, and kube-controller-manager
static pod on each controller node. Previously, kube-apiserver was
self-hosted as a DaemonSet across controllers and kube-scheduler
and kube-controller-manager were a Deployment (with 2 or
controller_count many replicas).
* Remove bootkube bootstrap and pivot to self-hosted
* Remove pod-checkpointer manifests (no longer needed)
2019-09-09 22:37:31 -07:00
Dalton Hubble c20683067d Update etcd from v3.3.15 to v3.4.0
* https://github.com/etcd-io/etcd/releases/tag/v3.4.0
2019-09-08 15:32:49 -07:00
Dalton Hubble e8d586f3b3 Enable QoS on Fedora CoreOS controllers
* Kubelet race should be fixed in Kubernetes v1.15.1
* https://github.com/kubernetes/kubernetes/issues/79046
* Reverts temporary mitigation https://github.com/poseidon/typhoon/pull/515
2019-09-04 21:09:45 -07:00
Dalton Hubble 4d5f962d76 Update CoreDNS from v1.5.0 to v1.6.2
* https://coredns.io/2019/06/26/coredns-1.5.1-release/
* https://coredns.io/2019/07/03/coredns-1.5.2-release/
* https://coredns.io/2019/07/28/coredns-1.6.0-release/
* https://coredns.io/2019/08/02/coredns-1.6.1-release/
* https://coredns.io/2019/08/13/coredns-1.6.2-release/
2019-08-31 15:57:42 -07:00
Dalton Hubble c42139beaa Update etcd from v3.3.14 to v3.3.15
* No functional changes, just changes to vendoring tools
(go modules -> glide). Still, update to v3.3.15 anyway
* https://github.com/etcd-io/etcd/compare/v3.3.14...v3.3.15
2019-08-19 15:05:21 -07:00
Dalton Hubble 35c2763ab0 Update Kubernetes from v1.15.2 to v1.15.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md/#v1153
2019-08-19 14:49:24 -07:00
Dalton Hubble 8f412e2f09 Update etcd from v3.3.13 to v3.3.14
* https://github.com/etcd-io/etcd/releases/tag/v3.3.14
2019-08-18 21:05:06 -07:00
Dalton Hubble 3c3708d58e Update Calico from v3.8.1 to v3.8.2
* https://docs.projectcalico.org/v3.8/release-notes/
2019-08-16 15:38:23 -07:00
Dalton Hubble 2227f2cc62 Update Kubernetes from v1.15.1 to v1.15.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1152
2019-08-05 08:48:57 -07:00
Dalton Hubble dcd6733649 Update Calico from v3.8.0 to v3.8.1
* https://docs.projectcalico.org/v3.8/release-notes/
2019-07-27 15:31:13 -07:00
Dalton Hubble 1409bc62d8 Remove download_protocol variable from Fedora CoreOS
* For Fedora CoreOS, only HTTPS downloads are available.
Any iPXE firmware must be compiled to support TLS fetching.
* For Container Linux, using public kernel/initramfs images
defaults to using HTTPS, but can be set to HTTP for iPXE
firmware that hasn't been custom compiled to support TLS
2019-07-27 15:23:34 -07:00
Dalton Hubble e0c7676a15 Update Kubernetes from v1.15.0 to v1.15.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#downloads-for-v1151
2019-07-19 01:21:08 -07:00
Dalton Hubble 339e323491 Temporarily turn off QoS cgroups on Fedora CoreOS controllers
* Kubelets can hit the ContainerManager Delegation issue and fail
to start (noted in 72c94f1c6). It's unclear why this occurs only
to some Kubelets (possibly an ordering concern)
* QoS cgroups remain a goal
* When a controller node is affected, bootstrapping fails, which
makes other development harder. Temporarily disable QoS on
controllers only. This should safeguard bring-up and hopefully
still allow the issue to occur on some workers for debugging
2019-07-19 00:17:03 -07:00
Dalton Hubble 5fdeb9bc78 Adjust Fedora CoreOS image locations
* Use the xz compressed images published by Fedora testing,
instead of gzipped tarballs. This is possible because the
initramfs now supports xz and coreos-installer 0.8 was added
* Separate bios and uefi raw images are no longer needed
2019-07-18 01:15:29 -07:00
Dalton Hubble 155bffa773 Add docs for Fedora CoreOS AWS and bare-metal
2019-07-18 00:55:22 -07:00
Dalton Hubble 72c94f1c6a Add Kubelet System Container and bootkube bootstrap
* First semi-working cluster using 30.307-metal-bios
* Enable CPU, Memory, and BlockIO accounting
* Mount /var/lib/kubelet with `rshare` so mounted tmpfs Secrets
(e.g. serviceaccount's) are visible within appropriate containers
* SELinux relabel /etc/kubernetes so install-cni init containers
can write the CNI config to the host /etc/kubernetes/net.d
* SELinux relabel /var/lib/kubelet so ConfigMaps can be read
by containers
* SELinux relabel /opt/cni/bin so install-cni containers can
write CNI binaries to the host
* Set net.ipv4.conf.all.rp_filter to 1 (not 2, loose mode) to
satisfy Calico requirement
* Enable the QoS cgroup hierarchy for pod workloads (kubepods,
burstable, besteffort). Mount /sys/fs/cgroup and
/sys/fs/cgroup/systemd into the Kubelet. It's still rather racy
whether Kubelet will fail on ContainerManager Delegation
2019-07-18 00:55:22 -07:00
Dalton Hubble aab14c5573 Run etcd-member.service across controllers
* Running the etcd container with NOTIFY_SOCKET mounted
(to use systemd Type=notify) causes podman to hang so
for now just use exec
* https://github.com/opencontainers/runc/pull/1807
2019-07-18 00:55:22 -07:00
Dalton Hubble eb92f67125 Start prototype of Fedora CoreOS on bare-metal
* Use terraform-provider-ct v0.4.0 with Fedora CoreOS Config
support (not yet released)
2019-07-18 00:55:22 -07:00
Dalton Hubble dfa6bcfecf Relax terraform-provider-ct version constraint
* Allow updating terraform-provider-ct to any release
beyond v0.3.2, but below v1.0. This relaxes the prior
constraint that allowed only v0.3.y provider versions
2019-07-16 22:07:37 -07:00
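A minimal Python sketch of the two version ranges described in the commit above. It is not how Terraform evaluates constraints; the helper names are hypothetical, and treating v0.3.2 itself as allowed by both ranges is an assumption based on the wording of the message.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse 'v0.3.2' or '0.3.2' into a tuple of three integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def allowed_before(version: str) -> bool:
    """Prior constraint: only v0.3.y releases at or above v0.3.2."""
    return (0, 3, 2) <= parse(version) < (0, 4, 0)


def allowed_after(version: str) -> bool:
    """Relaxed constraint: v0.3.2 (assumed inclusive) up to, but excluding, v1.0."""
    return (0, 3, 2) <= parse(version) < (1, 0, 0)
```

Under these assumptions a v0.4.0 provider is rejected by the old range and accepted by the new one, which is what the Fedora CoreOS prototype later in the log relies on.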
Dalton Hubble 9e91d7f011 Upgrade Calico from v3.7.4 to v3.8.0
* Enable CNI bandwidth plugin for traffic shaping
* https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping
2019-07-11 21:01:41 -07:00
Dalton Hubble 69d064bfdf Run kube-apiserver with lower privilege user (nobody)
* Run kube-apiserver as a non-root user (nobody). User
no longer needs to bind low number ports.
* On most platforms, the kube-apiserver load balancer listens
on 6443 and fronts controllers with kube-apiserver pods using
port 6443. Google Cloud TCP proxy load balancers cannot listen
on 6443. However, GCP's load balancer can be made to listen on
443, while kube-apiserver uses 6443 across all platforms.
2019-07-08 20:52:00 -07:00
Dalton Hubble 8d373b5850 Update Calico from v3.7.3 to v3.7.4
* https://docs.projectcalico.org/v3.7/release-notes/
2019-07-02 20:18:02 -07:00
Dalton Hubble fff7cc035d Remove Fedora Atomic modules
* Typhoon for Fedora Atomic was deprecated in March 2019
* https://typhoon.psdn.io/announce/#march-27-2019
2019-06-23 13:40:51 -07:00
Dalton Hubble 408e60075a Update Kubernetes from v1.14.3 to v1.15.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1150
* Remove docs referring to possible v1.14.4 release
2019-06-23 13:12:18 -07:00
Dalton Hubble 79d910821d Configure Kubelet cgroup-driver for Flatcar Linux Edge
* For Container Linux or Flatcar Linux alpha/beta/stable,
continue using the `cgroupfs` driver
* For Fedora Atomic, continue using the `systemd` driver
* For Flatcar Linux Edge, use the `systemd` driver
2019-06-22 23:38:42 -07:00
Dalton Hubble 5c4486f57b Allow using Flatcar Linux Edge on bare-metal and AWS
* On AWS, use Flatcar Linux Edge by setting `os_image` to
"flatcar-edge"
* On bare-metal, use Flatcar Linux Edge by setting `os_channel` to
"flatcar-edge"
2019-06-22 23:38:42 -07:00
Dalton Hubble 21fb632e90 Update Calico from v3.7.2 to v3.7.3
* https://docs.projectcalico.org/v3.7/release-notes/
2019-06-13 23:54:20 -07:00
Dalton Hubble db36959178 Migrate bare-metal module Terraform v0.11 to v0.12
* Replace v0.11 bracket type hints with Terraform v0.12 list expressions
* Use expression syntax instead of interpolated strings, where suggested
* Update bare-metal tutorial
* Define `clc_snippets` type constraint map(list(string))
* Define Terraform and plugin version requirements in versions.tf
  * Require matchbox ~> 0.3.0 to support Terraform v0.12
  * Require ct ~> 0.3.2 to support Terraform v0.12
2019-06-06 09:51:21 -07:00
Dalton Hubble 0ccb2217b5 Update Kubernetes from v1.14.2 to v1.14.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1143
2019-05-31 01:08:32 -07:00
Dalton Hubble 2a71cba0e3 Update CoreDNS from v1.3.1 to v1.5.0
* Add `ready` plugin to improve readinessProbe
* https://coredns.io/2019/04/06/coredns-1.5.0-release/
2019-05-27 00:11:52 -07:00
Dalton Hubble 6e4cf65c4c Fix terraform-render-bootkube to remove trailing slash
* Fix to remove a trailing slash that was erroneously introduced
in the scripting that updated from v1.14.1 to v1.14.2
* Workaround before this fix was to re-run `terraform init`
2019-05-22 18:29:11 +02:00
Dalton Hubble da97bd4f12 Update Kubernetes from v1.14.1 to v1.14.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1142
2019-05-17 13:09:15 +02:00
Dalton Hubble f62286b677 Update Calico from v3.7.0 to v3.7.2
* https://docs.projectcalico.org/v3.7/release-notes/
2019-05-17 12:29:46 +02:00
Dalton Hubble af18296bc5 Change flannel port from 8472 to 4789
* Change flannel port from the kernel default 8472 to the
IANA assigned VXLAN port 4789
* Update firewall rules or security groups for VXLAN
* Why now? Calico now offers its own VXLAN backend so
standardizing on the IANA port will simplify config
* https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan
2019-05-06 21:58:10 -07:00
Dalton Hubble 09e0230111 Upgrade Calico from v3.6.1 to v3.7.0
* https://docs.projectcalico.org/v3.7/release-notes/
* https://github.com/poseidon/terraform-render-bootkube/pull/131
2019-05-06 00:44:15 -07:00
Dalton Hubble feb6192aac Update etcd from v3.3.12 to v3.3.13 on Container Linux
* Skip updating etcd for Fedora Atomic clusters, now that
Fedora Atomic has been deprecated
2019-05-04 12:55:42 -07:00
Dalton Hubble 452253081b Update Kubernetes from v1.14.0 to v1.14.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#changelog-since-v1140
2019-04-09 21:47:23 -07:00
Dalton Hubble be29f52039 Add enable_aggregation option (defaults to false)
* Add an `enable_aggregation` variable to enable the kube-apiserver
aggregation layer for adding extension apiservers to clusters
* Aggregation is **disabled** by default. Typhoon recommends you not
enable aggregation. Consider whether less invasive ways to achieve your
goals are possible and whether those goals are well-founded
* Enabling aggregation and extension apiservers increases the attack
surface of a cluster and makes extensions a part of the control plane.
Admins must scrutinize and trust any extension apiserver used.
* Passing a v1.14 CNCF conformance test requires aggregation be enabled.
Having an option for aggregation keeps compliance, but retains the
stricter security posture on default clusters
2019-04-07 12:00:38 -07:00
Dalton Hubble 5271e410eb Update Kubernetes from v1.13.5 to v1.14.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#v1140
2019-04-07 00:15:59 -07:00
Dalton Hubble b3ec5f73e3 Update Calico from v3.6.0 to v3.6.1
* https://docs.projectcalico.org/v3.6/release-notes/
2019-03-31 17:43:43 -07:00
Dalton Hubble 46196af500 Remove Haswell minimum CPU platform requirement
* Google Cloud API implements `min_cpu_platform` to mean
"use exactly this CPU"
* Fix error creating clusters in newer regions lacking Haswell
platform (e.g. europe-west2) (#438)
* Reverts #405, added in v1.13.4
* Original goal of ignoring old Ivy/Sandy bridge CPUs in older regions
will be achieved shortly anyway. Google Cloud is deprecating those CPUs
in April 2019
* https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform#how_selecting_a_minimum_cpu_platform_works
2019-03-27 19:51:32 -07:00
Dalton Hubble 4fea526ebf Update Kubernetes from v1.13.4 to v1.13.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1135
2019-03-25 21:43:47 -07:00
Dalton Hubble 1feefbe9c6 Update Calico from v3.5.2 to v3.6.0
* Add calico-ipam CRDs and RBAC permissions
* Switch IPAM from host-local to calico-ipam
  * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into
`ipamblocks` (defaults to /26, but set to /24 in Typhoon)
  * `host-local` subnets the pod CIDR based on the node PodCIDR
field (set via kube-controller-manager as /24's)
* Create a custom default IPv4 IPPool to ensure the block size
is kept at /24 to allow 110 pods per node (Kubernetes default)
* Retaining host-local was slightly preferred, but Calico v3.6
is migrating all usage to calico-ipam. The codepath that skipped
calico-ipam for KDD was removed
* https://docs.projectcalico.org/v3.6/release-notes/
2019-03-19 22:49:56 -07:00
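The block-size arithmetic behind the commit above can be checked with Python's standard `ipaddress` module. This sketch only counts addresses in a single block; it does not model Calico's ability to assign more than one block to a node. The base address 10.2.0.0 and the function names are placeholders.

```python
import ipaddress

KUBELET_DEFAULT_MAX_PODS = 110  # Kubernetes default pods-per-node limit


def addresses_in_block(prefix_length: int) -> int:
    """Number of IPv4 addresses in a block with the given prefix length."""
    # 10.2.0.0 is a placeholder base address; only the prefix length matters.
    network = ipaddress.ip_network(f"10.2.0.0/{prefix_length}", strict=False)
    return network.num_addresses


def fits_default_max_pods(prefix_length: int) -> bool:
    """True if one block can give every pod on a full node its own address."""
    return addresses_in_block(prefix_length) >= KUBELET_DEFAULT_MAX_PODS
```

A /26 block holds 64 addresses, fewer than 110, while a /24 block holds 256, so a single /24 block covers a node running the default maximum number of pods.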
Dalton Hubble 2019177b6b Fix implicit map assignments to be explicit
* Terraform v0.12 will require map assignments be explicit,
part of v0.12 readiness
2019-03-12 01:19:54 -07:00
Dalton Hubble 9493ed3b1d Change default iPXE kernel/initrd download from HTTP to HTTPS
* Require an iPXE-enabled network boot environment with support for
TLS downloads. PXE clients must chainload to iPXE firmware compiled
with `DOWNLOAD_PROTO_HTTPS` enabled ([crypto](https://ipxe.org/crypto))
* iPXE's pre-compiled firmware binaries do _not_ enable HTTPS. Admins
should build iPXE from source with support enabled
* Affects the Container Linux and Flatcar Linux install profiles that
pull from public downloads. No effect when cached_install=true
or using Fedora Atomic, as those download from Matchbox
* Add `download_protocol` variable. Recognizing boot firmware TLS
support is difficult in some environments, set the protocol to "http"
for the old behavior (discouraged)
2019-03-09 23:23:40 -08:00
Dalton Hubble deec512c14 Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin
* Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes
service IPs and pod IPs
* Previously, CoreDNS was configured to resolve in-addr.arpa PTR
records for service IPs (but not pod IPs)
2019-03-04 23:03:00 -08:00
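For readers unfamiliar with the two reverse zones named in the commit above, this short Python sketch shows the names that PTR queries use. It relies only on the standard `ipaddress` module; the function name `ptr_name` and the sample addresses are illustrative and not taken from any cluster.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the reverse-DNS name a PTR query for this address would use."""
    return ipaddress.ip_address(ip).reverse_pointer


# IPv4 addresses reverse their octets under in-addr.arpa.
ipv4_ptr = ptr_name("10.3.0.10")

# IPv6 addresses reverse their hex nibbles under ip6.arpa.
ipv6_ptr = ptr_name("2001:db8::1")
```

Here `ipv4_ptr` is `10.0.3.10.in-addr.arpa`, and `ipv6_ptr` is a 32-nibble name ending in `.ip6.arpa`.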
Dalton Hubble f598307998 Update Kubernetes from v1.13.3 to v1.13.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134
2019-02-28 22:47:43 -08:00
Dalton Hubble 73ae5d5649 Update Calico from v3.5.1 to v3.5.2
* https://docs.projectcalico.org/v3.5/releases/
2019-02-25 21:23:13 -08:00
Dalton Hubble 42d7222f3d Add a readiness probe to CoreDNS
* https://github.com/poseidon/terraform-render-bootkube/pull/115
2019-02-23 13:25:23 -08:00
Dalton Hubble 4294bd0292 Assign Pod Priority classes to critical cluster and node components
* Assign pod priorityClassNames to critical cluster and node
components (higher is higher priority) to inform node out-of-resource
eviction order and scheduler preemption and scheduling order
* Priority Admission Controller has been enabled since Typhoon
v1.11.1
2019-02-19 22:21:39 -08:00
Dalton Hubble 584088397c Update etcd from v3.3.11 to v3.3.12
* https://github.com/etcd-io/etcd/releases/tag/v3.3.12
2019-02-09 11:54:54 -08:00
Dalton Hubble 0200058e0e Update Calico from v3.5.0 to v3.5.1
* Fix in confd https://github.com/projectcalico/confd/pull/205
2019-02-09 11:49:31 -08:00
Dalton Hubble ccd96c37da Update Kubernetes from v1.13.2 to v1.13.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133
2019-02-01 23:26:13 -08:00
Dalton Hubble 244a1a601a Switch CoreDNS to use the forward plugin instead of proxy
* Use the forward plugin to forward to upstream resolvers, instead
of the proxy plugin. The forward plugin is reported to be a faster
alternative since it can re-use open sockets
* https://coredns.io/explugins/forward/
* https://coredns.io/plugins/proxy/
* https://github.com/kubernetes/kubernetes/issues/73254
2019-01-30 22:25:23 -08:00
Dalton Hubble 1ab06f69d7 Update flannel from v0.10.0 to v0.11.0
* https://github.com/coreos/flannel/releases/tag/v0.11.0
2019-01-29 21:51:25 -08:00
Dalton Hubble e9659a8539 Update Calico from v3.4.0 to v3.5.0
* https://docs.projectcalico.org/v3.5/releases/
2019-01-27 16:34:30 -08:00
Dalton Hubble f4d3508578 Update CoreDNS from v1.3.0 to v1.3.1
* https://coredns.io/2019/01/13/coredns-1.3.1-release/
2019-01-15 22:50:25 -08:00
Dalton Hubble 7eafa59d8f Fix instance shutdown automatic worker deletion on clouds
* Fix a regression caused by lowering the Kubelet TLS client
certificate to system:nodes group (#100) since dropping
cluster-admin dropped the Kubelet's ability to delete nodes.
* On clouds where workers can scale down (manual terraform apply,
AWS spot termination, Azure low priority deletion), worker shutdown
runs the delete-node.service to remove a node to prevent NotReady
nodes from accumulating
* Allow Kubelets to delete cluster nodes via system:nodes group. Kubelets
acting with system:node and kubelet-delete ClusterRoles is still an
improvement over acting as cluster-admin
2019-01-14 23:27:48 -08:00
Dalton Hubble b74cc8afd2 Update etcd from v3.3.10 to v3.3.11
* https://github.com/etcd-io/etcd/releases/tag/v3.3.11
2019-01-12 14:17:25 -08:00
Dalton Hubble 4d32b79c6f Update Kubernetes from v1.13.1 to v1.13.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1132
2019-01-12 00:00:53 -08:00
Dalton Hubble df4c0ba05d Use HTTPS liveness probes for kube-scheduler and kube-controller-manager
* Disable kube-scheduler and kube-controller-manager HTTP ports
2019-01-09 20:56:50 -08:00
Dalton Hubble bfe0c74793 Enable the certificates.k8s.io API to issue cluster certificates
* System components that require certificates signed by the cluster
CA can submit a CSR to the apiserver, have an administrator inspect
and approve it, and be issued a certificate
* Configure kube-controller-manager to sign Approved CSRs using the
cluster CA private key
* Admins are responsible for approving or denying CSRs, otherwise,
no certificate is issued. Read the Kubernetes docs carefully and
verify the entity making the request and the authorization level
* https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster
2019-01-06 17:33:37 -08:00
Dalton Hubble 60c70797ec Use a single format of the admin kubeconfig
* Use a single admin kubeconfig for initial bootkube bootstrap
and for use by a human admin. Previously, an admin kubeconfig
without a named context was used for bootstrap and direct usage
with KUBECONFIG=path, while one with a named context was used
for `kubectl config use-context` style usage. Confusing.
* Provide the admin kubeconfig via `assets/auth/kubeconfig`,
`assets/auth/CLUSTER-config`, or output `kubeconfig-admin`
2019-01-05 14:57:18 -08:00
Dalton Hubble 6795a753ea Update CoreDNS from v1.2.6 to v1.3.0
* https://coredns.io/2018/12/15/coredns-1.3.0-release/
2019-01-05 13:35:03 -08:00
Dalton Hubble b57273b6f1 Rename internal kube_dns_service_ip to cluster_dns_service_ip
* terraform-render-bootkube module deprecated kube_dns_service_ip
output in favor of cluster_dns_service_ip
* Rename k8s_dns_service_ip to cluster_dns_service_ip for
consistency too
2019-01-05 13:32:03 -08:00
Dalton Hubble 812a1adb49 Use a lower-privilege Kubelet kubeconfig in system:nodes
* Kubelets can use a lower-privilege TLS client certificate with
Org system:nodes and a binding to the system:node ClusterRole
* Admin kubeconfigs continue to belong to Org system:masters to
provide cluster-admin (available in assets/auth/kubeconfig or as
a Terraform output kubeconfig-admin)
* Remove bare-metal output variable kubeconfig
2019-01-05 13:08:56 -08:00
Dalton Hubble 66e1365cc4 Add ServiceAccounts for kube-apiserver and kube-scheduler
* Add ServiceAccounts and ClusterRoleBindings for kube-apiserver
and kube-scheduler
* Remove the ClusterRoleBinding for the kube-system default ServiceAccount
* Rename the CA certificate CommonName for consistency with upstream
2019-01-01 20:16:14 -08:00
Dalton Hubble bcb200186d Add admin kubeconfig as a Terraform output
* May be used to write a local file
2018-12-15 22:52:28 -08:00
Dalton Hubble 479d498024 Update Calico from v3.3.2 to v3.4.0
* https://docs.projectcalico.org/v3.4/releases/
2018-12-15 18:05:16 -08:00
Dalton Hubble 018c5edc25 Update Kubernetes from v1.13.0 to v1.13.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1131
2018-12-15 11:44:57 -08:00
Dalton Hubble ff6ab571f3 Update Calico from v3.3.1 to v3.3.2
* https://docs.projectcalico.org/v3.3/releases/
2018-12-06 22:56:55 -08:00
Dalton Hubble d31f444fcd Update Kubernetes from v1.12.3 to v1.13.0
2018-12-03 20:44:32 -08:00
Dalton Hubble 76d993cdae Add experimental kube-router CNI provider
* Add kube-router for pod networking and NetworkPolicy
as an experiment
* Experiments are not documented or supported in any way,
and may be removed without notice. They have known issues
and aren't enabled without special options.
2018-12-03 19:52:28 -08:00
yokhahn bcce02a9ce Add Kubelet /etc/iscsi and iscsiadm mounts on bare-metal
* Allow using iSCSI with Container Linux bare-metal clusters
* Warning, iSCSI isn't part of Kubernetes conformance and isn't
regularly evaluated
2018-11-28 00:28:46 -08:00
Dalton Hubble 64b4c10418 Improve features and modules list docs
* Remove bullet about isolating workloads on workers, it's
now common practice and new users will assume it
* List advanced features available in each module
* Fix erroneous Kubernetes version listing for Google Cloud
Fedora Atomic
2018-11-26 22:58:00 -08:00
Dalton Hubble 5b27d8d889 Update Kubernetes from v1.12.2 to v1.12.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md/#v1123
2018-11-26 21:06:09 -08:00
Dalton Hubble 840b73f9ba Update pod-checkpointer image to query Kubelet secure API
* Updates pod-checkpointer to prefer the Kubelet secure
API (before falling back to the Kubelet read-only API that
is disabled on Typhoon clusters since
https://github.com/poseidon/typhoon/pull/324)
* Previously, pod-checkpointer checkpointed an initial set
of pods during bootstrapping so recovery from power cycling
clusters was unaffected, but logs were noisy
* https://github.com/kubernetes-incubator/bootkube/pull/1027
* https://github.com/kubernetes-incubator/bootkube/pull/1025
2018-11-26 20:24:32 -08:00
Dalton Hubble 915af3c6cc Fix Calico Felix reporting usage data, require opt-in
* Calico Felix has been reporting anonymous usage data about the
version and cluster size, which violates Typhoon's privacy policy
where analytics should be opt-in only
* Add a variable enable_reporting (default: false) to allow opting
in to reporting usage data to Calico (or future components)
2018-11-20 01:03:00 -08:00
Dalton Hubble ea3fc6d2a7 Update CoreDNS from v1.2.4 to v1.2.6
* https://coredns.io/2018/11/05/coredns-1.2.6-release/
2018-11-18 16:45:53 -08:00
Dalton Hubble 56e9a82984 Add flannel resource request and mount only /run/flannel
2018-11-11 20:35:21 -08:00
Dalton Hubble e95b856a22 Enable CoreDNS loop and loadbalance plugins
* loop sends an initial query to detect infinite forwarding
loops in configured upstream DNS servers and fast exit with
an error (it's a fatal misconfiguration on the network that
will otherwise cause resolvers to consume memory/CPU until
crashing, masking the problem)
* https://github.com/coredns/coredns/tree/master/plugin/loop
* loadbalance randomizes the ordering of A, AAAA, and MX records
in responses to provide round-robin load balancing (as usual,
clients may still cache responses though)
* https://github.com/coredns/coredns/tree/master/plugin/loadbalance
2018-11-10 17:36:56 -08:00
Dalton Hubble 2b3f61d1bb Update Calico from v3.3.0 to v3.3.1
* Structure Calico and flannel manifests
* Rename kube-flannel mentions to just flannel
2018-11-10 13:37:12 -08:00
Dalton Hubble 8fd2978c31 Update bootkube image version from v0.13.0 to v0.14.0
* https://github.com/kubernetes-incubator/bootkube/releases/tag/v0.14.0
2018-11-06 23:35:11 -08:00
Dalton Hubble 721c847943 Set kube-apiserver kubelet preferred address types
* Prefer InternalIP and ExternalIP over the node's hostname,
to match upstream behavior and kubeadm
* Previously, hostname-override was used to set node names
to internal IPs to work around some cloud providers not
resolving hostnames for instances (e.g. DO droplets)
2018-11-03 22:31:55 -07:00
Dalton Hubble f1da0731d8 Update Kubernetes from v1.12.1 to v1.12.2
* Update CoreDNS from v1.2.2 to v1.2.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md#v1122
* https://coredns.io/2018/10/17/coredns-1.2.4-release/
* https://coredns.io/2018/10/16/coredns-1.2.3-release/
2018-10-27 15:47:57 -07:00
Dalton Hubble d641a058fe Update Calico from v3.2.3 to v3.3.0
* https://docs.projectcalico.org/v3.3/releases/
2018-10-23 20:30:30 -07:00
Dalton Hubble 99a6d5478b Disable Kubelet read-only port 10255
* We can finally disable the Kubelet read-only port 10255!
* Journey: https://github.com/poseidon/typhoon/issues/322#issuecomment-431073073
2018-10-18 21:14:14 -07:00
Dalton Hubble d55bfd5589 Fix CoreDNS AntiAffinity spec to prefer spreading replicas
* Pods were still being scheduled at random due to a typo
2018-10-17 22:19:57 -07:00
Michael Schubert d10620fb58 Add support for Flatcar Linux bare-metal cached_install
* Support bare-metal cached_install=true mode with Flatcar Linux
where assets are fetched from the Matchbox assets cache instead
of from the upstream Flatcar download server
* Skipped in original Flatcar support to keep it simple
https://github.com/poseidon/typhoon/pull/209
2018-10-16 21:15:24 -07:00
Dalton Hubble 9b6113a058 Update Kubernetes from v1.11.3 to v1.12.1
* Mount an empty dir for the controller-manager to work around
https://github.com/kubernetes/kubernetes/issues/68973
* Update coreos/pod-checkpointer to strip affinity from
checkpointed pod manifests. Kubernetes v1.12.0-rc.1 introduced
a default affinity that appears on checkpointed manifests; but
it prevented scheduling, and checkpointed pods should not have an
affinity since they're run directly by the Kubelet on the local node
* https://github.com/kubernetes-incubator/bootkube/issues/1001
* https://github.com/kubernetes/kubernetes/pull/68173
2018-10-16 20:28:13 -07:00
Dalton Hubble 5eb4078d68 Add docker/default seccomp to control plane and addons
* Annotate pods, deployments, and daemonsets to start containers
with the Docker runtime's default seccomp profile
* Overrides Kubernetes default behavior which started containers
with seccomp=unconfined
* https://docs.docker.com/engine/security/seccomp/#pass-a-profile-for-a-container
2018-10-16 20:07:29 -07:00
Dalton Hubble 55bb4dfba6 Raise CoreDNS replica count to 2 or more
* Run at least two replicas of CoreDNS to better support
rolling updates (previously, kube-dns had a pod nanny)
* On multi-master clusters, set the CoreDNS replica count
to match the number of masters (e.g. a 3-master cluster
previously used replicas:1, now replicas:3)
* Add AntiAffinity preferred rule to favor distributing
CoreDNS pods across controller nodes
2018-10-13 20:31:29 -07:00
Dalton Hubble 43fe78a2cc Raise scheduler/controller-manager replicas in multi-master
* Continue to ensure scheduler and controller-manager run
at least two replicas to support performing kubectl edits
on single-master clusters (no change)
* For multi-master clusters, set scheduler / controller-manager
replica count to the number of masters (e.g. a 3-master cluster
previously used replicas:2, now replicas:3)
2018-10-13 16:16:29 -07:00
Dalton Hubble 5a283b6443 Update etcd from v3.3.9 to v3.3.10
* https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v3310-2018-10-10
2018-10-13 13:14:37 -07:00
Dalton Hubble 7653e511be Update CoreDNS and Calico versions
* Update CoreDNS from 1.1.3 to 1.2.2
* Update Calico from v3.2.1 to v3.2.3
2018-10-02 16:07:48 +02:00
Dalton Hubble ad871dbfa9 Update Kubernetes from v1.11.2 to v1.11.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1113
2018-09-13 18:50:41 -07:00
Dalton Hubble 7eb09237f4 Update Calico from v3.1.3 to v3.2.1
* Add new bird and felix readiness checks
* Read MTU from ConfigMap veth_mtu
* Add RBAC read for serviceaccounts
* Remove invalid description from CRDs
2018-08-25 17:53:11 -07:00
Dalton Hubble bdf1e6986e Fix terraform fmt 2018-08-21 21:59:55 -07:00
Dalton Hubble bec5250e73 Remove unofficial bare-metal *_networkds variables
* Remove controller_networkds and worker_networkds variables. These
variables were always listed as experimental, unsupported, and excluded
from documentation in anticipation of Container Linux Config snippets
* Use Container Linux Config snippets on bare-metal instead. They
provide safer, more powerful, and more elegant host customization
2018-08-13 23:33:29 -07:00
Dalton Hubble f7ebdf475d Update Kubernetes from v1.11.1 to v1.11.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1112
2018-08-07 21:57:25 -07:00
Dalton Hubble edc250d62a Fix Kubelet version for Fedora Atomic modules
* Release v1.11.1 erroneously left Fedora Atomic clusters using
the v1.11.0 Kubelet. The rest of the control plane ran v1.11.1
as expected
* Update Kubelet from v1.11.0 to v1.11.1 so Fedora Atomic matches
Container Linux
* Container Linux modules were not affected
2018-07-29 12:13:29 -07:00
Dalton Hubble db64ce3312 Update etcd from v3.3.8 to v3.3.9
* https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md#v339-2018-07-24
2018-07-29 11:27:37 -07:00
Dalton Hubble 7c327b8bf4 Update from bootkube v0.12.0 to v0.13.0 2018-07-29 11:20:17 -07:00
Dalton Hubble 13beb13aab Add descriptions to bare-metal fedora-atomic variables 2018-07-29 11:07:48 -07:00
Dalton Hubble 90c4a7483d Combine bare-metal CLC snippets maps into one map 2018-07-26 23:31:08 -07:00
Dalton Hubble 4e7dfc115d Support Container Linux Config snippets on bare-metal 2018-07-25 23:14:54 -07:00
Dalton Hubble d8d524d10b Update Kubernetes from v1.11.0 to v1.11.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#v1111
2018-07-20 00:41:27 -07:00
Dalton Hubble 915f89d3c8 Update Fedora Atomic from 27 to 28 on bare-metal 2018-07-04 11:41:54 -07:00
Dalton Hubble 6f958d7577 Replace kube-dns with CoreDNS
* Add system:coredns ClusterRole and binding
* Annotate CoreDNS for Prometheus metrics scraping
* Remove kube-dns deployment, service, & service account
* https://github.com/poseidon/terraform-render-bootkube/pull/71
* https://kubernetes.io/blog/2018/06/27/kubernetes-1.11-release-announcement/
2018-07-01 22:55:01 -07:00
Dalton Hubble def445a344 Update Fedora Atomic kubelet from v1.10.5 to v1.11.0 2018-06-30 16:45:42 -07:00
Dalton Hubble 8464b258d8 Update Kubernetes from v1.10.5 to v1.11.0
* Force apiserver to stop listening on 127.0.0.1:8080
* Remove deprecated Kubelet `--allow-privileged`. Defaults to
true. Use `PodSecurityPolicy` if limiting is desired
* https://github.com/kubernetes/kubernetes/releases/tag/v1.11.0
* https://github.com/poseidon/terraform-render-bootkube/pull/68
2018-06-27 22:47:35 -07:00
Dalton Hubble 0227014fa0 Fix terraform formatting 2018-06-22 00:28:36 -07:00
Dalton Hubble f4d3059b00 Update Kubernetes from v1.10.4 to v1.10.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1105
2018-06-21 22:51:39 -07:00
Dalton Hubble 6c5a1964aa Change kube-apiserver port from 443 to 6443
* Adjust firewall rules, security groups, cloud load balancers,
and generated kubeconfigs
* Facilitates some future simplifications and cost reductions
* Bare-Metal users who exposed kube-apiserver on a WAN via their
router or load balancer will need to adjust its configuration.
This is uncommon; most apiservers are on a LAN and/or behind a VPN,
so no routing infrastructure is configured with the port number
2018-06-19 23:48:51 -07:00
Dalton Hubble 6e64634748 Update etcd from v3.3.7 to v3.3.8
* https://github.com/coreos/etcd/releases/tag/v3.3.8
2018-06-19 21:56:21 -07:00
Dalton Hubble ed0b781296 Fix possible deadlock for provisioning bare-metal clusters
* Closes #235
2018-06-14 23:15:28 -07:00
Dalton Hubble 51906bf398 Update etcd from v3.3.6 to v3.3.7 2018-06-14 22:46:16 -07:00
Dalton Hubble 79260c48f6 Update Kubernetes from v1.10.3 to v1.10.4 2018-06-06 23:23:11 -07:00
Dalton Hubble 589c3569b7 Update etcd from v3.3.5 to v3.3.6
* https://github.com/coreos/etcd/releases/tag/v3.3.6
2018-06-06 23:19:30 -07:00
Dalton Hubble 6e968cd152 Update Calico from v3.1.2 to v3.1.3
* https://github.com/projectcalico/calico/releases/tag/v3.1.3
* https://github.com/projectcalico/cni-plugin/releases/tag/v3.1.3
2018-05-30 21:32:12 -07:00
Dalton Hubble 4ea1fde9c5 Update Kubernetes from v1.10.2 to v1.10.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#v1103
* Update Calico from v3.1.1 to v3.1.2
2018-05-21 21:38:43 -07:00
William Zhang 2ae126bf68 Fix README link to tutorial 2018-05-19 13:10:22 -07:00
Dalton Hubble 0c3557e68e Allow Flatcar Linux os_channel on bare-metal
* Choose the Container Linux derivative Flatcar Linux on
bare-metal by setting os_channel to flatcar-stable, flatcar-beta
or flatcar-alpha
* As with Container Linux from Red Hat, the version (os_version)
must correspond to the channel being used
* Thank you to @dongsupark from Kinvolk
2018-05-17 20:09:36 -07:00
Dalton Hubble adc6c6866d Rename container_linux_ bare-metal variables
* Allow for Container Linux derivatives
* Replace `container_linux_channel` variable with `os_channel`
* Replace `container_linux_version` variable with `os_version`
* Please change values `stable`, `beta`, or `alpha` to `coreos-stable`,
`coreos-beta`, `coreos-alpha` (action required!)
2018-05-16 22:40:39 -07:00
Dalton Hubble 9ac7b0655f Add bare-metal network_ip_autodetection_method variable for multi-NIC
* Allow setting the Calico host IPv4 address autodetection method
* Use Calico's default "first-found" method to support single NIC
and bonded NIC nodes
* Allow methods like `can-reach=IP` or `interface=REGEX` for multi
NIC nodes
* https://docs.projectcalico.org/v3.1/reference/node/configuration#ip-autodetection-methods
2018-05-15 23:27:34 -07:00
Dalton Hubble 37981f9fb1 Allow bearer token authn/authz to the Kubelet
* Require Webhook authorization to the Kubelet
* Switch apiserver X509 client cert org to system:masters
to grant the apiserver admin and satisfy the authorization
requirement. kubectl commands like logs or exec that have
the apiserver make requests of a kubelet continue to work
as before
* https://kubernetes.io/docs/admin/kubelet-authentication-authorization/
* https://github.com/poseidon/typhoon/issues/215
2018-05-13 23:20:42 -07:00
Dalton Hubble f2ee75ac98 Require Terraform v0.11.x, drop v0.10.x support
* Raise minimum Terraform version to v0.11.0
* Terraform v0.11.x has been supported since Typhoon v1.9.2
and Terraform v0.10.x was last released in Nov 2017. I'd like
to stop worrying about v0.10.x and remove migration docs as
a later followup
* Migration docs docs/topics/maintenance.md#terraform-v011x
2018-05-10 02:20:46 -07:00
Dalton Hubble 8b8e364915 Update etcd from v3.3.4 to v3.3.5
* https://github.com/coreos/etcd/releases/tag/v3.3.5
2018-05-10 02:12:53 -07:00
Dalton Hubble 9d4cbb38f6 Rerun terraform fmt 2018-05-01 21:41:22 -07:00
Dalton Hubble e889430926 Update kube-dns from v1.14.9 to v1.14.10
* https://github.com/kubernetes/kubernetes/pull/62676
2018-04-28 00:43:09 -07:00
Dalton Hubble 32ddfa94e1 Update Kubernetes from v1.10.1 to v1.10.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.10.2
2018-04-28 00:27:00 -07:00
Dalton Hubble 681450aa0d Update etcd from v3.3.3 to v3.3.4
* https://github.com/coreos/etcd/releases/tag/v3.3.4
2018-04-27 23:57:26 -07:00
Dalton Hubble 567e18f015 Fix conflict between Calico and NetworkManager
* Observed frequent kube-scheduler and controller-manager
restarts with Calico as the CNI provider. Root cause was
unclear since control plane was functional and tests of
pod to pod network connectivity passed
* Root cause: Calico sets up cali* and tunl* network interfaces
for containers on hosts. NetworkManager tries to manage these
interfaces. It periodically disconnected veth pairs. Logs did
not surface this issue since it's not an error per se, just Calico
and NetworkManager dueling for control. Kubernetes correctly
restarted pods failing health checks and ensured 2 replicas were
running so the control plane functioned mostly normally. Pod to
pod connectivity was only affected occasionally. Pain to debug.
* Solution: Configure NetworkManager to ignore the Calico ifaces
per Calico's recommendation. Cloud-init writes files after
NetworkManager starts, so a restart is required on first boot. On
subsequent boots, the file is present so no restart is needed
2018-04-25 21:45:58 -07:00
Dalton Hubble 0a7fab56e2 Load ip_vs kernel module on boot as workaround
* (containerized) kube-proxy warns that it is unable to
load the ip_vs kernel module despite having the correct
mounts. Atomic uses an xz compressed module and modprobe
in the container was not compiled with compression support
* Workaround issue for now by always loading ip_vs on-host
* https://github.com/kubernetes/kubernetes/issues/60
2018-04-25 21:45:58 -07:00
Dalton Hubble d784b0fca6 Switch to quay.io/poseidon tagged system containers 2018-04-25 18:15:18 -07:00
Dalton Hubble 7198b9016c Update Calico from v3.0.4 to v3.1.1 for Atomic 2018-04-21 18:46:56 -07:00
Dalton Hubble f36c890234 Fix ostree repo to be called fedora-atomic on bare-metal
* atomic host updates were fetching updates from the repo cache
fedora-atomic-27, instead of from upstream
2018-04-21 18:46:56 -07:00
Dalton Hubble 3f2978821b Add atomic_assets_endpoint var for fedora-atomic bare-metal 2018-04-21 18:46:56 -07:00
Dalton Hubble 9b88d4bbfd Use bootkube system container on fedora-atomic
* Use the upstream bootkube image packaged with the
required metadata to be usable as a system container
under systemd
* Run bootkube with runc so no host level components
use Docker any more. Docker is still the runtime
* Remove bootkube script and old systemd unit
2018-04-21 18:46:56 -07:00
Dalton Hubble 3dde4ba8ba Mount host's /etc/os-release in kubelet system containers
* Fix `kubectl describe node` to reflect the host's operating
system
2018-04-21 18:46:56 -07:00
Dalton Hubble e148552220 Enable kubelet allocatable enforcement and QoS cgroup hierarchy
* Change kubelet system image to use --cgroups-per-qos=true
(default) instead of false
* Change kubelet system image to use --enforce-node-allocatable=pods
instead of an empty string
2018-04-21 18:46:56 -07:00
Dalton Hubble d8d1468f03 Update kubelet system container image to mount /etc/hosts
* Fix kubelet port-forward on Google Cloud / Fedora Atomic
* Mount the host's /etc/hosts in kubelet system containers
* Problem: kubelet runc system containers on Atomic were not
mounting the host's /etc/hosts, like rkt-fly does on Container
Linux. `kubectl port-forward` calls socat with localhost. DNS
servers on AWS, DO, and in many bare-metal environments resolve
localhost to the caller as a convenience. Google Cloud notably
does not, nor is it required to do so, and this surfaced the
missing /etc/hosts in runc kubelet namespaces.
2018-04-21 18:46:56 -07:00
Dalton Hubble cf22e70b46 Name ostree remote repo fedora-atomic across platforms 2018-04-21 18:46:56 -07:00
Dalton Hubble b3cf9508b6 Update Fedora Atomic modules to Kubernetes v1.10.1 2018-04-21 18:46:56 -07:00
Dalton Hubble f990473cde Update control plane manifests and add etcd metrics
* Enable etcd v3.3 metrics to expose metrics for
scraping by Prometheus
* Use k8s.gcr.io instead of gcr.io/google_containers
* Add flexvolume plugin mount to controller manager
* Update kube-dns from v1.14.8 to v1.14.9
2018-04-21 18:46:56 -07:00
Dalton Hubble 8523a086e2 Fix kubelet system container to mount CNI plugins
* Mount /opt/cni/bin in kubelet system container so
CNI plugin binaries can be found. Before, flannel
worked because the kubelet falls back to the flannel
plugin baked into the hyperkube (undesired)
* Move the CNI bin install location later, since /opt
changes may be lost between ostree rebases
2018-04-21 18:46:56 -07:00
Dalton Hubble 19bc5aea9e Use kubelet system container on fedora-atomic
* Use the upstream hyperkube image packaged with the
required metadata to be usable as a system container
under systemd
* Fix port-forward since socat is included
2018-04-21 18:46:56 -07:00
Dalton Hubble 8d7cfc1a45 Use etcd system container on fedora-atomic
* Use the upstream etcd image packaged with the required
metadata to be usable as a system container (runc) under
systemd
2018-04-21 18:46:56 -07:00
Dalton Hubble ddc75e99ac Add bare-metal Fedora Atomic module
* Several known hacks and broken areas
* Download v1.10 Kubelet from release tarball
* Install flannel CNI binaries to /opt/cni
* Switch SELinux to Permissive
* Disable firewalld service
* port-forward won't work, socat missing
2018-04-21 18:46:56 -07:00
Dalton Hubble a54f76db2a Update Calico from v3.0.4 to v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.1
* https://github.com/projectcalico/calico/releases/tag/v3.1.0
2018-04-21 18:30:36 -07:00
Dalton Hubble 77c0a4cf2e Update Kubernetes from v1.10.0 to v1.10.1
* Use kubernetes-incubator/bootkube v0.12.0
2018-04-12 20:57:31 -07:00
Dalton Hubble 9bb3de5327 Skip creating unused dirs on worker nodes 2018-04-11 22:23:51 -07:00
Dalton Hubble d276fffcda Fix bare-metal multiple apply/ssh on Terraform v0.11.4+
* Terraform v0.11.4 introduced changes to remote-exec
that mean Typhoon bare-metal clusters require multiple
runs of terraform apply to ssh and bootstrap.
* Bare-metal installs PXE boot a live instance to install
to disk and then reboot from disk as controllers/workers.
Terraform remote-exec has no way to "know" to wait until
the reboot has occurred to kick off Kubernetes bootstrap.
Previously Typhoon created a "debug" user during this
install phase to allow an admin to SSH, but remote-exec
would hang, trying to connect as user "core". Terraform
v0.11.4 changes this behavior so remote-exec fails and
a user must re-run terraform apply until succeeding.
* A new way to "trick" remote-exec into waiting for the
reboot into the disk install is to run SSH on a non-standard
port during the disk install. This retains the ability
for an admin to SSH during install (most distros don't have
this) and fixes the issue so only a single run of terraform
apply is needed.
* https://github.com/hashicorp/terraform/pull/17359#issuecomment-376415464
2018-04-08 13:32:31 -07:00
Dalton Hubble 6b08bde479 Use k8s.gcr.io instead of gcr.io/google_containers
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
2018-04-08 12:57:52 -07:00
Dalton Hubble 18dbaf74ce Update kube-dns from v1.14.8 to v1.14.9
* https://github.com/kubernetes/kubernetes/pull/61908
2018-04-04 21:00:23 -07:00
Dalton Hubble ce001e9d56 Update etcd from v3.3.2 to v3.3.3
* https://github.com/coreos/etcd/releases/tag/v3.3.3
2018-04-04 20:32:24 -07:00
Dalton Hubble d770393dbc Add etcd metrics, Prometheus scrapes, and Grafana dash
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
2018-04-03 20:31:00 -07:00
Dalton Hubble 1cc043d1eb Update Kubernetes from v1.9.6 to v1.10.0 2018-03-30 22:14:07 -07:00
Dalton Hubble de4d90750e Use consistent naming of remote provision steps 2018-03-26 00:29:57 -07:00
Dalton Hubble ba9daf439e Remove unmaintained pxe-worker internal module 2018-03-25 21:57:52 -07:00
Dalton Hubble e43cf9f608 Organize and cleanup variable descriptions 2018-03-25 21:44:43 -07:00
Dalton Hubble a04ef3919a Update Kubernetes from v1.9.5 to v1.9.6 2018-03-21 20:29:52 -07:00
Dalton Hubble 758c09fa5c Update Kubernetes from v1.9.4 to v1.9.5 2018-03-19 00:25:44 -07:00
Dalton Hubble 88aa9a46e5 Add /var/lib/calico volume mount to Calico DaemonSet 2018-03-18 16:40:38 -07:00
Dalton Hubble efa90d8b44 Add a new key=value label to controller nodes
* Add a node-role.kubernetes.io/controller="true" node label
to controllers so Prometheus service discovery can filter to
services that only run on controllers (i.e. masters)
* Leave node-role.kubernetes.io/master="" untouched as it's
a Kubernetes convention
2018-03-18 16:39:10 -07:00
Dalton Hubble 21f2cef12f Improve changelog, README, and index page 2018-03-12 20:58:02 -07:00
Dalton Hubble 931e311786 Update Kubernetes from v1.9.3 to v1.9.4 2018-03-12 18:07:50 -07:00
Dalton Hubble 9fb1e1a0e2 Update etcd from v3.3.1 to v3.3.2
* https://github.com/coreos/etcd/releases/tag/v3.3.2
2018-03-10 13:44:35 -08:00
Dalton Hubble 98985e5acd Remove unused etcd_service_ip template variable
* etcd_service_ip dates back to deprecated self-hosted etcd
2018-02-26 22:20:20 -08:00
Dalton Hubble a44cf0edbd Update Calico from v3.0.2 to v3.0.3
* https://github.com/projectcalico/calico/releases/tag/v3.0.3
2018-02-26 12:48:19 -08:00
Dalton Hubble c4914c326b Update bootkube and terraform-render-bootkube to v0.11.0 2018-02-22 21:53:26 -08:00
Dalton Hubble 195d902ab6 Upgrade etcd from v3.2.15 to v3.3.1 2018-02-15 19:29:46 -08:00
Dalton Hubble c19a68b59b Update bootkube control-plane manifests
* Remove PersistentVolumeLabel admission controller flag
* Switch Deployments and DaemonSets to apps/v1
* Minor update to pod-checkpointer image version
2018-02-15 11:06:35 -08:00
Dalton Hubble a41691b222 Update Kubernetes from v1.9.2 to v1.9.3
* Add flannel service account and limited RBAC cluster role
* Change DaemonSets to tolerate NoSchedule and NoExecute taints
* Remove deprecated apiserver --etcd-quorum-read flag
* Update Calico from v3.0.1 to v3.0.2
* Add Calico GlobalNetworkSet CRD
* https://github.com/poseidon/terraform-render-bootkube/pull/44
2018-02-10 13:37:07 -08:00
bkcsfi 9034203d7a Fix typo in list of maps comment 2018-02-09 19:11:06 -08:00
Dalton Hubble 2fa1840c30 Update flannel from v0.9.0 to v0.10.0
* https://github.com/coreos/flannel/releases/tag/v0.10.0
2018-01-28 23:09:21 -08:00
Dalton Hubble 8e0b8d7e40 Upgrade Calico from 2.6.6 to 3.0.1 2018-01-28 11:47:23 -08:00
Dalton Hubble 3e6e4ea339 Update etcd from 3.2.14 to 3.2.15
* https://github.com/coreos/etcd/releases/tag/v3.2.15
2018-01-23 23:50:04 -08:00
Dalton Hubble 868265988b Update bootkube and terraform-render-bootkube to v0.10.0 2018-01-19 23:10:45 -08:00
Dalton Hubble 6adffcb778 Update Kubernetes from v1.9.1 to v1.9.2 2018-01-19 08:40:09 -08:00
Dalton Hubble 38fa7dff1a Create separate bare-metal container-linux-install profiles
* Create separate container-linux-install profiles (and
cached-container-linux-install) for each node in a cluster
* Fix contention bug on bare-metal during `terraform apply`.
With only a global install profile, terraform would create
(or retain) the profile for each cluster and try to delete
it for each cluster being deleted. As a result, in some cases
apply had to be run multiple times before terraform's representation
of constraints was satisfied (profile deleted and recreated)
* Allow Container Linux install properties to vary between
clusters, such as using a different Container Linux channel
or version for different clusters
2018-01-15 08:23:03 -08:00
Dalton Hubble d8db296932 Update kube-dns and use separate service account
* Update kube-dns from v1.14.7 to v1.14.8
* Use a separate kube-dns service account
* https://github.com/kubernetes/kubernetes/pull/57918
2018-01-12 10:29:30 -08:00
Dalton Hubble 388ac08492 Update etcd from 3.2.13 to 3.2.14
* https://github.com/coreos/etcd/releases/tag/v3.2.14
2018-01-12 07:20:55 -08:00
Dalton Hubble fc455c8624 Remove old mention of ACIs in bootkube.service description 2018-01-06 16:20:34 -08:00
Dalton Hubble 51a5f64024 Enable portmap plugin alongside Calico to fix hostPort
* https://github.com/poseidon/terraform-render-bootkube/pull/36
2018-01-06 14:01:18 -08:00
Dalton Hubble e1f2125f02 Update etcd from 3.2.0 to 3.2.13
* https://github.com/coreos/etcd/releases/tag/v3.2.13
2018-01-06 14:01:18 -08:00
Dalton Hubble 9329b775f6 Update Kubernetes from v1.8.6 to v1.9.1 2018-01-06 14:01:16 -08:00
Dalton Hubble fbdd946601 Update Kubernetes from v1.8.5 to v1.8.6 2017-12-21 11:20:37 -08:00
Barak Michener e79088baa0 Add optional cluster_domain_suffix variable
* Allow kube-dns to respond to DNS queries with a custom
suffix, instead of the default 'cluster.local'
* Useful when multiple clusters exist on the same local
network and wish to query services on one another
2017-12-15 01:45:52 -08:00
Dalton Hubble 495e33e213 Update bootkube and terraform-render-bootkube to v0.9.1 2017-12-15 01:45:02 -08:00
Dalton Hubble 63f5a26a72 Eliminate steps to move self-hosted etcd assets
* bootkube/assets/experimental/* assets corresponded to self-hosted
etcd manifests, which are no longer an option in Typhoon
2017-12-13 01:06:56 -08:00
Lars Fenneberg eea79e895d Fix manifest consolidation in bootkube start wrapper
* Fix manifest existence test in /opt/bootkube/bootkube-start
to also work with more than one directory
2017-12-12 23:08:22 -08:00
Dalton Hubble 165396d6aa Update Kubernetes from v1.8.4 to v1.8.5 2017-12-09 21:28:31 -08:00
Vincent Palmer ce49a93d5d Fix issue with etcd-member failing to resolve peers
* When restarting masters, `etcd-member.service` may fail to look up peers if
/etc/resolv.conf hasn't been populated yet. Require the wait-for-dns.service.
2017-12-09 20:12:49 -08:00
Dalton Hubble 9548572d98 Add kubelet --volume-plugin-dir flag on bare-metal
* Kubelet will search path for flexvolume plugins
2017-12-05 13:12:53 -08:00
Dalton Hubble 5f5eec1175 Update bootkube and terraform-render-bootkube to v0.9.0 2017-12-01 22:27:48 -08:00
Dalton Hubble 5308fde3d3 Add Kubernetes certification badge 2017-11-29 19:26:49 -08:00
Dalton Hubble 6483f613c5 Update Kubernetes from v1.8.3 to v1.8.4 2017-11-28 21:52:11 -08:00
Dalton Hubble 56c6bf431a Update terraform-render-bootkube for Kubernetes v1.8.4
* Update hyperkube from v1.8.3 to v1.8.4
* Remove flock from bootstrap-apiserver and kube-apiserver
* Remove unused critical-pod annotations in manifests
* Use service accounts for kube-proxy and pod-checkpointer
* Update Calico from v2.6.1 to v2.6.3
* Update flannel from v0.9.0 to v0.9.1
* Remove Calico termination grace period to prevent calico
from getting stuck for extended periods
* https://github.com/poseidon/terraform-render-bootkube/pull/29
2017-11-28 21:42:26 -08:00
Dalton Hubble 07d257aa7b Add initrd kernel argument needed by UEFI clients
* https://github.com/coreos/bugs/issues/1239
2017-11-16 23:19:51 -08:00
Dalton Hubble 5f6b0728c5 Update bootkube and terraform-render-bootkube to v0.8.2 2017-11-10 20:01:37 -08:00
Dalton Hubble d774c51297 Update Kubernetes from v1.8.2 to v1.8.3 2017-11-08 23:34:19 -08:00
Dalton Hubble f6a8fb363e Remove deprecated kubelet --require-kubeconfig flag
* https://github.com/kubernetes/kubernetes/pull/40050
2017-11-08 23:34:19 -08:00
Dalton Hubble 168c487484 Remove mention of self-hosted etcd, it's deprecated 2017-11-06 01:03:53 -08:00
Dalton Hubble 878f5a3647 Bump bootkube and terraform-render-bootkube to v0.8.1
* Use the v0.8.1 tagged terraform-render-bootkube module
* Use the v0.8.1 quay.io/coreos/bootkube image to bootstrap
2017-10-28 12:50:37 -07:00
Dalton Hubble 34ec7e9862 Relax pessimistic constraints on 1.0+ providers
* Constraint ~> 1.0 means users can use 1.0.1, 1.1, but not 2.0
* https://www.terraform.io/docs/configuration/terraform.html
2017-10-25 23:27:28 -07:00
Dalton Hubble f6c6e85f84 Require minimum Terraform and plugin versions
* Bump minimum Terraform version to v0.10.4
* Allow minor version updates for 1.0+ plugins
* Fix versions for plugins which are pre-1.0
2017-10-25 23:00:31 -07:00
Dalton Hubble 60bc8957c9 Update Kubernetes from v1.8.1 to v1.8.2
* Kubernetes v1.8.2 fixes a memory leak in the v1.8.1 apiserver
* Switch to using the `gcr.io/google_containers/hyperkube` image for the
on-host kubelet and shutdown drains
* Update terraform-render-bootkube manifests generation
  * Update flannel from v0.8.0 to v0.9.0
  * Add `hairpinMode` to flannel CNI config
  * Add `--no-negcache` to kube-dns dnsmasq
2017-10-24 21:44:26 -07:00
Dalton Hubble e4c479554c Update AWS, DO, BM Kubernetes from v1.7.7 to v1.8.1
* Update from bootkube v0.7.0 to v0.8.0
* Leave Google Cloud update to a followup commit
2017-10-19 21:10:04 -07:00
Dalton Hubble bfa8dfc75d Conditionally set networkd content on bare-metal
* Without this change, if a cluster doesn't set the controller
or worker networkd lists, an error "element() may not be used
with an empty list" occurs.
* controller_networkds and worker_networkds are intended to be
optional and temporary, not required at all
2017-10-17 18:47:12 -07:00
Dalton Hubble 43dc44623f Fix the terraform fmt of configs 2017-10-16 01:32:25 -07:00
Dalton Hubble 41e632280f Remove unused storage section ala PXE-only Matchbox templating 2017-10-16 00:42:20 -07:00
Dalton Hubble fc22f04dd6 Add temporary variables for multi-nic testing
* Accept ordered lists of controller and worker networkd configs
* Do not rely on these variables. They will be replaced with a
cleaner mechanism at a future date
2017-10-16 00:39:58 -07:00
Dalton Hubble 9ec8ec4afc Secure copy etcd TLS credentials to controllers only
* Controllers receive etcd TLS credentials
* Controllers and workers receive a kubeconfig
2017-10-14 20:48:02 -07:00
Dalton Hubble 5c1ed37ff5 Add SSH key to user "debug" during disk-install phase
* Avoid adding SSH authorized key for user "core" during the disk
install, so that terraform apply cannot SSH until post-install
2017-10-14 20:37:42 -07:00
bzub e765fb310d Allow setting custom PXE boot kernel_args on bare-metal 2017-10-14 19:39:10 -07:00
Dalton Hubble 1bc25c1036 Update Kubernetes from v1.7.5 to v1.7.7
* Update from bootkube v0.6.2 to v0.7.0
* Use renamed terraform-render-bootkube. Renamed from
bootkube-terraform to meet Terraform Module requirements
2017-10-03 21:03:15 -07:00
Dalton Hubble 2d5a4ae1ef Update kube-dns image to address dnsmasq vulnerability
* https://security.googleblog.com/2017/10/behind-masq-yet-more-dns-and-dhcp.html
2017-10-02 10:27:10 -07:00
Dalton Hubble dd883988bd Update from Calico v2.5.1 to v2.6.1
* Network policy improvements
* Update cni sidecar image from v1.10.0 to v1.11.0
* Lower log level in Calico CNI config from debug to info
2017-09-30 16:16:40 -07:00
Dalton Hubble e0d8917573 Add LICENSE to top-level of each module 2017-09-28 20:41:19 -07:00
Dalton Hubble 77e387cf83 Add top-level README.md with module overview 2017-09-27 22:09:52 -07:00
Dalton Hubble f7dd959e9c bare-metal: Stop including etcd-network-checkpointer 2017-09-27 18:25:20 -07:00
Dalton Hubble b62a6def23 Merge pull request #26 from poseidon/fix-nfs-issue
Add Wants=rpc-statd.service to Kubelet
2017-09-24 20:18:22 -07:00
Dalton Hubble 1b5caef4c1 Add Wants=rpc-statd.service to Kubelet
* Mounting NFS exports as volumes from some NFS servers fails because
the kubelet isn't starting rpc-statd as expected. Describing pods
that are stuck creating shows rpc.statd is required for remote locking
* Starting rpc-statd.service resolves the issue and all NFS mounts
seem to be working.
* Recommended approach https://github.com/coreos/bugs/issues/2074
2017-09-24 18:23:55 -07:00
Dalton Hubble 68726a2773 bare-metal: Remove support for experimental_self_hosted_etcd
* Transition from discouraging self-hosted etcd for bare-metal,
to removing it as an option
* See #13 and FAQ for self-hosted etcd discussion
2017-09-23 16:49:15 -07:00
Dalton Hubble 777c860b1c bare-metal: Update to using Kubernetes v1.7.5 control plane manifests
* bootkube-terraform module wasn't bumped for bare-metal
2017-09-23 14:04:18 -07:00
Dalton Hubble bca96bb124 bare-metal: Use Terraform templating for Container Linux configs
* Template bare-metal Container Linux configs with Terraform's
(limited) template_file module. This allows rendering problems
to be identified during `terraform plan` and is favored over
using the Matchbox templating feature when the configs are
served to PXE booting nodes.
* Writes a Matchbox profile for each machine, which will be served
as-is. The effect is the same, each node gets provisioned with its
own Container Linux config.
2017-09-23 11:49:12 -07:00
Dalton Hubble 7c046b6206 *: Fix Terraform fmt and comments 2017-09-17 21:43:00 -07:00
Dalton Hubble 0d6410505d bare-metal: Update kubelet.service unit to match upstream
* Mount host /opt/cni/bin in Kubelet to use host's CNI plugins
* Switch /var/run/kubelet-pod.uuid to /var/cache/kubelet-pod.uuid
to persist between reboots and cleanup old Kubelet pods
* Organize Kubelet flags in alphabetical order
2017-09-14 11:44:02 -07:00
Dalton Hubble 64e8d207b1 Change bare-metal and GCE networking default to calico
* Switch networking default from flannel to calico
2017-09-12 09:16:58 -07:00
Dalton Hubble a441f5c6e0 Update Kubernetes from v1.7.3 to v1.7.5 2017-09-08 13:56:20 -07:00
Dalton Hubble 1efe39d6bc Allow MTU for bare-metal Calico to be customized
* Calico on bare-metal defaults to IP-in-IP encapsulation and MTU 1480
2017-09-05 19:01:18 -07:00
Dalton Hubble 6ef326a872 bare-metal: Add support for Calico networking
* Add variable networking with "flannel" or "calico"
2017-09-01 17:52:22 -07:00
Dalton Hubble dc3ff174ea Update Kubernetes from v1.7.1 to v1.7.3 2017-08-16 20:12:59 -07:00
Dalton Hubble fc018ffa28 Rename project and organization 2017-08-14 19:24:04 -07:00
Dalton Hubble e19517d3df Fix the terraform fmt of configs 2017-08-12 18:26:05 -07:00
Lucas Serven cafc58c610 Update module source from dghubble to purenetes 2017-08-07 19:30:41 -07:00
Dalton Hubble efff7497eb digital-ocean: Join name.dns_zone for controller domain
* Output the DNS FQDNs, IPv4 addresses, and IPv6 addresses
2017-07-29 12:47:47 -07:00
Dalton Hubble 2d33b9abe2 Update diskless PXE workers to Kubernetes v1.7.1 2017-07-27 23:11:41 -07:00
Dalton Hubble da596e06bb Add bare-metal support for Container Linux with Matchbox 2017-07-24 23:24:12 -07:00
Dalton Hubble 386dc49a58 Organize metal-worker-pxe under bare-metal 2017-07-24 21:40:05 -07:00