Commit Graph

868 Commits

Author SHA1 Message Date
Dalton Hubble 68d8717924 Refresh Prometheus rules/alerts and Grafana dashboards
* Refresh rules, alerts, and dashboards from upstreams
2019-07-21 11:29:34 -07:00
Dalton Hubble c8df349e55 Fix to add all Azure controller nodes to address pool
* Add all Azure controllers to the apiserver load balancer
backend address pool
* Previously, kube-apiserver availability relied on the 0th
controller being up. Multi-controller was just providing etcd
data redundancy
2019-07-21 10:38:17 -07:00
Dalton Hubble f543f08867 Compact nginx-ingress ClusterRole rules
* https://github.com/kubernetes/ingress-nginx/pull/4302
2019-07-20 20:31:06 -07:00
Dalton Hubble e0be091acc Update kube-state-metrics from v1.7.0 to v1.7.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.1
2019-07-20 20:17:08 -07:00
Dalton Hubble 56d0b9eae4 Avoid creating extraneous GCE controller instance groups
* Intended as part of #504 improvement
* Single controller clusters only require one controller
instance group (previously created zone-many)
* Multi-controller clusters must "wrap" controllers over
zonal heterogeneous instance groups. For example, 5
controllers over 3 zones (no change)
2019-07-20 16:58:45 -07:00
Dalton Hubble e0c7676a15 Update Kubernetes from v1.15.0 to v1.15.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#downloads-for-v1151
2019-07-19 01:21:08 -07:00
Dalton Hubble 339e323491 Temporarily turn off QoS cgroups on Fedora CoreOS controllers
* Kubelets can hit the ContainerManager Delegation issue and fail
to start (noted in 72c94f1c6). Its unclear why this occurs only
to some Kubelets (possibly an ordering concern)
* QoS cgroups remain a goal
* When a controller node is affected, bootstrapping fails, which
makes other development harder. Temporarily disable QoS on
controllers only. This should safeguard bring-up and hopefully
still allow the issue to occur on some workers for debugging
2019-07-19 00:17:03 -07:00
Dalton Hubble 6cd3e65267 Update kube-state-metrics from v1.7.0-rc.1 to v1.7.0
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0
* Add storageclasses and verticalpodautoscalers to ClusterRole
2019-07-19 00:14:47 -07:00
Dalton Hubble bb557b4ba0 Fix Fedora CoreOS preview links on docs site 2019-07-18 23:44:08 -07:00
Dalton Hubble c7ff1a2e01 Announce a preview with Fedora CoreOS preview 2019-07-18 09:13:40 -07:00
Dalton Hubble 5fdeb9bc78 Adjust Fedora CoreOS image locations
* Use the xz compressed images published by Fedora testing,
instead of gzippped tarballs. This is possible because the
initramfs now supports xz and coreos-installer 0.8 was added
* Separate bios and uefi raw images are no longer needed
2019-07-18 01:15:29 -07:00
Dalton Hubble 155bffa773 Add docs for Fedora CoreOS AWS and bare-metal 2019-07-18 00:55:22 -07:00
Dalton Hubble ce45e123fe Port Typhoon Fedora CoreOS support to AWS
* Use the newly minted "Fedora CoreOS Preview" AMI
* Remove iscsi, kubelet.path activation, and kubeconfig
distribution
* As usual, bare-metal efforts make cloud provider ports
much easier
2019-07-18 00:55:22 -07:00
Dalton Hubble 72c94f1c6a Add Kubelet System Container and bootkube bootstrap
* First semi-working cluster using 30.307-metal-bios
* Enable CPU, Memory, and BlockIO accounting
* Mount /var/lib/kubelet with `rshare` so mounted tmpfs Secrets
(e.g. serviceaccount's) are visible within appropriate containers
* SELinux relabel /etc/kubernetes so install-cni init containers
can write the CNI config to the host /etc/kubernetes/net.d
* SELinux relabel /var/lib/kubelet so ConfigMaps can be read
by containers
* SELinux relabel /opt/cni/bin so install-cni containers can
write CNI binaries to the host
* Set net.ipv4_conf.all.rp_filter to 1 (not 2, loose mode) to
satisfy Calico requirement
* Enable the QoS cgroup hierarchy for pod workloads (kubepods,
burstable, besteffort). Mount /sys/fs/cgroup and
/sys/fs/cgroup/systemd into the Kubelet. Its still rather racy
whether Kubelet will fail on ContainerManager Delegation
2019-07-18 00:55:22 -07:00
Dalton Hubble aab14c5573 Run etcd-member.service across controllers
* Running the etcd container with NOTIFY_SOCKET mounted
(to use systemd Type=notify) causes podman to hang so
for now just use exec
* https://github.com/opencontainers/runc/pull/1807
2019-07-18 00:55:22 -07:00
Dalton Hubble eb92f67125 Start prototype of Fedora CoreOS on bare-metal
* Use terraform-provider-ct v0.4.0 with Fedora CoreOS Config
support (not yet released)
2019-07-18 00:55:22 -07:00
Dalton Hubble dfa6bcfecf Relax terraform-provider-ct version constraint
* Allow updating terraform-provider-ct to any release
beyond v0.3.2, but below v1.0. This relaxes the prior
constraint that allowed only v0.3.y provider versions
2019-07-16 22:07:37 -07:00
Dalton Hubble 70f5cfd33e Update kube-state-metrics from v1.6.0 to v1.7.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.1
* https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.7.0-rc.0
2019-07-13 13:13:57 -07:00
Dalton Hubble 9e91d7f011 Upgrade Calico from v3.7.4 to v3.8.0
* Enable CNI bandwidth plugin for traffic shaping
* https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping
2019-07-11 21:01:41 -07:00
Dalton Hubble eaf59bd33f Update Prometheus from v2.11.0-rc.0 to v2.11.0
* https://github.com/prometheus/prometheus/releases/tag/v2.11.0
2019-07-09 21:33:24 -07:00
Dalton Hubble 40640f3697 Upgrade nginx-ingress from v0.24.1 to v0.25.0
* Support networking.k8s.io/v1beta1 apiVersion
* Update RBAC cluster-role for networking.k8s.io/v1beta1
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.25.0
2019-07-08 22:04:50 -07:00
Dalton Hubble 28ab746068 Update Prometheus from v2.10.0 to v2.11.0-rc.0
* https://github.com/prometheus/prometheus/releases/tag/v2.11.0-rc.0
2019-07-08 21:32:50 -07:00
Dalton Hubble 19596255a6 Fix malformed markdown table in OS docs 2019-07-08 20:54:46 -07:00
Dalton Hubble 69d064bfdf Run kube-apiserver with lower privilege user (nobody)
* Run kube-apiserver as a non-root user (nobody). User
no longer needs to bind low number ports.
* On most platforms, the kube-apiserver load balancer listens
on 6443 and fronts controllers with kube-apiserver pods using
port 6443. Google Cloud TCP proxy load balancers cannot listen
on 6443. However, GCP's load balancer can be made to listen on
443, while kube-apiserver uses 6443 across all platforms.
2019-07-08 20:52:00 -07:00
Dalton Hubble 7a69bae75e Raise GCP network deletion timeout from 4m to 6m
* Fix a GCP errata item https://github.com/poseidon/typhoon/wiki/Errata
* Removal of a Google Cloud cluster often required 2 runs of
`terraform apply` because network resource deletes timeout
after 4m. Raise the network deletion timeout to 6m to
ensure apply only needs to be run once to remove a cluster
2019-07-06 13:15:33 -07:00
Dalton Hubble 3fcb04f68c Improve apiserver backend service zone spanning
* google_compute_backend_services use nested blocks to define
backends (instance groups heterogeneous controllers)
* Use Terraform v0.12.x dynamic blocks so the apiserver backend
service can refer to (up to zone-many) controller instance groups
* Previously, with Terraform v0.11.x, the apiserver backend service
had to list a fixed set of backends to span controller nodes across
zones in multi-controller setups. 3 backends were used because each
GCP region offered at least 3 zones. Single-controller clusters had
the cosmetic ugliness of unused instance groups
* Allow controllers to span more than 3 zones if avilable in a
region (e.g. currently only us-central1, with 4 zones)

Related:

* https://www.terraform.io/docs/providers/google/r/compute_backend_service.html
* https://www.terraform.io/docs/configuration/expressions.html#dynamic-blocks
2019-07-05 19:46:26 -07:00
Dalton Hubble 8d373b5850 Update Calico from v3.7.3 to v3.7.4
* https://docs.projectcalico.org/v3.7/release-notes/
2019-07-02 20:18:02 -07:00
Dalton Hubble 307aaf5e30 Use Terraform v0.12 syntax in ingress docs
* Drop string interpolation in Google Cloud A records
shown in Nginx ingress addon docs
* Retain string interpolation syntax for CNAME records
since Google Cloud DNS expects records to end in "."
(some clouds add it automatically)
2019-06-29 13:50:49 -07:00
Dalton Hubble 9a395dbf88 Update Grafana from v6.2.4 to v6.2.5
* https://github.com/grafana/grafana/releases/tag/v6.2.5
2019-06-29 13:21:42 -07:00
Mateusz Gozdek fc6e8886ce Fix README link to Azure module (#502) 2019-06-29 13:20:03 -07:00
Dalton Hubble fff7cc035d Remove Fedora Atomic modules
* Typhoon for Fedora Atomic was deprecated in March 2019
* https://typhoon.psdn.io/announce/#march-27-2019
2019-06-23 13:40:51 -07:00
Dalton Hubble ca18fab5f0 Remove providers block, unused with Terraform v0.12
* Fix inconsistency btw README and the docs
2019-06-23 13:34:33 -07:00
Dalton Hubble 408e60075a Update Kubernetes from v1.14.3 to v1.15.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1150
* Remove docs referring to possible v1.14.4 release
2019-06-23 13:12:18 -07:00
Dalton Hubble 79d910821d Configure Kubelet cgroup-driver for Flatcar Linux Edge
* For Container Linux or Flatcar Linux alpha/beta/stable,
continue using the `cgroupfs` driver
* For Fedora Atomic, continue using the `systemd` driver
* For Flatcar Linux Edge, use the `systemd` driver
2019-06-22 23:38:42 -07:00
Dalton Hubble 5c4486f57b Allow using Flatcar Linux Edge on bare-metal and AWS
* On AWS, use Flatcar Linux Edge by setting `os_image` to
"flatcar-edge"
* On bare-metal, Flatcar Linux Edge by setting `os_channel` to
"flatcar-edge"
2019-06-22 23:38:42 -07:00
Dalton Hubble 331ebd90f6
Acknowledge DigitalOcean providing credits for test clusters (#500)
* [DigitalOcean](https://www.digitalocean.com/) kindly provides credits to support Typhoon test clusters. Many thanks!
2019-06-21 10:03:21 -07:00
Dalton Hubble 405015f52c Remove Fedora Atomic documentation
* Typhoon for Fedora Atomic was deprecated in March 2019
* https://typhoon.psdn.io/announce/#march-27-2019
2019-06-19 22:21:58 -07:00
Dalton Hubble d35c1cb9fb Fix advanced customization docs for Terraform v0.12
* Use Terraform v0.12 syntax in the Container Linux Config
snippet customization docs
2019-06-19 22:11:11 -07:00
Dalton Hubble 3d5be86aae Update provider plugin versions in tutorial docs
* Update Terraform provider plugin versions in docs to
reflect the recommended versions that we actively use
2019-06-19 21:58:43 -07:00
Dalton Hubble 4ad69efc43 Update Grafana from v6.2.2 to v6.2.4
* https://github.com/grafana/grafana/releases/tag/v6.2.4
2019-06-19 21:51:54 -07:00
Dalton Hubble ce7bff0066 Update mkdocs-material from v4.3.0 to v4.4.0 2019-06-16 12:28:37 -07:00
Dalton Hubble 21fb632e90 Update Calico from v3.7.2 to v3.7.3
* https://docs.projectcalico.org/v3.7/release-notes/
2019-06-13 23:54:20 -07:00
Dalton Hubble b168db139b Add tweaks to Terraform v0.12 migration docs
* Provide an exact SHA early migrators might use to
perform an in-place upgrade to Terraform v0.12
2019-06-13 23:52:00 -07:00
Johannes Liebermann e7dda155f3 Fix typo in maintenance docs (#494)
s/circuting/circuiting/
2019-06-11 19:59:42 -07:00
Dalton Hubble cc4f7e09ab Update node-exporter from v0.18.0 to v0.18.1
* https://github.com/prometheus/node_exporter/releases/tag/v0.18.1
2019-06-07 02:09:44 -07:00
Dalton Hubble f5960e227d Update addon-resizer base image to distroless
* Rel: https://github.com/kubernetes/kubernetes/pull/78397
2019-06-07 00:14:54 -07:00
Dalton Hubble d449477272 Update Grafana from v6.2.1 to v6.2.2
* https://github.com/grafana/grafana/releases/tag/v6.2.2
2019-06-07 00:07:54 -07:00
Dalton Hubble 5303e32e38 Change DO worker_type default from s-1vcpu-1gb to s-1vcpu-2gb
* On DigitalOcean, `s-1vcpu-1gb` worker nodes have 1GB of RAM, which
is too small as a default, even for most cost constrained developers
2019-06-06 23:50:19 -07:00
Dalton Hubble da3f2b5d95 Adjust README example and Terraform version in docs
* Delay changing README example. Its prominent display
on github.com may lead to new users copying it, even
though it corresponds to an "in between releases" state
and v1.14.4 doesn't exist yet
* Leave docs tutorials the same, they can reflect master
2019-06-06 23:36:36 -07:00
Dalton Hubble 3276bf5878 Add migration instructions from Terraform v0.11 to v0.12
* Provide Terraform v0.11 to v0.12 migration guide. Show an
in-place strategy and a move resources strategy
* Describe in-place modifying an existing cluster and providers,
using the Terraform helper to edit syntax, and checking the
plan produces a zero diff
* Describe replacing existing clusters by creating a new config
directory for use with Terraform v0.12 only and moving resources
one by one
* Provide some limited advise on migrating non-Typhoon resources
2019-06-06 09:51:22 -07:00