Commit Graph

1561 Commits

Author SHA1 Message Date
Dalton Hubble aaa8e0261a Add Google Cloud worker instances to a target pool
* Background: A managed instance group of workers is used in backend
services for global load balancing (HTTP/HTTPS Ingress) and output
for custom global load balancing use cases
* Add worker instances to a target pool load balancing TCP/UDP
applications (NodePort or proxied). Output as `worker_target_pool`
* Health check for workers with a healthy Ingress controller. Forward
rules (regional) to target pools don't support different external and
internal ports so choosing nodes with Ingress allows proxying as a
workaround
* A target pool is a logical grouping only. It doesn't add costs to
clusters or worker pools
2019-04-01 21:03:48 -07:00
Dalton Hubble ae3a8a5770 Update mkdocs-material from v4.1.0 to v4.1.1 2019-03-31 18:23:18 -07:00
Dalton Hubble b3ec5f73e3 Update Calico from v3.6.0 to v3.6.1
* https://docs.projectcalico.org/v3.6/release-notes/
2019-03-31 17:43:43 -07:00
Dalton Hubble 3e9dc28a00 Update Prometheus from v2.8.0 to v2.8.1
* https://github.com/prometheus/prometheus/releases/tag/v2.8.1
2019-03-31 17:40:20 -07:00
Dalton Hubble 46196af500 Remove Haswell minimum CPU platform requirement
* Google Cloud API implements `min_cpu_platform` to mean
"use exactly this CPU"
* Fix error creating clusters in newer regions lacking Haswell
platform (e.g. europe-west2) (#438)
* Reverts #405, added in v1.13.4
* Original goal of ignoring old Ivy/Sandy bridge CPUs in older regions
will be achieved shortly anyway. Google Cloud is deprecating those CPUs
in April 2019
* https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform#how_selecting_a_minimum_cpu_platform_works
2019-03-27 19:51:32 -07:00
Dalton Hubble 5a1bc423a1 Announce Fedora Atomic modules won't be updated beyond v1.13.x
* Thank you Project Atomic team and users
* See the deprecation announcement https://typhoon.psdn.io/announce/#march-27-2019
2019-03-26 23:56:33 -07:00
Dalton Hubble 32fe72fb2d Update mkdocs and plugin versions used in tutorials
* Recommend provider plugin versions that are currently used
by the author
* Recommend updating terraform-provider-ct plugin from v0.3.0
to v0.3.1
* https://github.com/coreos/terraform-provider-ct/releases
2019-03-26 01:00:44 -07:00
Dalton Hubble 4fea526ebf Update Kubernetes from v1.13.4 to v1.13.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1135
2019-03-25 21:43:47 -07:00
Dalton Hubble 41a9d86bc3 Add NetworkPolicy to limit traffic into Prometheus
* Allow traffic from Grafana to Prometheus in monitoring
* Allow traffic from Prometheus to Prometheus in monitoring
* NetworkPolicy denies non-whitelisted traffic. Define policy
to allow other access
2019-03-23 21:38:34 -07:00
Dalton Hubble 36e31fc9fa Add liveness and readiness probes to Grafana
* https://github.com/grafana/grafana/issues/3302
2019-03-23 17:55:37 -07:00
Dalton Hubble 619a0370dc Update Grafana from v6.0.1 to v6.0.2
* https://github.com/grafana/grafana/releases/tag/v6.0.2
2019-03-21 23:41:25 -07:00
Dalton Hubble 6dd2731046 Set cpu/memory resources requests/limits for some addons
* Set resource requests and limits for Grafana and CLUO
* Set resource requests for Prometheus, but allow usage
to grow since needs vary widely
* Leave nginx without resource requests/limits for now,
its typically well behaved
2019-03-20 00:15:08 -07:00
Dalton Hubble 1feefbe9c6 Update Calico from v3.5.2 to v3.6.0
* Add calico-ipam CRDs and RBAC permissions
* Switch IPAM from host-local to calico-ipam
  * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into
`ipamblocks` (defaults to /26, but set to /24 in Typhoon)
  * `host-local` subnets the pod CIDR based on the node PodCIDR
field (set via kube-controller-manager as /24's)
* Create a custom default IPv4 IPPool to ensure the block size
is kept at /24 to allow 110 pods per node (Kubernetes default)
* Retaining host-local was slightly preferred, but Calico v3.6
is migrating all usage to calico-ipam. The codepath that skipped
calico-ipam for KDD was removed
*  https://docs.projectcalico.org/v3.6/release-notes/
2019-03-19 22:49:56 -07:00
Dalton Hubble aa630003a4 Refresh Prometheus rules and Grafana dashboards
* Refresh rules and dashboards from upstreams
* Organize dashboards and stay below the ConfigMap size
limit
2019-03-17 13:23:04 -07:00
Dalton Hubble bf97a45b9d Remove heapster manifests from addons
* Heapster addon powers `kubectl top`
* In early Kubernetes, people legitimately used and expected
`kubectl top` to work, so the optional addon was provided
* Today the standards are different. Many better monitoring
tools exist, that are also less coupled to Kubernetes "kubectl
top" reliance on a non-core extensions means its not in-scope
for minimal Kubernetes clusters. No more exceptionalism
* Finally, Heapster isn't that useful anymore. Its manifests
have no need for Typhoon-specific modification
* Look to prior releases if you still wish to apply heapster
2019-03-17 12:41:59 -07:00
Dalton Hubble 3d6a6d4adb Re-add Kubelet metadata service dependency on DigitalOcean
* Restore the original special-casing of DigitalOcean Kubelets
* Fix node metadata InternalIP being set to the IP of the default
gateway on DigitalOcean nodes (regressed in v1.12.3)
* Reverts the "pretty" node names on DigitalOcean (worker-2 vs IP)
* Closes #424 (full details)
2019-03-17 12:39:25 -07:00
Dalton Hubble e0bee2e417 Update Prometheus from v2.7.2 to v2.8.0
* https://github.com/prometheus/prometheus/releases/tag/v2.8.0
2019-03-13 22:11:38 -07:00
Dalton Hubble 2019177b6b Fix implicit map assignments to be explicit
* Terraform v0.12 will require map assignments be explicit,
part of v0.12 readiness
2019-03-12 01:19:54 -07:00
Dalton Hubble 9493ed3b1d Change default iPXE kernel/initrd download from HTTP to HTTPS
* Require an iPXE-enabled network boot environment with support for
TLS downloads. PXE clients must chainload to iPXE firmware compiled
with `DOWNLOAD_PROTO_HTTPS` enabled ([crypto](https://ipxe.org/crypto))
* iPXE's pre-compiled firmware binaries do _not_ enable HTTPS. Admins
should build iPXE from source with support enabled
* Affects the Container Linux and Flatcar Linux install profiles that
pull from public downloads. No effect when cached_install=true
or using Fedora Atomic, as those download from Matchbox
* Add `download_protocol` variable. Recognizing boot firmware TLS
support is difficult in some environments, set the protocol to "http"
for the old behavior (discouraged)
2019-03-09 23:23:40 -08:00
Dalton Hubble 4201eb1efa Update Grafana from v6.0.0 to v6.0.1
* https://github.com/grafana/grafana/releases/tag/v6.0.1
2019-03-09 12:44:18 -08:00
Dalton Hubble fe96da27d7 Add support for terraform-provider-aws v2.0+
* Allow terraform-provider-aws >= v1.13, but < 3.0. No change
to the minimum version, but allow using v2.x.y releases
* Verify compatability with terraform-provider-aws v2.1.0
2019-03-09 12:06:44 -08:00
Dalton Hubble 3afd114f8c Update mkdocs-material from v4.0.1 to v4.0.2 2019-03-04 23:11:02 -08:00
Dalton Hubble 4d9a692424 Update Prometheus from v2.7.1 to v2.7.2
* https://github.com/prometheus/prometheus/releases/tag/v2.7.2
2019-03-04 23:08:12 -08:00
Dalton Hubble deec512c14 Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin
* Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes
service IPs and pod IPs
* Previously, CoreDNS was configured to resolve in-addr.arpa PTR
records for service IPs (but not pod IPs)
2019-03-04 23:03:00 -08:00
Dalton Hubble 5066a25d89 Add links and clarifications in CHANGES for release 2019-03-02 11:26:12 -08:00
Dalton Hubble de251bd94f Update tutorials to prefer newer provider plugins over min version
* Minimum versions of Terraform provider plugins are enforced in
each module already. Its better to provide examples with newer
versions. Some folks don't update them
* Previously, tutorials showed the minimum viable version of each
terraform provider that might be used
2019-03-02 11:07:40 -08:00
Dalton Hubble fc277eaab6 Document the GCP DNS admin requirement for cluster provisioning
* Configure the google terraform provider to use GCP service
account credentials with compute and dns admin privileges
2019-03-02 10:54:35 -08:00
Dalton Hubble a08adc92b5 Update nginx-ingress from v0.22.0 to v0.23.0
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.23.0
2019-03-01 01:18:54 -08:00
Dalton Hubble d42f42df4e Re-measure cluster provision times and document 2019-03-01 01:15:08 -08:00
Dalton Hubble 4ff7fe2c29 Update Grafana dashboards from upstreams 2019-02-28 23:22:07 -08:00
Dalton Hubble f598307998 Update Kubernetes from v1.13.3 to v1.13.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134
2019-02-28 22:47:43 -08:00
Dalton Hubble 8ae552ebda Update documentation for use with Ubiquiti EdgeOS
* Show creation of a PXE-enabled network boot environment when
using dnsmasq as the DHCP server
* Recommend TFTP be served from /config/tftpboot since /config
is preserved between firmware upgrades
* Recommend compiling undionly.kpxe from source to enable
TLS features
* Add a note that equal-cost multi-path service IP routing
(e.g. for ingress) requires EdgeOS v2.0. Previously, it was known
that TLS handshakes couldn't be completed with packet balacing.
I've verified this is no longer the case when using the v2.0
EdgeOS firmware, ECMP works as expected.
2019-02-27 23:36:27 -08:00
Dalton Hubble daee5a9d60 Update Grafana from v6.0.0-beta3 to v6.0.0
* https://github.com/grafana/grafana/releases/tag/v6.0.0
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-25 21:43:43 -08:00
Dalton Hubble 73ae5d5649 Update Calico from v3.5.1 to v3.5.2
* https://docs.projectcalico.org/v3.5/releases/
2019-02-25 21:23:13 -08:00
Dalton Hubble 42d7222f3d Add a readiness probe to CoreDNS
* https://github.com/poseidon/terraform-render-bootkube/pull/115
2019-02-23 13:25:23 -08:00
Dalton Hubble d10c2b4cb9 Update Grafana from v6.0.0-beta2 to v6.0.0-beta3
* Update Grafana dashboards
2019-02-23 13:03:25 -08:00
Dalton Hubble 7f8572030d Upgrade to support terraform-provider-google v2.0+
* Support terraform-provider-google v1.19.0, v1.19.1, v1.20.0
and v2.0+ (and allow for future 2.x.y releases)
* Require terraform-provider-google v1.19.0 or newer. v1.19.0
introduced `network_interface` fields `network_ip` and `nat_ip`
to deprecate `address` and `assigned_nat_ip`. Those deprecated
fields are removed in terraform-provider-google v2.0
* https://github.com/terraform-providers/terraform-provider-google/releases/tag/v2.0.0
2019-02-20 02:33:32 -08:00
Dalton Hubble 4294bd0292 Assign Pod Priority classes to critical cluster and node components
* Assign pod priorityClassNames to critical cluster and node
components (higher is higher priority) to inform node out-of-resource
eviction order and scheduler preemption and scheduling order
* Priority Admission Controller has been enabled since Typhoon
v1.11.1
2019-02-19 22:21:39 -08:00
Dalton Hubble ba4c5de052 Set the Google Cloud minimum CPU platform to Intel Haswell
* Intel Haswell or better is available in every zone around the world
* Neither Kubernetes nor Typhoon have a particular minimum processor
family. However, a few Google Cloud zones still default to Sandy/Ivy
bridge (scheduled to shift April 2019). Price is only based on machine
type so it is beneficial to opt for the next processor family
* Intel Haswell is a suitable minimum since it still allows plenty of
liberty in choosing any region or machine type
* Likely a slight increase to preemption probability in a few zones,
but any lower probability on Sandy/Ivy bridge is due to lower
desirability as they're phased out
* https://cloud.google.com/compute/docs/regions-zones/
2019-02-18 12:55:04 -08:00
Dalton Hubble e483c81ce9 Improve Prometheus rules and alerts and Grafana dashboards
* Collate upstream rules, alerts, and dashboards and tune for use
in Typhoon
* Previously, a well-chosen (but older) set of rules, alerts, and
dashboards were maintained to reflect metric name changes
2019-02-18 12:19:23 -08:00
Dalton Hubble 6fa3b8a13f Upgrade Grafana to v6.0.0-beta2 and enable Explore UI
* Upgrade Grafana from v5.4.3 to v6.0.0-beta2
* Enable Grafana Explore UI while still using only the Viewer
role (inspect/edit without saving)
* http://docs.grafana.org/guides/whats-new-in-v6-0/
2019-02-17 13:26:42 -08:00
Dalton Hubble ac95e83249 Update mkdocs-material from v3.3.0 to v4.0.1 2019-02-16 15:55:38 -08:00
Dalton Hubble d988822741 Document and recommend terraform-provider-matchbox v0.2.3
* https://github.com/coreos/terraform-provider-matchbox/releases/tag/v0.2.3
2019-02-16 15:07:49 -08:00
Dalton Hubble 170ef74eea Remove Nginx Ingress default backend
* nginx-ingress no longer requires a configured default-backend,
it will respond with its own 404 page starting in v0.21.0
* https://github.com/kubernetes/ingress-nginx/pull/3196
2019-02-16 14:18:15 -08:00
Dalton Hubble b13a651cfe Drop metrics that are unset, high cardinality, or extraneous
* https://github.com/coreos/prometheus-operator/pull/2387
* https://github.com/coreos/prometheus-operator/pull/1959
2019-02-10 23:56:11 -08:00
Dalton Hubble 9c59f393a5 Add Kubernetes pod name to metrics discovered from service endpoints
* Prometheus queries from some upstreams use joins of node-exporter
and kube-state-metrics metrics by (namespace,pod). Add the Kubernetes
pod name to service endpoint metrics
* Rename the kubernetes_namespace field to namespace
* Honor labels since kube-state-metrics already include a `pod` field
that should not be overridden
2019-02-10 23:54:30 -08:00
Dalton Hubble 3e4b3bfb04 Raise nginx-ingress liveness/readiness timeout
* Under heavy load, avoid timeouts causing nginx-ingress
restarts https://github.com/kubernetes/ingress-nginx/pull/3737
2019-02-09 12:53:09 -08:00
Dalton Hubble 584088397c Update etcd from v3.3.11 to v3.3.12
* https://github.com/etcd-io/etcd/releases/tag/v3.3.12
2019-02-09 11:54:54 -08:00
Dalton Hubble 0200058e0e Update Calico from v3.5.0 to v3.5.1
* Fix in confd https://github.com/projectcalico/confd/pull/205
2019-02-09 11:49:31 -08:00
Dalton Hubble d5537405e1 Add CHANGES note about reducing the pod eviciton timeout 2019-02-02 14:54:18 -08:00