typhoon

mirror of https://github.com/puppetmaster/typhoon.git synced 2025-10-03 14:24:37 +02:00

Author	SHA1	Message	Date
Dalton Hubble	36e31fc9fa	Add liveness and readiness probes to Grafana * https://github.com/grafana/grafana/issues/3302	2019-03-23 17:55:37 -07:00
Dalton Hubble	619a0370dc	Update Grafana from v6.0.1 to v6.0.2 * https://github.com/grafana/grafana/releases/tag/v6.0.2	2019-03-21 23:41:25 -07:00
Dalton Hubble	6dd2731046	Set cpu/memory resources requests/limits for some addons * Set resource requests and limits for Grafana and CLUO * Set resource requests for Prometheus, but allow usage to grow since needs vary widely * Leave nginx without resource requests/limits for now, it's typically well behaved	2019-03-20 00:15:08 -07:00
Dalton Hubble	1feefbe9c6	Update Calico from v3.5.2 to v3.6.0 * Add calico-ipam CRDs and RBAC permissions * Switch IPAM from host-local to calico-ipam * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into `ipamblocks` (defaults to /26, but set to /24 in Typhoon) * `host-local` subnets the pod CIDR based on the node PodCIDR field (set via kube-controller-manager as /24's) * Create a custom default IPv4 IPPool to ensure the block size is kept at /24 to allow 110 pods per node (Kubernetes default) * Retaining host-local was slightly preferred, but Calico v3.6 is migrating all usage to calico-ipam. The codepath that skipped calico-ipam for KDD was removed * https://docs.projectcalico.org/v3.6/release-notes/	2019-03-19 22:49:56 -07:00
Dalton Hubble	aa630003a4	Refresh Prometheus rules and Grafana dashboards * Refresh rules and dashboards from upstreams * Organize dashboards and stay below the ConfigMap size limit	2019-03-17 13:23:04 -07:00
Dalton Hubble	bf97a45b9d	Remove heapster manifests from addons * Heapster addon powers `kubectl top` * In early Kubernetes, people legitimately used and expected `kubectl top` to work, so the optional addon was provided * Today the standards are different. Many better monitoring tools exist that are also less coupled to Kubernetes * "kubectl top" reliance on a non-core extension means it's not in scope for minimal Kubernetes clusters. No more exceptionalism * Finally, Heapster isn't that useful anymore. Its manifests have no need for Typhoon-specific modification * Look to prior releases if you still wish to apply heapster	2019-03-17 12:41:59 -07:00
Dalton Hubble	3d6a6d4adb	Re-add Kubelet metadata service dependency on DigitalOcean * Restore the original special-casing of DigitalOcean Kubelets * Fix node metadata InternalIP being set to the IP of the default gateway on DigitalOcean nodes (regressed in v1.12.3) * Reverts the "pretty" node names on DigitalOcean (worker-2 vs IP) * Closes #424 (full details)	2019-03-17 12:39:25 -07:00
Dalton Hubble	e0bee2e417	Update Prometheus from v2.7.2 to v2.8.0 * https://github.com/prometheus/prometheus/releases/tag/v2.8.0	2019-03-13 22:11:38 -07:00
Dalton Hubble	2019177b6b	Fix implicit map assignments to be explicit * Terraform v0.12 will require map assignments be explicit, part of v0.12 readiness	2019-03-12 01:19:54 -07:00
Dalton Hubble	9493ed3b1d	Change default iPXE kernel/initrd download from HTTP to HTTPS * Require an iPXE-enabled network boot environment with support for TLS downloads. PXE clients must chainload to iPXE firmware compiled with `DOWNLOAD_PROTO_HTTPS` enabled ([crypto](https://ipxe.org/crypto)) * iPXE's pre-compiled firmware binaries do _not_ enable HTTPS. Admins should build iPXE from source with support enabled * Affects the Container Linux and Flatcar Linux install profiles that pull from public downloads. No effect when cached_install=true or using Fedora Atomic, as those download from Matchbox * Add `download_protocol` variable. Recognizing boot firmware TLS support is difficult in some environments, set the protocol to "http" for the old behavior (discouraged)	2019-03-09 23:23:40 -08:00
Dalton Hubble	4201eb1efa	Update Grafana from v6.0.0 to v6.0.1 * https://github.com/grafana/grafana/releases/tag/v6.0.1	2019-03-09 12:44:18 -08:00
Dalton Hubble	fe96da27d7	Add support for terraform-provider-aws v2.0+ * Allow terraform-provider-aws >= v1.13, but < 3.0. No change to the minimum version, but allow using v2.x.y releases * Verify compatibility with terraform-provider-aws v2.1.0	2019-03-09 12:06:44 -08:00
Dalton Hubble	3afd114f8c	Update mkdocs-material from v4.0.1 to v4.0.2	2019-03-04 23:11:02 -08:00
Dalton Hubble	4d9a692424	Update Prometheus from v2.7.1 to v2.7.2 * https://github.com/prometheus/prometheus/releases/tag/v2.7.2	2019-03-04 23:08:12 -08:00
Dalton Hubble	deec512c14	Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin * Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes service IPs and pod IPs * Previously, CoreDNS was configured to resolve in-addr.arpa PTR records for service IPs (but not pod IPs)	2019-03-04 23:03:00 -08:00
Dalton Hubble	5066a25d89	Add links and clarifications in CHANGES for release v1.13.4	2019-03-02 11:26:12 -08:00
Dalton Hubble	de251bd94f	Update tutorials to prefer newer provider plugins over min version * Minimum versions of Terraform provider plugins are enforced in each module already. It's better to provide examples with newer versions. Some folks don't update them * Previously, tutorials showed the minimum viable version of each terraform provider that might be used	2019-03-02 11:07:40 -08:00
Dalton Hubble	fc277eaab6	Document the GCP DNS admin requirement for cluster provisioning * Configure the google terraform provider to use GCP service account credentials with compute and dns admin privileges	2019-03-02 10:54:35 -08:00
Dalton Hubble	a08adc92b5	Update nginx-ingress from v0.22.0 to v0.23.0 * https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.23.0	2019-03-01 01:18:54 -08:00
Dalton Hubble	d42f42df4e	Re-measure cluster provision times and document	2019-03-01 01:15:08 -08:00
Dalton Hubble	4ff7fe2c29	Update Grafana dashboards from upstreams	2019-02-28 23:22:07 -08:00
Dalton Hubble	f598307998	Update Kubernetes from v1.13.3 to v1.13.4 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134	2019-02-28 22:47:43 -08:00
Dalton Hubble	8ae552ebda	Update documentation for use with Ubiquiti EdgeOS * Show creation of a PXE-enabled network boot environment when using dnsmasq as the DHCP server * Recommend TFTP be served from /config/tftpboot since /config is preserved between firmware upgrades * Recommend compiling undionly.kpxe from source to enable TLS features * Add a note that equal-cost multi-path service IP routing (e.g. for ingress) requires EdgeOS v2.0. Previously, it was known that TLS handshakes couldn't be completed with packet balancing. I've verified this is no longer the case when using the v2.0 EdgeOS firmware; ECMP works as expected.	2019-02-27 23:36:27 -08:00
Dalton Hubble	daee5a9d60	Update Grafana from v6.0.0-beta3 to v6.0.0 * https://github.com/grafana/grafana/releases/tag/v6.0.0 * http://docs.grafana.org/guides/whats-new-in-v6-0/	2019-02-25 21:43:43 -08:00
Dalton Hubble	73ae5d5649	Update Calico from v3.5.1 to v3.5.2 * https://docs.projectcalico.org/v3.5/releases/	2019-02-25 21:23:13 -08:00
Dalton Hubble	42d7222f3d	Add a readiness probe to CoreDNS * https://github.com/poseidon/terraform-render-bootkube/pull/115	2019-02-23 13:25:23 -08:00
Dalton Hubble	d10c2b4cb9	Update Grafana from v6.0.0-beta2 to v6.0.0-beta3 * Update Grafana dashboards	2019-02-23 13:03:25 -08:00
Dalton Hubble	7f8572030d	Upgrade to support terraform-provider-google v2.0+ * Support terraform-provider-google v1.19.0, v1.19.1, v1.20.0 and v2.0+ (and allow for future 2.x.y releases) * Require terraform-provider-google v1.19.0 or newer. v1.19.0 introduced `network_interface` fields `network_ip` and `nat_ip` to deprecate `address` and `assigned_nat_ip`. Those deprecated fields are removed in terraform-provider-google v2.0 * https://github.com/terraform-providers/terraform-provider-google/releases/tag/v2.0.0	2019-02-20 02:33:32 -08:00
Dalton Hubble	4294bd0292	Assign Pod Priority classes to critical cluster and node components * Assign pod priorityClassNames to critical cluster and node components (higher is higher priority) to inform node out-of-resource eviction order and scheduler preemption and scheduling order * Priority Admission Controller has been enabled since Typhoon v1.11.1	2019-02-19 22:21:39 -08:00
Dalton Hubble	ba4c5de052	Set the Google Cloud minimum CPU platform to Intel Haswell * Intel Haswell or better is available in every zone around the world * Neither Kubernetes nor Typhoon has a particular minimum processor family. However, a few Google Cloud zones still default to Sandy/Ivy bridge (scheduled to shift April 2019). Price is only based on machine type so it is beneficial to opt for the next processor family * Intel Haswell is a suitable minimum since it still allows plenty of liberty in choosing any region or machine type * Likely a slight increase to preemption probability in a few zones, but any lower probability on Sandy/Ivy bridge is due to lower desirability as they're phased out * https://cloud.google.com/compute/docs/regions-zones/	2019-02-18 12:55:04 -08:00
Dalton Hubble	e483c81ce9	Improve Prometheus rules and alerts and Grafana dashboards * Collate upstream rules, alerts, and dashboards and tune for use in Typhoon * Previously, a well-chosen (but older) set of rules, alerts, and dashboards were maintained to reflect metric name changes	2019-02-18 12:19:23 -08:00
Dalton Hubble	6fa3b8a13f	Upgrade Grafana to v6.0.0-beta2 and enable Explore UI * Upgrade Grafana from v5.4.3 to v6.0.0-beta2 * Enable Grafana Explore UI while still using only the Viewer role (inspect/edit without saving) * http://docs.grafana.org/guides/whats-new-in-v6-0/	2019-02-17 13:26:42 -08:00
Dalton Hubble	ac95e83249	Update mkdocs-material from v3.3.0 to v4.0.1	2019-02-16 15:55:38 -08:00
Dalton Hubble	d988822741	Document and recommend terraform-provider-matchbox v0.2.3 * https://github.com/coreos/terraform-provider-matchbox/releases/tag/v0.2.3	2019-02-16 15:07:49 -08:00
Dalton Hubble	170ef74eea	Remove Nginx Ingress default backend * nginx-ingress no longer requires a configured default-backend, it will respond with its own 404 page starting in v0.21.0 * https://github.com/kubernetes/ingress-nginx/pull/3196	2019-02-16 14:18:15 -08:00
Dalton Hubble	b13a651cfe	Drop metrics that are unset, high cardinality, or extraneous * https://github.com/coreos/prometheus-operator/pull/2387 * https://github.com/coreos/prometheus-operator/pull/1959	2019-02-10 23:56:11 -08:00
Dalton Hubble	9c59f393a5	Add Kubernetes pod name to metrics discovered from service endpoints * Prometheus queries from some upstreams use joins of node-exporter and kube-state-metrics metrics by (namespace,pod). Add the Kubernetes pod name to service endpoint metrics * Rename the kubernetes_namespace field to namespace * Honor labels since kube-state-metrics already include a `pod` field that should not be overridden	2019-02-10 23:54:30 -08:00
Dalton Hubble	3e4b3bfb04	Raise nginx-ingress liveness/readiness timeout * Under heavy load, avoid timeouts causing nginx-ingress restarts https://github.com/kubernetes/ingress-nginx/pull/3737	2019-02-09 12:53:09 -08:00
Dalton Hubble	584088397c	Update etcd from v3.3.11 to v3.3.12 * https://github.com/etcd-io/etcd/releases/tag/v3.3.12	2019-02-09 11:54:54 -08:00
Dalton Hubble	0200058e0e	Update Calico from v3.5.0 to v3.5.1 * Fix in confd https://github.com/projectcalico/confd/pull/205	2019-02-09 11:49:31 -08:00
Dalton Hubble	d5537405e1	Add CHANGES note about reducing the pod eviction timeout v1.13.3	2019-02-02 14:54:18 -08:00
Dalton Hubble	949ce21fb2	Update Prometheus from v2.7.0 to v2.7.1 * https://github.com/prometheus/prometheus/releases/tag/v2.7.1	2019-02-02 00:13:24 -08:00
Dalton Hubble	ccd96c37da	Update Kubernetes from v1.13.2 to v1.13.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133	2019-02-01 23:26:13 -08:00
Carlos Cobo	acd539f865	Fix architecture title for DigitalOcean (#390)	2019-02-01 23:20:06 -08:00
Dalton Hubble	244a1a601a	Switch CoreDNS to use the forward plugin instead of proxy * Use the forward plugin to forward to upstream resolvers, instead of the proxy plugin. The forward plugin is reported to be a faster alternative since it can re-use open sockets * https://coredns.io/explugins/forward/ * https://coredns.io/plugins/proxy/ * https://github.com/kubernetes/kubernetes/issues/73254	2019-01-30 22:25:23 -08:00
Dalton Hubble	d02af3d40d	Update mkdocs-material from v3.2.0 to v3.3.0 * Fix minor docs typos and errors * Allow a transient version of the six PyPI package, the docs build system can use the 0.12.0 (0.11.0 broke sync tools so pinning to 0.10.0 was previously needed)	2019-01-29 23:16:57 -08:00
Dalton Hubble	130daeac26	Update Prometheus from v2.6.1 to v2.7.0	2019-01-29 22:31:20 -08:00
Dalton Hubble	1ab06f69d7	Update flannel from v0.10.0 to v0.11.0 * https://github.com/coreos/flannel/releases/tag/v0.11.0	2019-01-29 21:51:25 -08:00
Dalton Hubble	eb08593eae	Fix azure provider warning, rename a public_ip field * azurerm_public_ip (used internally) added a field `allocation_method` to replace the field `public_ip_address_allocation` (deprecated) * Require terraform-provider-azurerm v1.21+ * https://github.com/terraform-providers/terraform-provider-azurerm/pull/2576	2019-01-27 17:52:35 -08:00
Dalton Hubble	e9659a8539	Update Calico from v3.4.0 to v3.5.0 * https://docs.projectcalico.org/v3.5/releases/	2019-01-27 16:34:30 -08:00
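
The Calico v3.6.0 entry above (1feefbe9c6) keeps the IPAM block size at /24 rather than the /26 default so that a node can hold the Kubernetes default of 110 pods. The arithmetic behind that choice can be sketched as follows. This is a minimal illustration, not code from the repository: the helper names and the `10.2.0.0` example base address are assumptions, and it only counts raw addresses in a single block.

```python
import ipaddress

# Kubernetes default maximum pods per node, as stated in the commit message.
MAX_PODS_PER_NODE = 110


def block_capacity(prefix_len: int) -> int:
    """Return the number of IPv4 addresses in one block of the given prefix length.

    The base address 10.2.0.0 is only an example (an assumption); the count
    depends on the prefix length alone.
    """
    return ipaddress.ip_network(f"10.2.0.0/{prefix_len}").num_addresses


def fits_in_one_block(prefix_len: int, pods: int = MAX_PODS_PER_NODE) -> bool:
    """Check whether a single block of this size has an address for every pod."""
    return block_capacity(prefix_len) >= pods


if __name__ == "__main__":
    for prefix in (26, 24):
        print(prefix, block_capacity(prefix), fits_in_one_block(prefix))
```

A /26 block holds 64 addresses, fewer than 110, while a /24 block holds 256, which is why the commit pins the block size to /24.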

802 Commits