Mirror of https://github.com/puppetmaster/typhoon.git (synced 2025-08-17 15:44:57 +02:00)
Compare commits
94 Commits
Commits (SHA1):

5066a25d89, de251bd94f, fc277eaab6, a08adc92b5, d42f42df4e, 4ff7fe2c29, f598307998, 8ae552ebda,
daee5a9d60, 73ae5d5649, 42d7222f3d, d10c2b4cb9, 7f8572030d, 4294bd0292, ba4c5de052, e483c81ce9,
6fa3b8a13f, ac95e83249, d988822741, 170ef74eea, b13a651cfe, 9c59f393a5, 3e4b3bfb04, 584088397c,
0200058e0e, d5537405e1, 949ce21fb2, ccd96c37da, acd539f865, 244a1a601a, d02af3d40d, 130daeac26,
1ab06f69d7, eb08593eae, e9659a8539, 6b87132aa1, f5ff003d0e, d697dd46dc, 2f3097ebea, f4d3508578,
67fb9602e7, c8a85fabe1, 7eafa59d8f, 679079b242, 1d27dc6528, b74cc8afd2, 1d66ad33f7, 4d32b79c6f,
df4c0ba05d, bfe0c74793, 60c70797ec, 6795a753ea, b57273b6f1, 812a1adb49, 1c6a0392ad, 5263d00a6f,
66e1365cc4, ea8b0d1c84, f2f4deb8bb, 4d2f33aee6, d42f47c49e, 53e549f233, bcb200186d, 479d498024,
e0c032be94, b74bf11772, 018c5edc25, 8aeec0b9b5, ff6ab571f3, 991fb44c37, d31f444fcd, 76d993cdae,
b6016d0a26, eec314b52f, bcce02a9ce, 42c523e6a2, 64b4c10418, 872b11b948, 5b27d8d889, 840b73f9ba,
915af3c6cc, c6586b69fd, ea3fc6d2a7, c8c43f3991, 58472438ce, 7f8e781ae4, 56e9a82984, e95b856a22,
31f48a81a8, 2b3f61d1bb, 8fd2978c31, 7de03a1279, be9f7b87d6, 721c847943
.github/ISSUE_TEMPLATE.md (vendored, 4 changes)

@@ -5,8 +5,8 @@
 ### Environment

 * Platform: aws, azure, bare-metal, google-cloud, digital-ocean
-* OS: container-linux, fedora-atomic
+* OS: container-linux, flatcar-linux, or fedora-atomic
-* Ref: Release version or Git SHA (reporting latest is **not** helpful)
+* Release: Typhoon version or Git SHA (reporting latest is **not** helpful)
 * Terraform: `terraform version` (reporting latest is **not** helpful)
 * Plugins: Provider plugin versions (reporting latest is **not** helpful)
CHANGES.md (158 additions)

@@ -4,6 +4,164 @@ Notable changes between versions.

The following changelog entries were added beneath the `## Latest` heading:

## v1.13.4

* Kubernetes [v1.13.4](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1134)
* Update etcd from v3.3.11 to [v3.3.12](https://github.com/etcd-io/etcd/releases/tag/v3.3.12)
* Update Calico from v3.5.0 to [v3.5.2](https://docs.projectcalico.org/v3.5/releases/)
* Assign priorityClassNames to critical cluster and node components ([#406](https://github.com/poseidon/typhoon/pull/406))
* Inform node out-of-resource eviction and scheduler preemption and ordering
* Add CoreDNS readiness probe ([#410](https://github.com/poseidon/typhoon/pull/410))

#### Bare-Metal

* Recommend updating [terraform-provider-matchbox](https://github.com/coreos/terraform-provider-matchbox) plugin from v0.2.2 to [v0.2.3](https://github.com/coreos/terraform-provider-matchbox/releases/tag/v0.2.3) ([#402](https://github.com/poseidon/typhoon/pull/402))
* Improve docs on using Ubiquiti EdgeOS with bare-metal clusters ([#413](https://github.com/poseidon/typhoon/pull/413))

#### Google Cloud

* Support `terraform-provider-google` v2.0+ ([#407](https://github.com/poseidon/typhoon/pull/407))
* Require `terraform-provider-google` v1.19+ (**action required**)
* Set the minimum CPU platform to Intel Haswell ([#405](https://github.com/poseidon/typhoon/pull/405))
* Haswell or better is available in every zone (no price change)
* A few zones still default to Sandy/Ivy Bridge (shifts in April 2019)

#### Addons

* Modernize Prometheus rules and alerts ([#404](https://github.com/poseidon/typhoon/pull/404))
* Drop extraneous metrics ([#397](https://github.com/poseidon/typhoon/pull/397))
* Add `pod` name label to metrics discovered via service endpoints
* Rename `kubernetes_namespace` label to `namespace`
* Modernize Grafana and dashboards, see [docs](https://typhoon.psdn.io/addons/grafana/) ([#403](https://github.com/poseidon/typhoon/pull/403), [#404](https://github.com/poseidon/typhoon/pull/404))
* Upgrade Grafana from v5.4.3 to [v6.0.0](https://github.com/grafana/grafana/releases/tag/v6.0.0)!
* Enable Grafana [Explore](http://docs.grafana.org/guides/whats-new-in-v6-0/#explore) UI as a Viewer (inspect/edit without saving)
* Update nginx-ingress from v0.22.0 to v0.23.0
* Raise nginx-ingress liveness/readiness timeout to 5 seconds
* Remove nginx-ingress default-backend ([#401](https://github.com/poseidon/typhoon/pull/401))

#### Fedora Atomic

* Build Kubelet [system container](https://github.com/poseidon/system-containers) with buildah. The image is an OCI format and slightly larger.
## v1.13.3

* Kubernetes [v1.13.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1133)
* Update etcd from v3.3.10 to [v3.3.11](https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md#v3311-2019-1-11)
* Update CoreDNS from v1.3.0 to [v1.3.1](https://coredns.io/2019/01/13/coredns-1.3.1-release/)
* Switch from the `proxy` plugin to the faster `forward` plugin for upstream resolvers
* Update Calico from v3.4.0 to [v3.5.0](https://docs.projectcalico.org/v3.5/releases/)
* Update flannel from v0.10.0 to [v0.11.0](https://github.com/coreos/flannel/releases/tag/v0.11.0)
* Reduce pod eviction timeout for deleting pods on unready nodes to 1 minute
* Respond more quickly to node preemption (previously 5 minutes)
* Fix automatic worker deletion on shutdown for cloud platforms
* Lowering Kubelet privileges in [#372](https://github.com/poseidon/typhoon/pull/372) dropped a needed node deletion authorization. Scale-in due to manual terraform apply (any cloud), AWS spot termination, or Azure low priority deletion left old nodes registered, requiring manual deletion (`kubectl delete node name`)

#### AWS

* Add `ingress_zone_id` output with the NLB DNS name's Route53 zone for use in alias records ([#380](https://github.com/poseidon/typhoon/pull/380))

#### Azure

* Fix azure provider warning, `public_ip` `allocation_method` replaces `public_ip_address_allocation`
* Require `terraform-provider-azurerm` v1.21+ (action required)

#### Addons

* Update nginx-ingress from v0.21.0 to v0.22.0
* Update Prometheus from v2.6.0 to v2.7.1
* Update kube-state-metrics from v1.4.0 to v1.5.0
* Fix ClusterRole to collect and export PodDisruptionBudget metrics ([#383](https://github.com/poseidon/typhoon/pull/383))
* Update node-exporter from v0.15.2 to v0.17.0
* Update Grafana from v5.4.2 to v5.4.3

## v1.13.2

* Kubernetes [v1.13.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1132)
* Add ServiceAccounts for `kube-apiserver` and `kube-scheduler` ([#370](https://github.com/poseidon/typhoon/pull/370))
* Use lower-privilege TLS client certificates for Kubelets ([#372](https://github.com/poseidon/typhoon/pull/372))
* Use HTTPS liveness probes for `kube-scheduler` and `kube-controller-manager` ([#377](https://github.com/poseidon/typhoon/pull/377))
* Update CoreDNS from v1.2.6 to [v1.3.0](https://coredns.io/2018/12/15/coredns-1.3.0-release/)
* Allow the `certificates.k8s.io` API to issue certificates signed by the cluster CA ([#376](https://github.com/poseidon/typhoon/pull/376))
* Configure controller manager to sign CSRs that are manually [approved](https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster) by an administrator

#### AWS

* Change `controller_type` and `worker_type` default from t2.small to t3.small ([#365](https://github.com/poseidon/typhoon/pull/365))
* t3.small is cheaper, provides 2 vCPU (instead of 1), and 5 Gbps of pod-to-pod bandwidth!

#### Bare-Metal

* Remove the `kubeconfig` output variable

#### Addons

* Update Prometheus from v2.5.0 to v2.6.0
## v1.13.1

* Kubernetes [v1.13.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1131)
* Update Calico from v3.3.2 to [v3.4.0](https://docs.projectcalico.org/v3.4/releases/) ([#362](https://github.com/poseidon/typhoon/pull/362))
* Install CNI plugins with an init container rather than a sidecar
* Improve the `calico-node` ClusterRole
* Recommend updating `terraform-provider-ct` plugin from v0.2.1 to v0.3.0 ([#363](https://github.com/poseidon/typhoon/pull/363))
* [Migration](https://typhoon.psdn.io/topics/maintenance/#upgrade-terraform-provider-ct) instructions for upgrading `terraform-provider-ct` in-place for v1.12.2+ clusters (**action required**)
* [Require](https://typhoon.psdn.io/topics/maintenance/#terraform-plugins-directory) switching from `~/.terraformrc` to the Terraform [third-party plugins](https://www.terraform.io/docs/configuration/providers.html#third-party-plugins) directory `~/.terraform.d/plugins/`
* Require Container Linux 1688.5.3 or newer

#### Google Cloud

* Increase TCP proxy apiserver backend service timeout from 1 minute to 5 minutes ([#361](https://github.com/poseidon/typhoon/pull/361))
* Align `port-forward` behavior closer to AWS/Azure (no timeout)

#### Addons

* Update Grafana from v5.4.0 to v5.4.2

## v1.13.0

* Kubernetes [v1.13.0](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1130)
* Update Calico from v3.3.1 to [v3.3.2](https://docs.projectcalico.org/v3.3/releases/)

#### Addons

* Update Grafana from v5.3.4 to v5.4.0
* Disable Grafana login form, since admin user can't be disabled ([#352](https://github.com/poseidon/typhoon/pull/352))
* Example manifests aim to provide a read-only dashboard view

## v1.12.3

* Kubernetes [v1.12.3](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md#v1123)
* Add `enable_reporting` variable (default "false") to provide upstreams with usage data ([#345](https://github.com/poseidon/typhoon/pull/345))
* Change kube-apiserver `--kubelet-preferred-address-types` to InternalIP,ExternalIP,Hostname
* Update Calico from v3.3.0 to [v3.3.1](https://docs.projectcalico.org/v3.3/releases/)
* Disable Felix usage reporting by default ([#345](https://github.com/poseidon/typhoon/pull/345))
* Improve flannel manifests
* [Rename](https://github.com/poseidon/terraform-render-bootkube/commit/d045a8e6b8eccfbb9d69bb51953b5a93d23f67f7) `kube-flannel` DaemonSet to `flannel` and `kube-flannel-cfg` ConfigMap to `flannel-config`
* [Drop](https://github.com/poseidon/terraform-render-bootkube/commit/39f9afb3360ec642e5b98457c8bd07eda35b6c96) unused mounts and add a CPU resource request
* Update CoreDNS from v1.2.4 to [v1.2.6](https://coredns.io/2018/11/05/coredns-1.2.6-release/)
* Enable CoreDNS `loop` and `loadbalance` plugins ([#340](https://github.com/poseidon/typhoon/pull/340))
* Fix pod-checkpointer log noise and checkpointable pods detection ([#346](https://github.com/poseidon/typhoon/pull/346))
* Use kubernetes-incubator/bootkube v0.14.0
* [Recommend](https://typhoon.psdn.io/topics/maintenance/#terraform-plugins-directory) switching from `~/.terraformrc` to the Terraform [third-party plugins](https://www.terraform.io/docs/configuration/providers.html#third-party-plugins) directory `~/.terraform.d/plugins/`.
* Allows pinning `terraform-provider-ct` and `terraform-provider-matchbox` versions
* Improves safety of later plugin version migrations

#### Azure

* Use eviction policy `Delete` for `Low` priority virtual machine scale set workers ([#343](https://github.com/poseidon/typhoon/pull/343))
* Fix issue where Azure defaults to `Deallocate` eviction policy, which required manually restarting deallocated instances. `Delete` policy aligns Azure with AWS and GCP behavior.
* Require `terraform-provider-azurerm` v1.19+ (action required)

#### Bare-Metal

* Add Kubelet `/etc/iscsi` and `iscsiadm` mounts on bare-metal for iSCSI ([#103](https://github.com/poseidon/typhoon/pull/103))

#### Addons

* Update nginx-ingress from v0.20.0 to v0.21.0
* Update Prometheus from v2.4.3 to v2.5.0
* Update Grafana from v5.3.2 to v5.3.4

## v1.12.2

* Kubernetes [v1.12.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.12.md#v1122)
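Several of the entries above (v1.13.1 and v1.12.3) recommend moving `terraform-provider-ct` and `terraform-provider-matchbox` from a `~/.terraformrc` override to the Terraform third-party plugins directory. A minimal sketch of that layout, not taken from the Typhoon docs; the `~/Downloads` path is a placeholder for wherever you saved the release binary:

```sh
# Sketch only: pin third-party providers by placing their binaries in the
# Terraform plugins directory instead of referencing them in ~/.terraformrc.
mkdir -p ~/.terraform.d/plugins

# Download the terraform-provider-ct v0.3.0 release binary for your OS/arch from
# its GitHub releases page, then install it using Terraform's name_vVERSION convention.
mv ~/Downloads/terraform-provider-ct ~/.terraform.d/plugins/terraform-provider-ct_v0.3.0

# Remove any "ct" entry from the providers block in ~/.terraformrc, then re-initialize.
terraform init
```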
README.md (35 changes)

@@ -11,29 +11,32 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
-* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
+* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
-* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [preemption](https://typhoon.psdn.io/cl/google-cloud/#preemption) (varies by platform)
+* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [preemptible](https://typhoon.psdn.io/cl/google-cloud/#preemption) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
-* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Ready for Ingress, Prometheus, Grafana, CSI, or other [addons](https://typhoon.psdn.io/addons/overview/)

 ## Modules

-Typhoon provides a Terraform Module for each supported operating system and platform.
+Typhoon provides a Terraform Module for each supported operating system and platform. Container Linux is a mature and reliable choice. Also, Kinvolk's Flatcar Linux fork is selectable on AWS and bare-metal.

 | Platform | Operating System | Terraform Module | Status |
 |---------------|------------------|------------------|--------|
 | AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | stable |
-| AWS | Fedora Atomic | [aws/fedora-atomic/kubernetes](aws/fedora-atomic/kubernetes) | alpha |
 | Azure | Container Linux | [azure/container-linux/kubernetes](cl/azure.md) | alpha |
 | Bare-Metal | Container Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
-| Bare-Metal | Fedora Atomic | [bare-metal/fedora-atomic/kubernetes](bare-metal/fedora-atomic/kubernetes) | alpha |
 | Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
-| Digital Ocean | Fedora Atomic | [digital-ocean/fedora-atomic/kubernetes](digital-ocean/fedora-atomic/kubernetes) | alpha |
 | Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | stable |
-| Google Cloud | Fedora Atomic | [google-cloud/fedora-atomic/kubernetes](google-cloud/fedora-atomic/kubernetes) | alpha |

-The AWS and bare-metal `container-linux` modules allow picking Red Hat Container Linux (formerly CoreOS Container Linux) or Kinvolk's Flatcar Linux friendly fork.
+Fedora Atomic support is alpha and will evolve as Fedora Atomic is replaced by Fedora CoreOS.
+
+| Platform | Operating System | Terraform Module | Status |
+|---------------|------------------|------------------|--------|
+| AWS | Fedora Atomic | [aws/fedora-atomic/kubernetes](aws/fedora-atomic/kubernetes) | alpha |
+| Bare-Metal | Fedora Atomic | [bare-metal/fedora-atomic/kubernetes](bare-metal/fedora-atomic/kubernetes) | alpha |
+| Digital Ocean | Fedora Atomic | [digital-ocean/fedora-atomic/kubernetes](digital-ocean/fedora-atomic/kubernetes) | alpha |
+| Google Cloud | Fedora Atomic | [google-cloud/fedora-atomic/kubernetes](google-cloud/fedora-atomic/kubernetes) | alpha |

 ## Documentation
@@ -47,7 +50,7 @@ Define a Kubernetes cluster by using the Terraform module for your chosen platform

 ```tf
 module "google-cloud-yavin" {
-  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.12.2"
+  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes?ref=v1.13.4"

   providers = {
     google = "google.default"
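The module snippet above is truncated in this view; applying it follows the standard Terraform workflow (a sketch, not Typhoon-specific, run from the directory that holds the cluster definition):

```sh
terraform init      # fetch the module at the pinned ref and the provider plugins
terraform plan      # review the resources that will be created
terraform apply     # provision the cluster; readiness takes roughly 4-8 minutes
```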
@@ -87,10 +90,10 @@ In 4-8 minutes (varies by platform), the cluster will be ready. This Google Clou
 ```sh
 $ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
 $ kubectl get nodes
-NAME                                       STATUS  AGE  VERSION
+NAME                                       ROLES              STATUS  AGE  VERSION
-yavin-controller-0.c.example-com.internal  Ready   6m   v1.12.2
+yavin-controller-0.c.example-com.internal  controller,master  Ready   6m   v1.13.4
-yavin-worker-jrbf.c.example-com.internal   Ready   5m   v1.12.2
+yavin-worker-jrbf.c.example-com.internal   node               Ready   5m   v1.13.4
-yavin-worker-mzdm.c.example-com.internal   Ready   5m   v1.12.2
+yavin-worker-mzdm.c.example-com.internal   node               Ready   5m   v1.13.4
 ```

 List the pods.

@@ -102,6 +105,7 @@ kube-system calico-node-1cs8z 2/2 Running 0
 kube-system   calico-node-d1l5b                          2/2   Running   0   6m
 kube-system   calico-node-sp9ps                          2/2   Running   0   6m
 kube-system   coredns-1187388186-zj5dl                   1/1   Running   0   6m
+kube-system   coredns-1187388186-dkh3o                   1/1   Running   0   6m
 kube-system   kube-apiserver-zppls                       1/1   Running   0   6m
 kube-system   kube-controller-manager-3271970485-gh9kt   1/1   Running   0   6m
 kube-system   kube-controller-manager-3271970485-h90v8   1/1   Running   1   6m
@@ -111,6 +115,7 @@ kube-system kube-proxy-njn47 1/1 Running 0
 kube-system   kube-scheduler-3895335239-5x87r            1/1   Running   0   6m
 kube-system   kube-scheduler-3895335239-bzrrt            1/1   Running   1   6m
 kube-system   pod-checkpointer-l6lrt                     1/1   Running   0   6m
+kube-system   pod-checkpointer-l6lrt-controller-0        1/1   Running   0   6m
 ```

 ## Non-Goals
addons/grafana/config.yaml (new file, 36 lines)

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: monitoring
data:
  custom.ini: |+
    [server]
    http_port = 8080

    [paths]
    data = /var/lib/grafana
    plugins = /var/lib/grafana/plugins
    provisioning = /etc/grafana/provisioning

    [users]
    allow_sign_up = false
    allow_org_create = false
    # viewers can edit/inspect, but not save
    viewers_can_edit = true

    # Disable login form, since Grafana always creates an admin user
    [auth]
    disable_login_form = true

    # Disable the user/pass login system
    [auth.basic]
    enabled = false

    # Allow anonymous authentication with view-only authorization
    [auth.anonymous]
    enabled = true
    org_role = Viewer

    [analytics]
    reporting_enabled = false
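A brief usage sketch for the new ConfigMap, assuming the `monitoring` namespace and the other Grafana addon manifests from this repository are already applied:

```sh
# Apply the new Grafana config ConfigMap alongside the other Grafana addon manifests.
kubectl apply -f addons/grafana/config.yaml

# Verify it exists; the Deployment below mounts it at /etc/grafana and points
# GF_PATHS_CONFIG at /etc/grafana/custom.ini.
kubectl -n monitoring get configmap grafana-config
```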
File diff suppressed because it is too large.
Grafana datasources ConfigMap:

@@ -10,7 +10,15 @@ data:
       - name: prometheus
         type: prometheus
         access: proxy
-        orgId: 1
         url: http://prometheus.monitoring.svc.cluster.local
         version: 1
         editable: false
+  loki.yaml: |+
+    apiVersion: 1
+    datasources:
+      - name: loki
+        type: loki
+        access: proxy
+        url: http://loki.monitoring.svc.cluster.local
+        version: 1
+        editable: false
Grafana Deployment:

@@ -23,18 +23,10 @@ spec:
     spec:
       containers:
         - name: grafana
-          image: grafana/grafana:5.3.2
+          image: grafana/grafana:6.0.0
           env:
-            - name: GF_SERVER_HTTP_PORT
-              value: "8080"
-            - name: GF_AUTH_BASIC_ENABLED
-              value: "false"
-            - name: GF_AUTH_ANONYMOUS_ENABLED
-              value: "true"
-            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
-              value: Viewer
-            - name: GF_ANALYTICS_REPORTING_ENABLED
-              value: "false"
+            - name: GF_PATHS_CONFIG
+              value: "/etc/grafana/custom.ini"
           ports:
             - name: http
               containerPort: 8080
@@ -46,19 +38,24 @@ spec:
               memory: 200Mi
               cpu: 200m
           volumeMounts:
+            - name: config
+              mountPath: /etc/grafana
             - name: datasources
               mountPath: /etc/grafana/provisioning/datasources
-            - name: dashboard-providers
+            - name: providers
               mountPath: /etc/grafana/provisioning/dashboards
             - name: dashboards
-              mountPath: /var/lib/grafana/dashboards
+              mountPath: /etc/grafana/dashboards
       volumes:
+        - name: config
+          configMap:
+            name: grafana-config
         - name: datasources
          configMap:
            name: grafana-datasources
-        - name: dashboard-providers
+        - name: providers
          configMap:
-            name: grafana-dashboard-providers
+            name: grafana-providers
         - name: dashboards
          configMap:
            name: grafana-dashboards
Grafana dashboard providers ConfigMap:

@@ -1,10 +1,10 @@
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: grafana-dashboard-providers
+  name: grafana-providers
   namespace: monitoring
 data:
-  dashboard-providers.yaml: |+
+  providers.yaml: |+
     apiVersion: 1
     providers:
     - name: 'default'
@@ -12,4 +12,4 @@ data:
       folder: ''
       type: file
       options:
-        path: /var/lib/grafana/dashboards
+        path: /etc/grafana/dashboards
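A quick way to check the reworked provisioning (anonymous Viewer access on port 8080, per `custom.ini` above) is to port-forward Grafana; a sketch, assuming the Deployment is named `grafana` in the `monitoring` namespace:

```sh
# Forward local port 8080 to the Grafana container's http port.
kubectl -n monitoring port-forward deployment/grafana 8080 &

# The health endpoint should respond without logging in, since
# [auth.anonymous] is enabled with the Viewer role.
curl -s http://127.0.0.1:8080/api/health
```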
The nginx-ingress default-backend Deployment and Service manifests are deleted and the controller Deployment is updated, matching the v1.13.4 changelog entries above.

Deleted file (42 lines), the default-backend Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: default-backend
  namespace: ingress
spec:
  replicas: 1
  selector:
    matchLabels:
      name: default-backend
      phase: prod
  template:
    metadata:
      labels:
        name: default-backend
        phase: prod
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      containers:
        - name: default-backend
          # Any image is permissable as long as:
          # 1. It serves a 404 page at /
          # 2. It serves 200 on a /healthz endpoint
          image: k8s.gcr.io/defaultbackend:1.4
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: 10m
              memory: 20Mi
            requests:
              cpu: 10m
              memory: 20Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 5
      terminationGracePeriodSeconds: 60

Deleted file (15 lines), the default-backend Service:

apiVersion: v1
kind: Service
metadata:
  name: default-backend
  namespace: ingress
spec:
  type: ClusterIP
  selector:
    name: default-backend
    phase: prod
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080

nginx-ingress controller Deployment:

@@ -24,10 +24,9 @@ spec:
         node-role.kubernetes.io/node: ""
       containers:
         - name: nginx-ingress-controller
-          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.20.0
+          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0
           args:
             - /nginx-ingress-controller
-            - --default-backend-service=$(POD_NAMESPACE)/default-backend
             - --ingress-class=public
           # use downward API
           env:
@@ -58,7 +57,7 @@ spec:
           initialDelaySeconds: 10
           periodSeconds: 10
           successThreshold: 1
-          timeoutSeconds: 1
+          timeoutSeconds: 5
         readinessProbe:
           failureThreshold: 3
           httpGet:
@@ -67,7 +66,7 @@ spec:
             scheme: HTTP
           periodSeconds: 10
           successThreshold: 1
-          timeoutSeconds: 1
+          timeoutSeconds: 5
         securityContext:
           capabilities:
             add:
The same default-backend Deployment and Service deletions and the same nginx-ingress controller changes (image 0.20.0 to 0.23.0, drop the `--default-backend-service` argument, raise probe timeouts to 5 seconds) are repeated, near verbatim, for the remaining platforms' nginx-ingress addon manifests; the occurrences differ only in hunk offsets.
Prometheus scrape configuration ConfigMap:

@@ -55,6 +55,17 @@ data:
         action: replace
         target_label: job

+      metric_relabel_configs:
+      - source_labels: [__name__]
+        action: drop
+        regex: etcd_(debugging|disk|request|server).*
+      - source_labels: [__name__]
+        action: drop
+        regex: apiserver_admission_controller_admission_latencies_seconds_.*
+      - source_labels: [__name__]
+        action: drop
+        regex: apiserver_admission_step_admission_latencies_seconds_.*

     # Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
     # metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
     - job_name: 'kubelet'
@@ -89,6 +100,13 @@ data:
       relabel_configs:
       - action: labelmap
         regex: __meta_kubernetes_node_label_(.+)
+      metric_relabel_configs:
+      - source_labels: [__name__, image]
+        action: drop
+        regex: container_([a-z_]+);
+      - source_labels: [__name__]
+        action: drop
+        regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)

     # Scrap etcd metrics from controllers via listen-metrics-urls
@@ -102,7 +120,7 @@ data:
         regex: 'true'
       - action: labelmap
         regex: __meta_kubernetes_node_label_(.+)
-      - source_labels: [__meta_kubernetes_node_name]
+      - source_labels: [__meta_kubernetes_node_address_InternalIP]
         action: replace
         target_label: __address__
         replacement: '${1}:2381'
@@ -119,10 +137,10 @@ data:
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      # service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'

        kubernetes_sd_configs:
        - role: endpoints

+        honor_labels: true
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
@@ -144,11 +162,19 @@ data:
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
-          target_label: kubernetes_namespace
+          target_label: namespace
+        - source_labels: [__meta_kubernetes_pod_name]
+          action: replace
+          target_label: pod
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: job

+        metric_relabel_configs:
+        - source_labels: [__name__]
+          action: drop
+          regex: etcd_(debugging|disk|request|server).*

      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
@@ -177,7 +203,7 @@ data:
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
-          target_label: kubernetes_namespace
+          target_label: namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: job
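The relabeling changes above rename `kubernetes_namespace` to `namespace` and add a `pod` label to service-endpoint metrics, so dashboards and ad-hoc queries need to follow suit. A sketch of checking the new labels against the Prometheus HTTP API; the in-cluster URL is the one the Grafana datasource uses, so run this from a pod or behind a port-forward:

```sh
# Query via the Prometheus HTTP API using the new label names.
# Metrics scraped from service endpoints should now carry namespace/pod labels.
curl -sG 'http://prometheus.monitoring.svc.cluster.local/api/v1/query' \
  --data-urlencode 'query=up{namespace="monitoring"}'
```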
Prometheus Deployment:

@@ -20,7 +20,7 @@ spec:
       serviceAccountName: prometheus
       containers:
         - name: prometheus
-          image: quay.io/prometheus/prometheus:v2.4.3
+          image: quay.io/prometheus/prometheus:v2.7.1
           args:
             - --web.listen-address=0.0.0.0:9090
             - --config.file=/etc/prometheus/prometheus.yaml
|
|||||||
metadata:
|
metadata:
|
||||||
name: kube-state-metrics
|
name: kube-state-metrics
|
||||||
rules:
|
rules:
|
||||||
- apiGroups: [""]
|
- apiGroups:
|
||||||
|
- ""
|
||||||
resources:
|
resources:
|
||||||
- configmaps
|
- configmaps
|
||||||
- secrets
|
- secrets
|
||||||
@ -17,23 +18,47 @@ rules:
|
|||||||
- persistentvolumes
|
- persistentvolumes
|
||||||
- namespaces
|
- namespaces
|
||||||
- endpoints
|
- endpoints
|
||||||
verbs: ["list", "watch"]
|
verbs:
|
||||||
- apiGroups: ["extensions"]
|
- list
|
||||||
|
- watch
|
||||||
|
- apiGroups:
|
||||||
|
- extensions
|
||||||
resources:
|
resources:
|
||||||
- daemonsets
|
- daemonsets
|
||||||
- deployments
|
- deployments
|
||||||
- replicasets
|
- replicasets
|
||||||
verbs: ["list", "watch"]
|
verbs:
|
||||||
- apiGroups: ["apps"]
|
- list
|
||||||
|
- watch
|
||||||
|
- apiGroups:
|
||||||
|
- apps
|
||||||
resources:
|
resources:
|
||||||
- statefulsets
|
- statefulsets
|
||||||
verbs: ["list", "watch"]
|
- daemonsets
|
||||||
- apiGroups: ["batch"]
|
- deployments
|
||||||
|
- replicasets
|
||||||
|
verbs:
|
||||||
|
- list
|
||||||
|
- watch
|
||||||
|
- apiGroups:
|
||||||
|
- batch
|
||||||
resources:
|
resources:
|
||||||
- cronjobs
|
- cronjobs
|
||||||
- jobs
|
- jobs
|
||||||
verbs: ["list", "watch"]
|
verbs:
|
||||||
- apiGroups: ["autoscaling"]
|
- list
|
||||||
|
- watch
|
||||||
|
- apiGroups:
|
||||||
|
- autoscaling
|
||||||
resources:
|
resources:
|
||||||
- horizontalpodautoscalers
|
- horizontalpodautoscalers
|
||||||
verbs: ["list", "watch"]
|
verbs:
|
||||||
|
- list
|
||||||
|
- watch
|
||||||
|
- apiGroups:
|
||||||
|
- policy
|
||||||
|
resources:
|
||||||
|
- poddisruptionbudgets
|
||||||
|
verbs:
|
||||||
|
- list
|
||||||
|
- watch
|
||||||
|
kube-state-metrics Deployment:

@@ -24,7 +24,7 @@ spec:
       serviceAccountName: kube-state-metrics
       containers:
         - name: kube-state-metrics
-          image: quay.io/coreos/kube-state-metrics:v1.4.0
+          image: quay.io/coreos/kube-state-metrics:v1.5.0
           ports:
             - name: metrics
               containerPort: 8080
@@ -35,7 +35,7 @@ spec:
             initialDelaySeconds: 5
             timeoutSeconds: 5
         - name: addon-resizer
-          image: k8s.gcr.io/addon-resizer:1.7
+          image: k8s.gcr.io/addon-resizer:1.8.4
           resources:
             limits:
               cpu: 100m
kube-state-metrics RoleBinding:

@@ -6,7 +6,7 @@ metadata:
 roleRef:
   apiGroup: rbac.authorization.k8s.io
   kind: Role
-  name: kube-state-metrics-resizer
+  name: kube-state-metrics
 subjects:
   - kind: ServiceAccount
     name: kube-state-metrics
kube-state-metrics Role:

@@ -1,15 +1,31 @@
 apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
 metadata:
-  name: kube-state-metrics-resizer
+  name: kube-state-metrics
   namespace: monitoring
 rules:
-  - apiGroups: [""]
+  - apiGroups:
+      - ""
     resources:
       - pods
-    verbs: ["get"]
-  - apiGroups: ["extensions"]
+    verbs:
+      - get
+  - apiGroups:
+      - extensions
     resources:
       - deployments
-    resourceNames: ["kube-state-metrics"]
-    verbs: ["get", "update"]
+    resourceNames:
+      - kube-state-metrics
+    verbs:
+      - get
+      - update
+  - apiGroups:
+      - apps
+    resources:
+      - deployments
+    resourceNames:
+      - kube-state-metrics
+    verbs:
+      - get
+      - update
node-exporter DaemonSet:

@@ -28,21 +28,24 @@ spec:
       hostPID: true
       containers:
         - name: node-exporter
-          image: quay.io/prometheus/node-exporter:v0.15.2
+          image: quay.io/prometheus/node-exporter:v0.17.0
           args:
-            - "--path.procfs=/host/proc"
-            - "--path.sysfs=/host/sys"
+            - --path.procfs=/host/proc
+            - --path.sysfs=/host/sys
+            - --path.rootfs=/host/root
+            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
+            - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
           ports:
             - name: metrics
               containerPort: 9100
               hostPort: 9100
           resources:
             requests:
-              memory: 30Mi
               cpu: 100m
-            limits:
               memory: 50Mi
+            limits:
               cpu: 200m
+              memory: 100Mi
           volumeMounts:
             - name: proc
               mountPath: /host/proc
@@ -50,6 +53,9 @@ spec:
             - name: sys
               mountPath: /host/sys
               readOnly: true
+            - name: root
+              mountPath: /host/root
+              readOnly: true
       tolerations:
         - effect: NoSchedule
           operator: Exists
@@ -60,3 +66,6 @@ spec:
         - name: sys
           hostPath:
             path: /sys
+        - name: root
+          hostPath:
+            path: /
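With the added `--path.rootfs` flag and filesystem collector filters, host filesystem metrics now come from `/host/root`; since the DaemonSet keeps hostPort 9100, a node-local scrape is an easy smoke test (a sketch; `NODE_IP` is a placeholder for any cluster node's address):

```sh
# NODE_IP is a placeholder; substitute a real node address.
NODE_IP=10.0.0.10
curl -s "http://${NODE_IP}:9100/metrics" | grep '^node_filesystem_avail_bytes' | head
```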
prometheus-rules ConfigMap:

@@ -4,582 +4,1089 @@ metadata:
   name: prometheus-rules
   namespace: monitoring
 data:

The hand-maintained rule files are replaced with modernized upstream-style rules. The removed `alertmanager.rules.yaml` group contained the alerts AlertmanagerConfigInconsistent (critical), AlertmanagerDownOrMissing (warning), and AlertmanagerFailedReload (warning). The removed `etcd3.rules.yaml` group (`./etcd3.rules`) contained InsufficientMembers (critical), NoLeader (critical), HighNumberOfLeaderChanges (warning), GRPCRequestsSlow (critical), two HighNumberOfFailedHTTPRequests alerts (warning above 1%, critical above 5%), and HTTPRequestsSlow.

The added `etcd.yaml` key holds a JSON-formatted "etcd" rule group whose alerts match `job=~".*etcd.*"`:

* etcdInsufficientMembers — critical, for 3m, `sum(up{job=~".*etcd.*"} == bool 1) by (job) < ((count(up{job=~".*etcd.*"}) by (job) + 1) / 2)`
* etcdNoLeader — critical, for 1m, `etcd_server_has_leader{job=~".*etcd.*"} == 0`
* etcdHighNumberOfLeaderChanges — warning, for 15m, `rate(etcd_server_leader_changes_seen_total{job=~".*etcd.*"}[15m]) > 3`
* etcdGRPCRequestsSlow — critical, for 10m, 99th percentile of unary `grpc_server_handling_seconds_bucket` above 0.15s
* etcdMemberCommunicationSlow — warning, for 10m, 99th percentile of `etcd_network_peer_round_trip_time_seconds_bucket` above 0.15s
* etcdHighNumberOfFailedProposals — warning, for 15m, `rate(etcd_server_proposals_failed_total{job=~".*etcd.*"}[15m]) > 5`
* etcdHighFsyncDurations — warning, for 10m, 99th percentile of `etcd_disk_wal_fsync_duration_seconds_bucket` above 0.5s
* etcdHighCommitDurations — warning, for 10m, 99th percentile of `etcd_disk_backend_commit_duration_seconds_bucket` above 0.25s

(The compare view ends partway through this file's diff.)
|
}
|
||||||
labels:
|
},
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"alert": "etcdHighNumberOfFailedHTTPRequests",
|
||||||
description: on etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method
|
"annotations": {
|
||||||
}} are slow
|
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}"
|
||||||
summary: slow HTTP requests
|
},
|
||||||
- alert: EtcdMemberCommunicationSlow
|
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.01\n",
|
||||||
expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))
|
"for": "10m",
|
||||||
> 0.15
|
"labels": {
|
||||||
for: 10m
|
"severity": "warning"
|
||||||
labels:
|
}
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
{
|
||||||
description: etcd instance {{ $labels.instance }} member communication with
|
"alert": "etcdHighNumberOfFailedHTTPRequests",
|
||||||
{{ $labels.To }} is slow
|
"annotations": {
|
||||||
summary: etcd member communication is slow
|
"message": "{{ $value }}% of requests for {{ $labels.method }} failed on etcd instance {{ $labels.instance }}."
|
||||||
- alert: HighNumberOfFailedProposals
|
},
|
||||||
expr: increase(etcd_server_proposals_failed_total{job="etcd"}[1h]) > 5
|
"expr": "sum(rate(etcd_http_failed_total{job=~\".*etcd.*\", code!=\"404\"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job=~\".*etcd.*\"}[5m]))\nBY (method) > 0.05\n",
|
||||||
labels:
|
"for": "10m",
|
||||||
severity: warning
|
"labels": {
|
||||||
annotations:
|
"severity": "critical"
|
||||||
description: etcd instance {{ $labels.instance }} has seen {{ $value }} proposal
|
}
|
||||||
failures within the last hour
|
},
|
||||||
summary: a high number of proposals within the etcd cluster are failing
|
{
|
||||||
- alert: HighFsyncDurations
|
"alert": "etcdHTTPRequestsSlow",
|
||||||
expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
|
"annotations": {
|
||||||
> 0.5
|
"message": "etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method }} are slow."
|
||||||
for: 10m
|
},
|
||||||
labels:
|
"expr": "histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m]))\n> 0.15\n",
|
||||||
severity: warning
|
"for": "10m",
|
||||||
annotations:
|
"labels": {
|
||||||
description: etcd instance {{ $labels.instance }} fync durations are high
|
"severity": "warning"
|
||||||
summary: high fsync durations
|
}
|
||||||
- alert: HighCommitDurations
|
}
|
||||||
expr: histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))
|
]
|
||||||
> 0.25
|
}
|
||||||
for: 10m
|
]
|
||||||
labels:
|
}
|
||||||
severity: warning
|
extra.yaml: |-
|
||||||
annotations:
|
{
|
||||||
description: etcd instance {{ $labels.instance }} commit durations are high
|
"groups": [
|
||||||
summary: high commit durations
|
{
|
||||||
general.rules.yaml: |
|
"name": "extra.rules",
|
||||||
groups:
|
"rules": [
|
||||||
- name: general.rules
|
{
|
||||||
rules:
|
"alert": "InactiveRAIDDisk",
|
||||||
- alert: TargetDown
|
"annotations": {
|
||||||
expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
|
"message": "{{ $value }} RAID disk(s) on node {{ $labels.instance }} are inactive."
|
||||||
for: 10m
|
},
|
||||||
labels:
|
"expr": "node_md_disks - node_md_disks_active > 0",
|
||||||
severity: warning
|
"for": "10m",
|
||||||
annotations:
|
"labels": {
|
||||||
description: '{{ $value }}% of {{ $labels.job }} targets are down.'
|
"severity": "warning"
|
||||||
summary: Targets are down
|
}
|
||||||
- record: fd_utilization
|
}
|
||||||
expr: process_open_fds / process_max_fds
|
]
|
||||||
- alert: FdExhaustionClose
|
}
|
||||||
expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1
|
]
|
||||||
for: 10m
|
}
|
||||||
labels:
|
kube.yaml: |-
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"groups": [
|
||||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
|
{
|
||||||
will exhaust in file/socket descriptors within the next 4 hours'
|
"name": "k8s.rules",
|
||||||
summary: file descriptors soon exhausted
|
"rules": [
|
||||||
- alert: FdExhaustionClose
|
{
|
||||||
expr: predict_linear(fd_utilization[10m], 3600) > 1
|
"expr": "sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}[5m])) by (namespace)\n",
|
||||||
for: 10m
|
"record": "namespace:container_cpu_usage_seconds_total:sum_rate"
|
||||||
labels:
|
},
|
||||||
severity: critical
|
{
|
||||||
annotations:
|
"expr": "sum by (namespace, pod_name, container_name) (\n rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}[5m])\n)\n",
|
||||||
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
|
"record": "namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate"
|
||||||
will exhaust in file/socket descriptors within the next hour'
|
},
|
||||||
summary: file descriptors soon exhausted
|
{
|
||||||
kube-controller-manager.rules.yaml: |
|
"expr": "sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}) by (namespace)\n",
|
||||||
groups:
|
"record": "namespace:container_memory_usage_bytes:sum"
|
||||||
- name: kube-controller-manager.rules
|
},
|
||||||
rules:
|
{
|
||||||
- alert: K8SControllerManagerDown
|
"expr": "sum by (namespace, label_name) (\n sum(rate(container_cpu_usage_seconds_total{job=\"kubernetes-cadvisor\", image!=\"\", container_name!=\"\"}[5m])) by (namespace, pod_name)\n * on (namespace, pod_name) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
|
||||||
expr: absent(up{job="kube-controller-manager"} == 1)
|
"record": "namespace_name:container_cpu_usage_seconds_total:sum_rate"
|
||||||
for: 5m
|
},
|
||||||
labels:
|
{
|
||||||
severity: critical
|
"expr": "sum by (namespace, label_name) (\n sum(container_memory_usage_bytes{job=\"kubernetes-cadvisor\",image!=\"\", container_name!=\"\"}) by (pod_name, namespace)\n* on (namespace, pod_name) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
|
||||||
annotations:
|
"record": "namespace_name:container_memory_usage_bytes:sum"
|
||||||
description: There is no running K8S controller manager. Deployments and replication
|
},
|
||||||
controllers are not making progress.
|
{
|
||||||
summary: Controller manager is down
|
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"}) by (namespace, pod)\n* on (namespace, pod) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
|
||||||
kube-scheduler.rules.yaml: |
|
"record": "namespace_name:kube_pod_container_resource_requests_memory_bytes:sum"
|
||||||
groups:
|
},
|
||||||
- name: kube-scheduler.rules
|
{
|
||||||
rules:
|
"expr": "sum by (namespace, label_name) (\n sum(kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"} and on(pod) kube_pod_status_scheduled{condition=\"true\"}) by (namespace, pod)\n* on (namespace, pod) group_left(label_name)\n label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\", \"(.*)\")\n)\n",
|
||||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
"record": "namespace_name:kube_pod_container_resource_requests_cpu_cores:sum"
|
||||||
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
}
|
||||||
BY (le, cluster)) / 1e+06
|
]
|
||||||
labels:
|
},
|
||||||
quantile: "0.99"
|
{
|
||||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
"name": "kube-scheduler.rules",
|
||||||
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
"rules": [
|
||||||
BY (le, cluster)) / 1e+06
|
{
|
||||||
labels:
|
"expr": "histogram_quantile(0.99, sum(rate(scheduler_e2e_scheduling_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
quantile: "0.9"
|
"labels": {
|
||||||
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
|
"quantile": "0.99"
|
||||||
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
|
},
|
||||||
BY (le, cluster)) / 1e+06
|
"record": "cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile"
|
||||||
labels:
|
},
|
||||||
quantile: "0.5"
|
{
|
||||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
"expr": "histogram_quantile(0.99, sum(rate(scheduler_scheduling_algorithm_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
"labels": {
|
||||||
BY (le, cluster)) / 1e+06
|
"quantile": "0.99"
|
||||||
labels:
|
},
|
||||||
quantile: "0.99"
|
"record": "cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile"
|
||||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
},
|
||||||
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
{
|
||||||
BY (le, cluster)) / 1e+06
|
"expr": "histogram_quantile(0.99, sum(rate(scheduler_binding_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
labels:
|
"labels": {
|
||||||
quantile: "0.9"
|
"quantile": "0.99"
|
||||||
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
|
},
|
||||||
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
|
"record": "cluster_quantile:scheduler_binding_latency:histogram_quantile"
|
||||||
BY (le, cluster)) / 1e+06
|
},
|
||||||
labels:
|
{
|
||||||
quantile: "0.5"
|
"expr": "histogram_quantile(0.9, sum(rate(scheduler_e2e_scheduling_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
"labels": {
|
||||||
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
|
"quantile": "0.9"
|
||||||
BY (le, cluster)) / 1e+06
|
},
|
||||||
labels:
|
"record": "cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile"
|
||||||
quantile: "0.99"
|
},
|
||||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
{
|
||||||
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
|
"expr": "histogram_quantile(0.9, sum(rate(scheduler_scheduling_algorithm_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
BY (le, cluster)) / 1e+06
|
"labels": {
|
||||||
labels:
|
"quantile": "0.9"
|
||||||
quantile: "0.9"
|
},
|
||||||
- record: cluster:scheduler_binding_latency_seconds:quantile
|
"record": "cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile"
|
||||||
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
|
},
|
||||||
BY (le, cluster)) / 1e+06
|
{
|
||||||
labels:
|
"expr": "histogram_quantile(0.9, sum(rate(scheduler_binding_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
quantile: "0.5"
|
"labels": {
|
||||||
- alert: K8SSchedulerDown
|
"quantile": "0.9"
|
||||||
expr: absent(up{job="kube-scheduler"} == 1)
|
},
|
||||||
for: 5m
|
"record": "cluster_quantile:scheduler_binding_latency:histogram_quantile"
|
||||||
labels:
|
},
|
||||||
severity: critical
|
{
|
||||||
annotations:
|
"expr": "histogram_quantile(0.5, sum(rate(scheduler_e2e_scheduling_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
description: There is no running K8S scheduler. New pods are not being assigned
|
"labels": {
|
||||||
to nodes.
|
"quantile": "0.5"
|
||||||
summary: Scheduler is down
|
},
|
||||||
kube-state-metrics.rules.yaml: |
|
"record": "cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile"
|
||||||
groups:
|
},
|
||||||
- name: kube-state-metrics.rules
|
{
|
||||||
rules:
|
"expr": "histogram_quantile(0.5, sum(rate(scheduler_scheduling_algorithm_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
- alert: DeploymentGenerationMismatch
|
"labels": {
|
||||||
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
|
"quantile": "0.5"
|
||||||
for: 15m
|
},
|
||||||
labels:
|
"record": "cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile"
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
{
|
||||||
description: Observed deployment generation does not match expected one for
|
"expr": "histogram_quantile(0.5, sum(rate(scheduler_binding_latency_microseconds_bucket{job=\"kube-scheduler\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
deployment {{$labels.namespaces}}/{{$labels.deployment}}
|
"labels": {
|
||||||
summary: Deployment is outdated
|
"quantile": "0.5"
|
||||||
- alert: DeploymentReplicasNotUpdated
|
},
|
||||||
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
|
"record": "cluster_quantile:scheduler_binding_latency:histogram_quantile"
|
||||||
or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
|
}
|
||||||
unless (kube_deployment_spec_paused == 1)
|
]
|
||||||
for: 15m
|
},
|
||||||
labels:
|
{
|
||||||
severity: warning
|
"name": "kube-apiserver.rules",
|
||||||
annotations:
|
"rules": [
|
||||||
description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
|
{
|
||||||
summary: Deployment replicas are outdated
|
"expr": "histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{job=\"apiserver\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
- alert: DaemonSetRolloutStuck
|
"labels": {
|
||||||
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled
|
"quantile": "0.99"
|
||||||
* 100 < 100
|
},
|
||||||
for: 15m
|
"record": "cluster_quantile:apiserver_request_latencies:histogram_quantile"
|
||||||
labels:
|
},
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_latencies_bucket{job=\"apiserver\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
description: Only {{$value}}% of desired pods scheduled and ready for daemon
|
"labels": {
|
||||||
set {{$labels.namespaces}}/{{$labels.daemonset}}
|
"quantile": "0.9"
|
||||||
summary: DaemonSet is missing pods
|
},
|
||||||
- alert: K8SDaemonSetsNotScheduled
|
"record": "cluster_quantile:apiserver_request_latencies:histogram_quantile"
|
||||||
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
|
},
|
||||||
> 0
|
{
|
||||||
for: 10m
|
"expr": "histogram_quantile(0.5, sum(rate(apiserver_request_latencies_bucket{job=\"apiserver\"}[5m])) without(instance, pod)) / 1e+06\n",
|
||||||
labels:
|
"labels": {
|
||||||
severity: warning
|
"quantile": "0.5"
|
||||||
annotations:
|
},
|
||||||
description: A number of daemonsets are not scheduled.
|
"record": "cluster_quantile:apiserver_request_latencies:histogram_quantile"
|
||||||
summary: Daemonsets are not scheduled correctly
|
}
|
||||||
- alert: DaemonSetsMissScheduled
|
]
|
||||||
expr: kube_daemonset_status_number_misscheduled > 0
|
},
|
||||||
for: 10m
|
{
|
||||||
labels:
|
"name": "node.rules",
|
||||||
severity: warning
|
"rules": [
|
||||||
annotations:
|
{
|
||||||
description: A number of daemonsets are running where they are not supposed
|
"expr": "sum(min(kube_pod_info) by (node))",
|
||||||
to run.
|
"record": ":kube_pod_info_node_count:"
|
||||||
summary: Daemonsets are not scheduled correctly
|
},
|
||||||
- alert: PodFrequentlyRestarting
|
{
|
||||||
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
|
"expr": "max(label_replace(kube_pod_info{job=\"kube-state-metrics\"}, \"pod\", \"$1\", \"pod\", \"(.*)\")) by (node, namespace, pod)\n",
|
||||||
for: 10m
|
"record": "node_namespace_pod:kube_pod_info:"
|
||||||
labels:
|
},
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"expr": "count by (node) (sum by (node, cpu) (\n node_cpu_seconds_total{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n))\n",
|
||||||
description: Pod {{$labels.namespaces}}/{{$labels.pod}} restarted {{$value}}
|
"record": "node:node_num_cpu:sum"
|
||||||
times within the last hour
|
},
|
||||||
summary: Pod is restarting frequently
|
{
|
||||||
kubelet.rules.yaml: |
|
"expr": "1 - avg(rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m]))\n",
|
||||||
groups:
|
"record": ":node_cpu_utilisation:avg1m"
|
||||||
- name: kubelet.rules
|
},
|
||||||
rules:
|
{
|
||||||
- alert: K8SNodeNotReady
|
"expr": "1 - avg by (node) (\n rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:)\n",
|
||||||
expr: kube_node_status_condition{condition="Ready",status="true"} == 0
|
"record": "node:node_cpu_utilisation:avg1m"
|
||||||
for: 1h
|
},
|
||||||
labels:
|
{
|
||||||
severity: warning
|
"expr": "node:node_cpu_utilisation:avg1m\n *\nnode:node_num_cpu:sum\n /\nscalar(sum(node:node_num_cpu:sum))\n",
|
||||||
annotations:
|
"record": "node:cluster_cpu_utilisation:ratio"
|
||||||
description: The Kubelet on {{ $labels.node }} has not checked in with the API,
|
},
|
||||||
or has set itself to NotReady, for more than an hour
|
{
|
||||||
summary: Node status is NotReady
|
"expr": "sum(node_load1{job=\"node-exporter\"})\n/\nsum(node:node_num_cpu:sum)\n",
|
||||||
- alert: K8SManyNodesNotReady
|
"record": ":node_cpu_saturation_load1:"
|
||||||
expr: count(kube_node_status_condition{condition="Ready",status="true"} == 0)
|
},
|
||||||
> 1 and (count(kube_node_status_condition{condition="Ready",status="true"} ==
|
{
|
||||||
0) / count(kube_node_status_condition{condition="Ready",status="true"})) > 0.2
|
"expr": "sum by (node) (\n node_load1{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nnode:node_num_cpu:sum\n",
|
||||||
for: 1m
|
"record": "node:node_cpu_saturation_load1:"
|
||||||
labels:
|
},
|
||||||
severity: critical
|
{
|
||||||
annotations:
|
"expr": "1 -\nsum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n/\nsum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
|
||||||
description: '{{ $value }}% of Kubernetes nodes are not ready'
|
"record": ":node_memory_utilisation:"
|
||||||
- alert: K8SKubeletDown
|
},
|
||||||
expr: count(up{job="kubelet"} == 0) / count(up{job="kubelet"}) * 100 > 3
|
{
|
||||||
for: 1h
|
"expr": "sum(node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n",
|
||||||
labels:
|
"record": ":node_memory_MemFreeCachedBuffers_bytes:sum"
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
{
|
||||||
description: Prometheus failed to scrape {{ $value }}% of kubelets.
|
"expr": "sum(node_memory_MemTotal_bytes{job=\"node-exporter\"})\n",
|
||||||
- alert: K8SKubeletDown
|
"record": ":node_memory_MemTotal_bytes:sum"
|
||||||
expr: (absent(up{job="kubelet"} == 1) or count(up{job="kubelet"} == 0) / count(up{job="kubelet"}))
|
},
|
||||||
* 100 > 10
|
{
|
||||||
for: 1h
|
"expr": "sum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
labels:
|
"record": "node:node_memory_bytes_available:sum"
|
||||||
severity: critical
|
},
|
||||||
annotations:
|
{
|
||||||
description: Prometheus failed to scrape {{ $value }}% of kubelets, or all Kubelets
|
"expr": "sum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
have disappeared from service discovery.
|
"record": "node:node_memory_bytes_total:sum"
|
||||||
summary: Many Kubelets cannot be scraped
|
},
|
||||||
- alert: K8SKubeletTooManyPods
|
{
|
||||||
expr: kubelet_running_pod_count > 100
|
"expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nnode:node_memory_bytes_total:sum\n",
|
||||||
for: 10m
|
"record": "node:node_memory_utilisation:ratio"
|
||||||
labels:
|
},
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"expr": "(node:node_memory_bytes_total:sum - node:node_memory_bytes_available:sum)\n/\nscalar(sum(node:node_memory_bytes_total:sum))\n",
|
||||||
description: Kubelet {{$labels.instance}} is running {{$value}} pods, close
|
"record": "node:cluster_memory_utilisation:ratio"
|
||||||
to the limit of 110
|
},
|
||||||
summary: Kubelet is close to pod limit
|
{
|
||||||
kubernetes.rules.yaml: |
|
"expr": "1e3 * sum(\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n)\n",
|
||||||
groups:
|
"record": ":node_memory_swap_io_bytes:sum_rate"
|
||||||
- name: kubernetes.rules
|
},
|
||||||
rules:
|
{
|
||||||
- record: pod_name:container_memory_usage_bytes:sum
|
"expr": "1 -\nsum by (node) (\n (node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"} + node_memory_Buffers_bytes{job=\"node-exporter\"})\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n/\nsum by (node) (\n node_memory_MemTotal_bytes{job=\"node-exporter\"}\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
|
"record": "node:node_memory_utilisation:"
|
||||||
(pod_name)
|
},
|
||||||
- record: pod_name:container_spec_cpu_shares:sum
|
{
|
||||||
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name)
|
"expr": "1 - (node:node_memory_bytes_available:sum / node:node_memory_bytes_total:sum)\n",
|
||||||
- record: pod_name:container_cpu_usage:sum
|
"record": "node:node_memory_utilisation_2:"
|
||||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
|
},
|
||||||
BY (pod_name)
|
{
|
||||||
- record: pod_name:container_fs_usage_bytes:sum
|
"expr": "1e3 * sum by (node) (\n (rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m])\n + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n * on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name)
|
"record": "node:node_memory_swap_io_bytes:sum_rate"
|
||||||
- record: namespace:container_memory_usage_bytes:sum
|
},
|
||||||
expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace)
|
{
|
||||||
- record: namespace:container_spec_cpu_shares:sum
|
"expr": "avg(irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\"}[1m]))\n",
|
||||||
expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace)
|
"record": ":node_disk_utilisation:avg_irate"
|
||||||
- record: namespace:container_cpu_usage:sum
|
},
|
||||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
|
{
|
||||||
BY (namespace)
|
"expr": "avg by (node) (\n irate(node_disk_io_time_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\"}[1m])\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
- record: cluster:memory_usage:ratio
|
"record": "node:node_disk_utilisation:avg_irate"
|
||||||
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
|
},
|
||||||
(cluster) / sum(machine_memory_bytes) BY (cluster)
|
{
|
||||||
- record: cluster:container_spec_cpu_shares:ratio
|
"expr": "avg(irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\"}[1m]) / 1e3)\n",
|
||||||
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000
|
"record": ":node_disk_saturation:avg_irate"
|
||||||
/ sum(machine_cpu_cores)
|
},
|
||||||
- record: cluster:container_cpu_usage:ratio
|
{
|
||||||
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
|
"expr": "avg by (node) (\n irate(node_disk_io_time_weighted_seconds_total{job=\"node-exporter\",device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\"}[1m]) / 1e3\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
/ sum(machine_cpu_cores)
|
"record": "node:node_disk_saturation:avg_irate"
|
||||||
- record: apiserver_latency_seconds:quantile
|
},
|
||||||
expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) /
|
{
|
||||||
1e+06
|
"expr": "max by (namespace, pod, device) ((node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"}\n- node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n/ node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
|
||||||
labels:
|
"record": "node:node_filesystem_usage:"
|
||||||
quantile: "0.99"
|
},
|
||||||
- record: apiserver_latency:quantile_seconds
|
{
|
||||||
expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) /
|
"expr": "max by (namespace, pod, device) (node_filesystem_avail_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"} / node_filesystem_size_bytes{fstype=~\"ext[234]|btrfs|xfs|zfs\"})\n",
|
||||||
1e+06
|
"record": "node:node_filesystem_avail:"
|
||||||
labels:
|
},
|
||||||
quantile: "0.9"
|
{
|
||||||
- record: apiserver_latency_seconds:quantile
|
"expr": "sum(irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
|
||||||
expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) /
|
"record": ":node_net_utilisation:sum_irate"
|
||||||
1e+06
|
},
|
||||||
labels:
|
{
|
||||||
quantile: "0.5"
|
"expr": "sum by (node) (\n (irate(node_network_receive_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_bytes_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
- alert: APIServerLatencyHigh
|
"record": "node:node_net_utilisation:sum_irate"
|
||||||
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
|
},
|
||||||
> 1
|
{
|
||||||
for: 10m
|
"expr": "sum(irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m])) +\nsum(irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n",
|
||||||
labels:
|
"record": ":node_net_saturation:sum_irate"
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
{
|
||||||
description: the API server has a 99th percentile latency of {{ $value }} seconds
|
"expr": "sum by (node) (\n (irate(node_network_receive_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]) +\n irate(node_network_transmit_drop_total{job=\"node-exporter\",device!~\"veth.+\"}[1m]))\n* on (namespace, pod) group_left(node)\n node_namespace_pod:kube_pod_info:\n)\n",
|
||||||
for {{$labels.verb}} {{$labels.resource}}
|
"record": "node:node_net_saturation:sum_irate"
|
||||||
- alert: APIServerLatencyHigh
|
},
|
||||||
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
|
{
|
||||||
> 4
|
"expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
|
||||||
for: 10m
|
"record": "node:node_inodes_total:"
|
||||||
labels:
|
},
|
||||||
severity: critical
|
{
|
||||||
annotations:
|
"expr": "max(\n max(\n kube_pod_info{job=\"kube-state-metrics\", host_ip!=\"\"}\n ) by (node, host_ip)\n * on (host_ip) group_right (node)\n label_replace(\n (max(node_filesystem_files_free{job=\"node-exporter\", mountpoint=\"/\"}) by (instance)), \"host_ip\", \"$1\", \"instance\", \"(.*):.*\"\n )\n) by (node)\n",
|
||||||
description: the API server has a 99th percentile latency of {{ $value }} seconds
|
"record": "node:node_inodes_free:"
|
||||||
for {{$labels.verb}} {{$labels.resource}}
|
}
|
||||||
- alert: APIServerErrorsHigh
|
]
|
||||||
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
|
},
|
||||||
* 100 > 2
|
{
|
||||||
for: 10m
|
"name": "kubernetes-absent",
|
||||||
labels:
|
"rules": [
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"alert": "KubeAPIDown",
|
||||||
description: API server returns errors for {{ $value }}% of requests
|
"annotations": {
|
||||||
- alert: APIServerErrorsHigh
|
"message": "KubeAPI has disappeared from Prometheus target discovery.",
|
||||||
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown"
|
||||||
* 100 > 5
|
},
|
||||||
for: 10m
|
"expr": "absent(up{job=\"apiserver\"} == 1)\n",
|
||||||
labels:
|
"for": "15m",
|
||||||
severity: critical
|
"labels": {
|
||||||
annotations:
|
"severity": "critical"
|
||||||
description: API server returns errors for {{ $value }}% of requests
|
}
|
||||||
- alert: K8SApiserverDown
|
},
|
||||||
expr: absent(up{job="apiserver"} == 1)
|
{
|
||||||
for: 20m
|
"alert": "KubeControllerManagerDown",
|
||||||
labels:
|
"annotations": {
|
||||||
severity: critical
|
"message": "KubeControllerManager has disappeared from Prometheus target discovery.",
|
||||||
annotations:
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown"
|
||||||
description: No API servers are reachable or all have disappeared from service
|
},
|
||||||
discovery
|
"expr": "absent(up{job=\"kube-controller-manager\"} == 1)\n",
|
||||||
|
"for": "15m",
|
||||||
- alert: K8sCertificateExpirationNotice
|
"labels": {
|
||||||
labels:
|
"severity": "critical"
|
||||||
severity: warning
|
}
|
||||||
annotations:
|
},
|
||||||
description: Kubernetes API Certificate is expiring soon (less than 7 days)
|
{
|
||||||
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0
|
"alert": "KubeSchedulerDown",
|
||||||
|
"annotations": {
|
||||||
- alert: K8sCertificateExpirationNotice
|
"message": "KubeScheduler has disappeared from Prometheus target discovery.",
|
||||||
labels:
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeschedulerdown"
|
||||||
severity: critical
|
},
|
||||||
annotations:
|
"expr": "absent(up{job=\"kube-scheduler\"} == 1)\n",
|
||||||
description: Kubernetes API Certificate is expiring in less than 1 day
|
"for": "15m",
|
||||||
expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0
|
"labels": {
|
||||||
node.rules.yaml: |
|
"severity": "critical"
|
||||||
groups:
|
}
|
||||||
- name: node.rules
|
},
|
||||||
rules:
|
{
|
||||||
- record: instance:node_cpu:rate:sum
|
"alert": "KubeletDown",
|
||||||
expr: sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m]))
|
"annotations": {
|
||||||
BY (instance)
|
"message": "Kubelet has disappeared from Prometheus target discovery.",
|
||||||
- record: instance:node_filesystem_usage:sum
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown"
|
||||||
expr: sum((node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}))
|
},
|
||||||
BY (instance)
|
"expr": "absent(up{job=\"kubelet\"} == 1)\n",
|
||||||
- record: instance:node_network_receive_bytes:rate:sum
|
"for": "15m",
|
||||||
expr: sum(rate(node_network_receive_bytes[3m])) BY (instance)
|
"labels": {
|
||||||
- record: instance:node_network_transmit_bytes:rate:sum
|
"severity": "critical"
|
||||||
expr: sum(rate(node_network_transmit_bytes[3m])) BY (instance)
|
}
|
||||||
- record: instance:node_cpu:ratio
|
}
|
||||||
expr: sum(rate(node_cpu{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance)
|
]
|
||||||
GROUP_LEFT() count(sum(node_cpu) BY (instance, cpu)) BY (instance)
|
},
|
||||||
- record: cluster:node_cpu:sum_rate5m
|
{
|
||||||
expr: sum(rate(node_cpu{mode!="idle"}[5m]))
|
"name": "kubernetes-apps",
|
||||||
- record: cluster:node_cpu:ratio
|
"rules": [
|
||||||
expr: cluster:node_cpu:rate5m / count(sum(node_cpu) BY (instance, cpu))
|
{
|
||||||
- alert: NodeExporterDown
|
"alert": "KubePodCrashLooping",
|
||||||
expr: absent(up{job="node-exporter"} == 1)
|
"annotations": {
|
||||||
for: 10m
|
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting {{ printf \"%.2f\" $value }} times / 5 minutes.",
|
||||||
labels:
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping"
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
"expr": "rate(kube_pod_container_status_restarts_total{job=\"kube-state-metrics\"}[15m]) * 60 * 5 > 0\n",
|
||||||
description: Prometheus could not scrape a node-exporter for more than 10m,
|
"for": "1h",
|
||||||
or node-exporters have disappeared from discovery
|
"labels": {
|
||||||
- alert: NodeDiskRunningFull
|
"severity": "critical"
|
||||||
expr: predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
|
}
|
||||||
for: 30m
|
},
|
||||||
labels:
|
{
|
||||||
severity: warning
|
"alert": "KubePodNotReady",
|
||||||
annotations:
|
"annotations": {
|
||||||
description: device {{$labels.device}} on node {{$labels.instance}} is running
|
"message": "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.",
|
||||||
full within the next 24 hours (mounted at {{$labels.mountpoint}})
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready"
|
||||||
- alert: NodeDiskRunningFull
|
},
|
||||||
expr: predict_linear(node_filesystem_free[30m], 3600 * 2) < 0
|
"expr": "sum by (namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\", phase=~\"Pending|Unknown\"}) > 0\n",
|
||||||
for: 10m
|
"for": "1h",
|
||||||
labels:
|
"labels": {
|
||||||
severity: critical
|
"severity": "critical"
|
||||||
annotations:
|
}
|
||||||
description: device {{$labels.device}} on node {{$labels.instance}} is running
|
},
|
||||||
full within the next 2 hours (mounted at {{$labels.mountpoint}})
|
{
|
||||||
- alert: InactiveRAIDDisk
|
"alert": "KubeDeploymentGenerationMismatch",
|
||||||
expr: node_md_disks - node_md_disks_active > 0
|
"annotations": {
|
||||||
for: 10m
|
"message": "Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment }} does not match, this indicates that the Deployment has failed but has not been rolled back.",
|
||||||
labels:
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentgenerationmismatch"
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
"expr": "kube_deployment_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_deployment_metadata_generation{job=\"kube-state-metrics\"}\n",
|
||||||
description: '{{$value}} RAID disk(s) on node {{$labels.instance}} are inactive'
|
"for": "15m",
|
||||||
prometheus.rules.yaml: |
|
"labels": {
|
||||||
groups:
|
"severity": "critical"
|
||||||
- name: prometheus.rules
|
}
|
||||||
rules:
|
},
|
||||||
- alert: PrometheusConfigReloadFailed
|
{
|
||||||
expr: prometheus_config_last_reload_successful == 0
|
"alert": "KubeDeploymentReplicasMismatch",
|
||||||
for: 10m
|
"annotations": {
|
||||||
labels:
|
"message": "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not matched the expected number of replicas for longer than an hour.",
|
||||||
severity: warning
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedeploymentreplicasmismatch"
|
||||||
annotations:
|
},
|
||||||
description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}
|
"expr": "kube_deployment_spec_replicas{job=\"kube-state-metrics\"}\n !=\nkube_deployment_status_replicas_available{job=\"kube-state-metrics\"}\n",
|
||||||
- alert: PrometheusNotificationQueueRunningFull
|
"for": "1h",
|
||||||
expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity
|
"labels": {
|
||||||
for: 10m
|
"severity": "critical"
|
||||||
labels:
|
}
|
||||||
severity: warning
|
},
|
||||||
annotations:
|
{
|
||||||
description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{
|
"alert": "KubeStatefulSetReplicasMismatch",
|
||||||
$labels.pod}}
|
"annotations": {
|
||||||
- alert: PrometheusErrorSendingAlerts
|
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 15 minutes.",
|
||||||
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetreplicasmismatch"
|
||||||
> 0.01
|
},
|
||||||
for: 10m
|
"expr": "kube_statefulset_status_replicas_ready{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_status_replicas{job=\"kube-state-metrics\"}\n",
|
||||||
labels:
|
"for": "15m",
|
||||||
severity: warning
|
"labels": {
|
||||||
annotations:
|
"severity": "critical"
|
||||||
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
|
}
|
||||||
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
|
},
|
||||||
- alert: PrometheusErrorSendingAlerts
|
{
|
||||||
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
|
"alert": "KubeStatefulSetGenerationMismatch",
|
||||||
> 0.03
|
"annotations": {
|
||||||
for: 10m
|
"message": "StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset }} does not match, this indicates that the StatefulSet has failed but has not been rolled back.",
|
||||||
labels:
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetgenerationmismatch"
|
||||||
severity: critical
|
},
|
||||||
annotations:
|
"expr": "kube_statefulset_status_observed_generation{job=\"kube-state-metrics\"}\n !=\nkube_statefulset_metadata_generation{job=\"kube-state-metrics\"}\n",
|
||||||
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
|
"for": "15m",
|
||||||
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
|
"labels": {
|
||||||
- alert: PrometheusNotConnectedToAlertmanagers
|
"severity": "critical"
|
||||||
expr: prometheus_notifications_alertmanagers_discovered < 1
|
}
|
||||||
for: 10m
|
},
|
||||||
labels:
|
{
|
||||||
severity: warning
|
"alert": "KubeStatefulSetUpdateNotRolledOut",
|
||||||
annotations:
|
"annotations": {
|
||||||
description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
|
"message": "StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} update has not been rolled out.",
|
||||||
to any Alertmanagers
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubestatefulsetupdatenotrolledout"
|
||||||
- alert: PrometheusTSDBReloadsFailing
|
},
|
||||||
expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0
|
"expr": "max without (revision) (\n kube_statefulset_status_current_revision{job=\"kube-state-metrics\"}\n unless\n kube_statefulset_status_update_revision{job=\"kube-state-metrics\"}\n)\n *\n(\n kube_statefulset_replicas{job=\"kube-state-metrics\"}\n !=\n kube_statefulset_status_replicas_updated{job=\"kube-state-metrics\"}\n)\n",
|
||||||
for: 12h
|
"for": "15m",
|
||||||
labels:
|
"labels": {
|
||||||
severity: warning
|
"severity": "critical"
|
||||||
annotations:
|
}
|
||||||
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
|
},
|
||||||
reload failures over the last four hours.'
|
{
|
||||||
summary: Prometheus has issues reloading data blocks from disk
|
"alert": "KubeDaemonSetRolloutStuck",
|
||||||
- alert: PrometheusTSDBCompactionsFailing
|
"annotations": {
|
||||||
expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0
|
"message": "Only {{ $value }}% of the desired Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are scheduled and ready.",
|
||||||
for: 12h
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck"
|
||||||
labels:
|
},
|
||||||
severity: warning
|
"expr": "kube_daemonset_status_number_ready{job=\"kube-state-metrics\"}\n /\nkube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"} * 100 < 100\n",
|
||||||
annotations:
|
"for": "15m",
|
||||||
description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}
|
"labels": {
|
||||||
compaction failures over the last four hours.'
|
"severity": "critical"
|
||||||
summary: Prometheus has issues compacting sample blocks
|
}
|
||||||
- alert: PrometheusTSDBWALCorruptions
|
},
|
||||||
expr: tsdb_wal_corruptions_total > 0
|
{
|
||||||
for: 4h
|
"alert": "KubeDaemonSetNotScheduled",
|
||||||
labels:
|
"annotations": {
|
||||||
severity: warning
|
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.",
|
||||||
annotations:
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetnotscheduled"
|
||||||
description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead
|
},
|
||||||
log (WAL).'
|
"expr": "kube_daemonset_status_desired_number_scheduled{job=\"kube-state-metrics\"}\n -\nkube_daemonset_status_current_number_scheduled{job=\"kube-state-metrics\"} > 0\n",
|
||||||
summary: Prometheus write-ahead log is corrupted
|
"for": "10m",
|
||||||
- alert: PrometheusNotIngestingSamples
|
"labels": {
|
||||||
expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) <= 0
|
"severity": "warning"
|
||||||
for: 10m
|
}
|
||||||
labels:
|
},
|
||||||
severity: warning
|
{
|
||||||
annotations:
|
"alert": "KubeDaemonSetMisScheduled",
|
||||||
description: "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples."
|
"annotations": {
|
||||||
summary: "Prometheus isn't ingesting samples"
|
"message": "{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.",
|
* kube.yaml, group kubernetes-resources: KubeCPUOvercommit and KubeMemOvercommit (one variant for Pod resource requests, one for Namespace quotas), KubeQuotaExceeded, CPUThrottlingHigh
* kube.yaml, group kubernetes-storage: KubePersistentVolumeUsageCritical, KubePersistentVolumeFullInFourDays, KubePersistentVolumeErrors
* kube.yaml, group kubernetes-system: KubeNodeNotReady, KubeVersionMismatch, KubeClientErrors (two variants), KubeletTooManyPods, KubeAPILatencyHigh (>1s warning, >4s critical), KubeAPIErrorsHigh, ...
"expr": "sum(rate(apiserver_request_count{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) without(instance, pod)\n /\nsum(rate(apiserver_request_count{job=\"apiserver\"}[5m])) without(instance, pod) * 100 > 10\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "critical"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "KubeAPIErrorsHigh",
|
||||||
|
"annotations": {
|
||||||
|
"message": "API server is returning errors for {{ $value }}% of requests.",
|
||||||
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh"
|
||||||
|
},
|
||||||
|
"expr": "sum(rate(apiserver_request_count{job=\"apiserver\",code=~\"^(?:5..)$\"}[5m])) without(instance, pod)\n /\nsum(rate(apiserver_request_count{job=\"apiserver\"}[5m])) without(instance, pod) * 100 > 5\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "KubeClientCertificateExpiration",
|
||||||
|
"annotations": {
|
||||||
|
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 7 days.",
|
||||||
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
|
||||||
|
},
|
||||||
|
"expr": "histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "KubeClientCertificateExpiration",
|
||||||
|
"annotations": {
|
||||||
|
"message": "A client certificate used to authenticate to the apiserver is expiring in less than 24 hours.",
|
||||||
|
"runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclientcertificateexpiration"
|
||||||
|
},
|
||||||
|
"expr": "histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n",
|
||||||
|
"labels": {
|
||||||
|
"severity": "critical"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
kubeprom.yaml: |-
|
||||||
|
{
|
||||||
|
"groups": [
|
||||||
|
{
|
||||||
|
"name": "kube-prometheus-node-recording.rules",
|
||||||
|
"rules": [
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[3m])) BY (instance)",
|
||||||
|
"record": "instance:node_cpu:rate:sum"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"expr": "sum((node_filesystem_size_bytes{mountpoint=\"/\"} - node_filesystem_free_bytes{mountpoint=\"/\"})) BY (instance)",
|
||||||
|
"record": "instance:node_filesystem_usage:sum"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(node_network_receive_bytes_total[3m])) BY (instance)",
|
||||||
|
"record": "instance:node_network_receive_bytes:rate:sum"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)",
|
||||||
|
"record": "instance:node_network_transmit_bytes:rate:sum"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)",
|
||||||
|
"record": "instance:node_cpu:ratio"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(node_cpu_seconds_total{mode!=\"idle\",mode!=\"iowait\"}[5m]))",
|
||||||
|
"record": "cluster:node_cpu:sum_rate5m"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"expr": "cluster:node_cpu_seconds_total:rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu))",
|
||||||
|
"record": "cluster:node_cpu:ratio"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "kube-prometheus-node-alerting.rules",
|
||||||
|
"rules": [
|
||||||
|
{
|
||||||
|
"alert": "NodeDiskRunningFull",
|
||||||
|
"annotations": {
|
||||||
|
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 24 hours."
|
||||||
|
},
|
||||||
|
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[6h], 3600 * 24) < 0)\n",
|
||||||
|
"for": "30m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "NodeDiskRunningFull",
|
||||||
|
"annotations": {
|
||||||
|
"message": "Device {{ $labels.device }} of node-exporter {{ $labels.namespace }}/{{ $labels.pod }} will be full within the next 2 hours."
|
||||||
|
},
|
||||||
|
"expr": "(node:node_filesystem_usage: > 0.85) and (predict_linear(node:node_filesystem_avail:[30m], 3600 * 2) < 0)\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "critical"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "prometheus.rules",
|
||||||
|
"rules": [
|
||||||
|
{
|
||||||
|
"alert": "PrometheusConfigReloadFailed",
|
||||||
|
"annotations": {
|
||||||
|
"description": "Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}",
|
||||||
|
"summary": "Reloading Prometheus' configuration failed"
|
||||||
|
},
|
||||||
|
"expr": "prometheus_config_last_reload_successful{job=\"prometheus\"} == 0\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusNotificationQueueRunningFull",
|
||||||
|
"annotations": {
|
||||||
|
"description": "Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{ $labels.pod}}",
|
||||||
|
"summary": "Prometheus' alert notification queue is running full"
|
||||||
|
},
|
||||||
|
"expr": "predict_linear(prometheus_notifications_queue_length{job=\"prometheus\"}[5m], 60 * 30) > prometheus_notifications_queue_capacity{job=\"prometheus\"}\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusErrorSendingAlerts",
|
||||||
|
"annotations": {
|
||||||
|
"description": "Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}",
|
||||||
|
"summary": "Errors while sending alert from Prometheus"
|
||||||
|
},
|
||||||
|
"expr": "rate(prometheus_notifications_errors_total{job=\"prometheus\"}[5m]) / rate(prometheus_notifications_sent_total{job=\"prometheus\"}[5m]) > 0.01\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusErrorSendingAlerts",
|
||||||
|
"annotations": {
|
||||||
|
"description": "Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}",
|
||||||
|
"summary": "Errors while sending alerts from Prometheus"
|
||||||
|
},
|
||||||
|
"expr": "rate(prometheus_notifications_errors_total{job=\"prometheus\"}[5m]) / rate(prometheus_notifications_sent_total{job=\"prometheus\"}[5m]) > 0.03\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "critical"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusNotConnectedToAlertmanagers",
|
||||||
|
"annotations": {
|
||||||
|
"description": "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected to any Alertmanagers",
|
||||||
|
"summary": "Prometheus is not connected to any Alertmanagers"
|
||||||
|
},
|
||||||
|
"expr": "prometheus_notifications_alertmanagers_discovered{job=\"prometheus\"} < 1\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusTSDBReloadsFailing",
|
||||||
|
"annotations": {
|
||||||
|
"description": "{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} reload failures over the last four hours.",
|
||||||
|
"summary": "Prometheus has issues reloading data blocks from disk"
|
||||||
|
},
|
||||||
|
"expr": "increase(prometheus_tsdb_reloads_failures_total{job=\"prometheus\"}[2h]) > 0\n",
|
||||||
|
"for": "12h",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusTSDBCompactionsFailing",
|
||||||
|
"annotations": {
|
||||||
|
"description": "{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} compaction failures over the last four hours.",
|
||||||
|
"summary": "Prometheus has issues compacting sample blocks"
|
||||||
|
},
|
||||||
|
"expr": "increase(prometheus_tsdb_compactions_failed_total{job=\"prometheus\"}[2h]) > 0\n",
|
||||||
|
"for": "12h",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusTSDBWALCorruptions",
|
||||||
|
"annotations": {
|
||||||
|
"description": "{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead log (WAL).",
|
||||||
|
"summary": "Prometheus write-ahead log is corrupted"
|
||||||
|
},
|
||||||
|
"expr": "tsdb_wal_corruptions_total{job=\"prometheus\"} > 0\n",
|
||||||
|
"for": "4h",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusNotIngestingSamples",
|
||||||
|
"annotations": {
|
||||||
|
"description": "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples.",
|
||||||
|
"summary": "Prometheus isn't ingesting samples"
|
||||||
|
},
|
||||||
|
"expr": "rate(prometheus_tsdb_head_samples_appended_total{job=\"prometheus\"}[5m]) <= 0\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"alert": "PrometheusTargetScrapesDuplicate",
|
||||||
|
"annotations": {
|
||||||
|
"description": "{{$labels.namespace}}/{{$labels.pod}} has many samples rejected due to duplicate timestamps but different values",
|
||||||
|
"summary": "Prometheus has many samples rejected"
|
||||||
|
},
|
||||||
|
"expr": "increase(prometheus_target_scrapes_sample_duplicate_timestamp_total{job=\"prometheus\"}[5m]) > 0\n",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "general.rules",
|
||||||
|
"rules": [
|
||||||
|
{
|
||||||
|
"alert": "TargetDown",
|
||||||
|
"annotations": {
|
||||||
|
"message": "{{ $value }}% of the {{ $labels.job }} targets are down."
|
||||||
|
},
|
||||||
|
"expr": "100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10",
|
||||||
|
"for": "10m",
|
||||||
|
"labels": {
|
||||||
|
"severity": "warning"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/)
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [spot](https://typhoon.psdn.io/cl/aws/#spot) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

## Docs
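The worker pools and spot workers called out in the updated feature list map onto the `workers` submodule touched later in this diff. As a minimal sketch (module path, pool name, counts, and the spot bid are illustrative and drawn from the worker-pools docs rather than this changeset), an extra AWS pool of spot instances joined to an existing cluster might look like:

```hcl
# Hypothetical worker pool; values are examples, not part of this change.
module "tempest-spot-pool" {
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.13.4"

  # join the existing "tempest" cluster's network and kubeconfig
  vpc_id          = "${module.tempest.vpc_id}"
  subnet_ids      = "${module.tempest.subnet_ids}"
  security_groups = "${module.tempest.worker_security_groups}"
  kubeconfig      = "${module.tempest.kubeconfig}"

  # pool settings
  name               = "tempest-spot"
  count              = 2
  instance_type      = "t3.small"
  spot_price         = "0.01"
  ssh_authorized_key = "${var.ssh_authorized_key}"
}
```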
|
@ -1,6 +1,6 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" {
  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
  source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"

  cluster_name = "${var.cluster_name}"
  api_servers  = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
@ -11,4 +11,5 @@ module "bootkube" {
  pod_cidr              = "${var.pod_cidr}"
  service_cidr          = "${var.service_cidr}"
  cluster_domain_suffix = "${var.cluster_domain_suffix}"
  enable_reporting      = "${var.enable_reporting}"
}
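The new `enable_reporting` input is threaded from each platform module into terraform-render-bootkube, which passes it on to Calico's usage reporting. A minimal sketch of opting in from a cluster definition (cluster name, DNS zone, and the elided variables are illustrative):

```hcl
module "tempest" {
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.13.4"

  cluster_name = "tempest"
  dns_zone     = "aws.example.com"
  # ...other required variables elided...

  # opt in to Calico usage/analytics reporting (defaults to "false")
  enable_reporting = "true"
}
```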
|
@ -7,7 +7,7 @@ systemd:
    - name: 40-etcd-cluster.conf
      contents: |
        [Service]
        Environment="ETCD_IMAGE_TAG=v3.3.10"
        Environment="ETCD_IMAGE_TAG=v3.3.12"
        Environment="ETCD_NAME=${etcd_name}"
        Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
        Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@ -78,7 +78,7 @@ systemd:
        --authentication-token-webhook \
        --authorization-mode=Webhook \
        --client-ca-file=/etc/kubernetes/ca.crt \
        --cluster_dns=${k8s_dns_service_ip} \
        --cluster_dns=${cluster_dns_service_ip} \
        --cluster_domain=${cluster_domain_suffix} \
        --cni-conf-dir=/etc/kubernetes/cni/net.d \
        --exit-on-lock-contention \
@ -123,7 +123,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
          KUBELET_IMAGE_TAG=v1.12.2
          KUBELET_IMAGE_TAG=v1.13.4
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -143,17 +143,14 @@ storage:
        set -e
        # Move experimental manifests
        [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
        BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
        BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.13.0}"
        BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
        exec /usr/bin/rkt run \
          --trust-keys-from-https \
          --volume assets,kind=host,source=$${BOOTKUBE_ASSETS} \
          --volume assets,kind=host,source=/opt/bootkube/assets \
          --mount volume=assets,target=/assets \
          --volume bootstrap,kind=host,source=/etc/kubernetes \
          --mount volume=bootstrap,target=/etc/kubernetes \
          $${RKT_OPTS} \
          $${BOOTKUBE_ACI}:$${BOOTKUBE_VERSION} \
          quay.io/coreos/bootkube:v0.14.0 \
          --net=host \
          --dns=host \
          --exec=/bootkube -- start --asset-dir=/assets "$@"
@ -68,9 +68,9 @@ data "template_file" "controller-configs" {
    # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
    etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"

    kubeconfig = "${indent(10, module.bootkube.kubeconfig)}"
    kubeconfig = "${indent(10, module.bootkube.kubeconfig-kubelet)}"
    ssh_authorized_key = "${var.ssh_authorized_key}"
    k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
    cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
}
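The Container Linux Config template above is also where the "snippets" customization from the feature list is merged in. A rough sketch of passing a snippet from a cluster definition (assuming the `worker_clc_snippets` variable name from the customization docs; the snippet body and cluster values are examples only):

```hcl
module "tempest" {
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.13.4"

  # ...cluster variables elided...

  # inline Container Linux Config snippet merged into worker Ignition (assumed variable name)
  worker_clc_snippets = [
    <<EOF
storage:
  files:
    - path: /etc/motd
      filesystem: root
      mode: 0644
      contents:
        inline: Hello from a Typhoon snippet
EOF
  ]
}
```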
|
@ -1,3 +1,7 @@
output "kubeconfig-admin" {
  value = "${module.bootkube.kubeconfig-admin}"
}

# Outputs for Kubernetes Ingress

output "ingress_dns_name" {
@ -5,6 +9,11 @@ output "ingress_dns_name" {
  description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
}

output "ingress_zone_id" {
  value       = "${aws_lb.nlb.zone_id}"
  description = "Route53 zone id of the network load balancer DNS name that can be used in Route53 alias records"
}

# Outputs for worker pools

output "vpc_id" {
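The added `ingress_zone_id` output pairs with `ingress_dns_name` so applications can be pointed at the cluster's ingress NLB with a Route53 alias record. A short sketch (the hosted zone resource and hostname are examples, not part of this change):

```hcl
resource "aws_route53_record" "app" {
  # hosted zone where the record should live (illustrative resource name)
  zone_id = "${aws_route53_zone.zone-for-clusters.zone_id}"
  name    = "app.example.com"
  type    = "A"

  # alias the record to the cluster's ingress network load balancer
  alias {
    name                   = "${module.tempest.ingress_dns_name}"
    zone_id                = "${module.tempest.ingress_zone_id}"
    evaluate_target_health = false
  }
}
```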
@ -23,7 +32,7 @@ output "worker_security_groups" {
}

output "kubeconfig" {
  value = "${module.bootkube.kubeconfig}"
  value = "${module.bootkube.kubeconfig-kubelet}"
}

# Outputs for custom load balancing
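With kubeconfigs now split into admin and kubelet variants, the admin credential is only surfaced as a module output. One way to materialize it for `kubectl` is the local provider; this is a sketch of a common pattern (the filename and cluster name are assumptions, not dictated by this changeset):

```hcl
# Write the generated admin kubeconfig to disk for kubectl use.
resource "local_file" "kubeconfig-tempest" {
  content  = "${module.tempest.kubeconfig-admin}"
  filename = "/home/user/.secrets/clusters/tempest/auth/kubeconfig"
}
```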
|
@ -31,13 +31,13 @@ variable "worker_count" {

variable "controller_type" {
  type        = "string"
  default     = "t2.small"
  default     = "t3.small"
  description = "EC2 instance type for controllers"
}

variable "worker_type" {
  type        = "string"
  default     = "t2.small"
  default     = "t3.small"
  description = "EC2 instance type for workers"
}

@ -134,3 +134,9 @@ variable "cluster_domain_suffix" {
  type    = "string"
  default = "cluster.local"
}

variable "enable_reporting" {
  type        = "string"
  description = "Enable usage or analytics reporting to upstreams (Calico)"
  default     = "false"
}

@ -13,7 +13,7 @@ module "workers" {
  spot_price = "${var.worker_price}"

  # configuration
  kubeconfig = "${module.bootkube.kubeconfig}"
  kubeconfig = "${module.bootkube.kubeconfig-kubelet}"
  ssh_authorized_key = "${var.ssh_authorized_key}"
  service_cidr = "${var.service_cidr}"
  cluster_domain_suffix = "${var.cluster_domain_suffix}"

@ -51,7 +51,7 @@ systemd:
        --authentication-token-webhook \
        --authorization-mode=Webhook \
        --client-ca-file=/etc/kubernetes/ca.crt \
        --cluster_dns=${k8s_dns_service_ip} \
        --cluster_dns=${cluster_dns_service_ip} \
        --cluster_domain=${cluster_domain_suffix} \
        --cni-conf-dir=/etc/kubernetes/cni/net.d \
        --exit-on-lock-contention \
@ -93,7 +93,7 @@ storage:
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
          KUBELET_IMAGE_TAG=v1.12.2
          KUBELET_IMAGE_TAG=v1.13.4
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
@ -111,7 +111,7 @@ storage:
          --volume config,kind=host,source=/etc/kubernetes \
          --mount volume=config,target=/etc/kubernetes \
          --insecure-options=image \
          docker://k8s.gcr.io/hyperkube:v1.12.2 \
          docker://k8s.gcr.io/hyperkube:v1.13.4 \
          --net=host \
          --dns=host \
          --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

@ -30,7 +30,7 @@ variable "count" {

variable "instance_type" {
  type        = "string"
  default     = "t2.small"
  default     = "t3.small"
  description = "EC2 instance type"
}

@ -79,7 +79,7 @@ data "template_file" "worker-config" {
  vars = {
    kubeconfig = "${indent(10, var.kubeconfig)}"
    ssh_authorized_key = "${var.ssh_authorized_key}"
    k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
    cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
    cluster_domain_suffix = "${var.cluster_domain_suffix}"
  }
}
||||||
|
@ -11,10 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
|
|||||||
|
|
||||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||||
|
|
||||||
* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||||
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
|
||||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
|
||||||
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/)
|
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/) and [spot](https://typhoon.psdn.io/cl/aws/#spot) workers
|
||||||
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||||
|
|
||||||
## Docs
|
## Docs
|
||||||
|
@ -1,6 +1,6 @@
|
|||||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||||
module "bootkube" {
|
module "bootkube" {
|
||||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
|
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"
|
||||||
|
|
||||||
cluster_name = "${var.cluster_name}"
|
cluster_name = "${var.cluster_name}"
|
||||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||||
@ -11,6 +11,7 @@ module "bootkube" {
|
|||||||
pod_cidr = "${var.pod_cidr}"
|
pod_cidr = "${var.pod_cidr}"
|
||||||
service_cidr = "${var.service_cidr}"
|
service_cidr = "${var.service_cidr}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
|
enable_reporting = "${var.enable_reporting}"
|
||||||
|
|
||||||
# Fedora
|
# Fedora
|
||||||
trusted_certs_dir = "/etc/pki/tls/certs"
|
trusted_certs_dir = "/etc/pki/tls/certs"
|
||||||
|
@ -19,24 +19,9 @@ write_files:
|
|||||||
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
|
ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
|
||||||
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
|
ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
|
||||||
ETCD_PEER_CLIENT_CERT_AUTH=true
|
ETCD_PEER_CLIENT_CERT_AUTH=true
|
||||||
- path: /etc/systemd/system/cloud-metadata.service
|
|
||||||
content: |
|
|
||||||
[Unit]
|
|
||||||
Description=Cloud metadata agent
|
|
||||||
[Service]
|
|
||||||
Type=oneshot
|
|
||||||
Environment=OUTPUT=/run/metadata/cloud
|
|
||||||
ExecStart=/usr/bin/mkdir -p /run/metadata
|
|
||||||
ExecStart=/usr/bin/bash -c 'echo "HOSTNAME_OVERRIDE=$(curl\
|
|
||||||
--url http://169.254.169.254/latest/meta-data/local-ipv4\
|
|
||||||
--retry 10)" > $${OUTPUT}'
|
|
||||||
[Install]
|
|
||||||
WantedBy=multi-user.target
|
|
||||||
- path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
|
- path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
|
||||||
content: |
|
content: |
|
||||||
[Unit]
|
[Unit]
|
||||||
Requires=cloud-metadata.service
|
|
||||||
After=cloud-metadata.service
|
|
||||||
Wants=rpc-statd.service
|
Wants=rpc-statd.service
|
||||||
[Service]
|
[Service]
|
||||||
ExecStartPre=/bin/mkdir -p /opt/cni/bin
|
ExecStartPre=/bin/mkdir -p /opt/cni/bin
|
||||||
@ -55,7 +40,7 @@ write_files:
|
|||||||
--authentication-token-webhook \
|
--authentication-token-webhook \
|
||||||
--authorization-mode=Webhook \
|
--authorization-mode=Webhook \
|
||||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||||
--cluster_dns=${k8s_dns_service_ip} \
|
--cluster_dns=${cluster_dns_service_ip} \
|
||||||
--cluster_domain=${cluster_domain_suffix} \
|
--cluster_domain=${cluster_domain_suffix} \
|
||||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||||
--exit-on-lock-contention \
|
--exit-on-lock-contention \
|
||||||
@ -93,11 +78,10 @@ bootcmd:
|
|||||||
runcmd:
|
runcmd:
|
||||||
- [systemctl, daemon-reload]
|
- [systemctl, daemon-reload]
|
||||||
- [systemctl, restart, NetworkManager]
|
- [systemctl, restart, NetworkManager]
|
||||||
- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.10"
|
- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.12"
|
||||||
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.12.2"
|
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.4"
|
||||||
- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.13.0"
|
- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0"
|
||||||
- [systemctl, start, --no-block, etcd.service]
|
- [systemctl, start, --no-block, etcd.service]
|
||||||
- [systemctl, enable, cloud-metadata.service]
|
|
||||||
- [systemctl, start, --no-block, kubelet.service]
|
- [systemctl, start, --no-block, kubelet.service]
|
||||||
users:
|
users:
|
||||||
- default
|
- default
|
||||||
|
@ -60,9 +60,9 @@ data "template_file" "controller-cloudinit" {
|
|||||||
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
||||||
etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"
|
etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"
|
||||||
|
|
||||||
kubeconfig = "${indent(6, module.bootkube.kubeconfig)}"
|
kubeconfig = "${indent(6, module.bootkube.kubeconfig-kubelet)}"
|
||||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -1,3 +1,7 @@
|
|||||||
|
output "kubeconfig-admin" {
|
||||||
|
value = "${module.bootkube.kubeconfig-admin}"
|
||||||
|
}
|
||||||
|
|
||||||
# Outputs for Kubernetes Ingress
|
# Outputs for Kubernetes Ingress
|
||||||
|
|
||||||
output "ingress_dns_name" {
|
output "ingress_dns_name" {
|
||||||
@ -5,6 +9,11 @@ output "ingress_dns_name" {
|
|||||||
description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
|
description = "DNS name of the network load balancer for distributing traffic to Ingress controllers"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
output "ingress_zone_id" {
|
||||||
|
value = "${aws_lb.nlb.zone_id}"
|
||||||
|
description = "Route53 zone id of the network load balancer DNS name that can be used in Route53 alias records"
|
||||||
|
}
|
||||||
|
|
||||||
# Outputs for worker pools
|
# Outputs for worker pools
|
||||||
|
|
||||||
output "vpc_id" {
|
output "vpc_id" {
|
||||||
@ -23,7 +32,7 @@ output "worker_security_groups" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
output "kubeconfig" {
|
output "kubeconfig" {
|
||||||
value = "${module.bootkube.kubeconfig}"
|
value = "${module.bootkube.kubeconfig-kubelet}"
|
||||||
}
|
}
|
||||||
|
|
||||||
# Outputs for custom load balancing
|
# Outputs for custom load balancing
|
||||||
|
@ -31,13 +31,13 @@ variable "worker_count" {
|
|||||||
|
|
||||||
variable "controller_type" {
|
variable "controller_type" {
|
||||||
type = "string"
|
type = "string"
|
||||||
default = "t2.small"
|
default = "t3.small"
|
||||||
description = "EC2 instance type for controllers"
|
description = "EC2 instance type for controllers"
|
||||||
}
|
}
|
||||||
|
|
||||||
variable "worker_type" {
|
variable "worker_type" {
|
||||||
type = "string"
|
type = "string"
|
||||||
default = "t2.small"
|
default = "t3.small"
|
||||||
description = "EC2 instance type for workers"
|
description = "EC2 instance type for workers"
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -116,3 +116,9 @@ variable "cluster_domain_suffix" {
|
|||||||
type = "string"
|
type = "string"
|
||||||
default = "cluster.local"
|
default = "cluster.local"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "enable_reporting" {
|
||||||
|
type = "string"
|
||||||
|
description = "Enable usage or analytics reporting to upstreams (Calico)"
|
||||||
|
default = "false"
|
||||||
|
}
|
||||||
|
@ -12,7 +12,7 @@ module "workers" {
|
|||||||
spot_price = "${var.worker_price}"
|
spot_price = "${var.worker_price}"
|
||||||
|
|
||||||
# configuration
|
# configuration
|
||||||
kubeconfig = "${module.bootkube.kubeconfig}"
|
kubeconfig = "${module.bootkube.kubeconfig-kubelet}"
|
||||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||||
service_cidr = "${var.service_cidr}"
|
service_cidr = "${var.service_cidr}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
|
@ -1,23 +1,8 @@
|
|||||||
#cloud-config
|
#cloud-config
|
||||||
write_files:
|
write_files:
|
||||||
- path: /etc/systemd/system/cloud-metadata.service
|
|
||||||
content: |
|
|
||||||
[Unit]
|
|
||||||
Description=Cloud metadata agent
|
|
||||||
[Service]
|
|
||||||
Type=oneshot
|
|
||||||
Environment=OUTPUT=/run/metadata/cloud
|
|
||||||
ExecStart=/usr/bin/mkdir -p /run/metadata
|
|
||||||
ExecStart=/usr/bin/bash -c 'echo "HOSTNAME_OVERRIDE=$(curl\
|
|
||||||
--url http://169.254.169.254/latest/meta-data/local-ipv4\
|
|
||||||
--retry 10)" > $${OUTPUT}'
|
|
||||||
[Install]
|
|
||||||
WantedBy=multi-user.target
|
|
||||||
- path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
|
- path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
|
||||||
content: |
|
content: |
|
||||||
[Unit]
|
[Unit]
|
||||||
Requires=cloud-metadata.service
|
|
||||||
After=cloud-metadata.service
|
|
||||||
Wants=rpc-statd.service
|
Wants=rpc-statd.service
|
||||||
[Service]
|
[Service]
|
||||||
ExecStartPre=/bin/mkdir -p /opt/cni/bin
|
ExecStartPre=/bin/mkdir -p /opt/cni/bin
|
||||||
@ -34,7 +19,7 @@ write_files:
|
|||||||
--authentication-token-webhook \
|
--authentication-token-webhook \
|
||||||
--authorization-mode=Webhook \
|
--authorization-mode=Webhook \
|
||||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||||
--cluster_dns=${k8s_dns_service_ip} \
|
--cluster_dns=${cluster_dns_service_ip} \
|
||||||
--cluster_domain=${cluster_domain_suffix} \
|
--cluster_domain=${cluster_domain_suffix} \
|
||||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||||
--exit-on-lock-contention \
|
--exit-on-lock-contention \
|
||||||
@ -69,8 +54,7 @@ bootcmd:
|
|||||||
runcmd:
|
runcmd:
|
||||||
- [systemctl, daemon-reload]
|
- [systemctl, daemon-reload]
|
||||||
- [systemctl, restart, NetworkManager]
|
- [systemctl, restart, NetworkManager]
|
||||||
- [systemctl, enable, cloud-metadata.service]
|
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.4"
|
||||||
- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.12.2"
|
|
||||||
- [systemctl, start, --no-block, kubelet.service]
|
- [systemctl, start, --no-block, kubelet.service]
|
||||||
users:
|
users:
|
||||||
- default
|
- default
|
||||||
|
@ -30,7 +30,7 @@ variable "count" {
|
|||||||
|
|
||||||
variable "instance_type" {
|
variable "instance_type" {
|
||||||
type = "string"
|
type = "string"
|
||||||
default = "t2.small"
|
default = "t3.small"
|
||||||
description = "EC2 instance type"
|
description = "EC2 instance type"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -72,7 +72,7 @@ data "template_file" "worker-cloudinit" {
|
|||||||
vars = {
|
vars = {
|
||||||
kubeconfig = "${indent(6, var.kubeconfig)}"
|
kubeconfig = "${indent(6, var.kubeconfig)}"
|
||||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -11,9 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster
|
|||||||
|
|
||||||
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
|
||||||
|
|
||||||
* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
|
||||||
* Single or multi-master, workloads isolated on workers, [flannel](https://github.com/coreos/flannel) networking
|
* Single or multi-master, [flannel](https://github.com/coreos/flannel) networking
|
||||||
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
|
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
|
||||||
|
* Advanced features like [worker pools](https://typhoon.psdn.io/advanced/worker-pools/), [low-priority](https://typhoon.psdn.io/cl/azure/#low-priority) workers, and [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
|
||||||
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
|
||||||
|
|
||||||
## Docs
|
## Docs
|
||||||
|
@ -1,6 +1,6 @@
|
|||||||
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
# Self-hosted Kubernetes assets (kubeconfig, manifests)
|
||||||
module "bootkube" {
|
module "bootkube" {
|
||||||
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
|
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"
|
||||||
|
|
||||||
cluster_name = "${var.cluster_name}"
|
cluster_name = "${var.cluster_name}"
|
||||||
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
|
||||||
@ -10,4 +10,5 @@ module "bootkube" {
|
|||||||
pod_cidr = "${var.pod_cidr}"
|
pod_cidr = "${var.pod_cidr}"
|
||||||
service_cidr = "${var.service_cidr}"
|
service_cidr = "${var.service_cidr}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
|
enable_reporting = "${var.enable_reporting}"
|
||||||
}
|
}
|
||||||
|
@ -7,7 +7,7 @@ systemd:
|
|||||||
- name: 40-etcd-cluster.conf
|
- name: 40-etcd-cluster.conf
|
||||||
contents: |
|
contents: |
|
||||||
[Service]
|
[Service]
|
||||||
Environment="ETCD_IMAGE_TAG=v3.3.10"
|
Environment="ETCD_IMAGE_TAG=v3.3.12"
|
||||||
Environment="ETCD_NAME=${etcd_name}"
|
Environment="ETCD_NAME=${etcd_name}"
|
||||||
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
|
||||||
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
|
||||||
@ -78,7 +78,7 @@ systemd:
|
|||||||
--authentication-token-webhook \
|
--authentication-token-webhook \
|
||||||
--authorization-mode=Webhook \
|
--authorization-mode=Webhook \
|
||||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||||
--cluster_dns=${k8s_dns_service_ip} \
|
--cluster_dns=${cluster_dns_service_ip} \
|
||||||
--cluster_domain=${cluster_domain_suffix} \
|
--cluster_domain=${cluster_domain_suffix} \
|
||||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||||
--exit-on-lock-contention \
|
--exit-on-lock-contention \
|
||||||
@ -123,7 +123,7 @@ storage:
|
|||||||
contents:
|
contents:
|
||||||
inline: |
|
inline: |
|
||||||
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
|
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
|
||||||
KUBELET_IMAGE_TAG=v1.12.2
|
KUBELET_IMAGE_TAG=v1.13.4
|
||||||
- path: /etc/sysctl.d/max-user-watches.conf
|
- path: /etc/sysctl.d/max-user-watches.conf
|
||||||
filesystem: root
|
filesystem: root
|
||||||
contents:
|
contents:
|
||||||
@ -143,17 +143,14 @@ storage:
|
|||||||
set -e
|
set -e
|
||||||
# Move experimental manifests
|
# Move experimental manifests
|
||||||
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
|
||||||
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
|
|
||||||
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.13.0}"
|
|
||||||
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
|
|
||||||
exec /usr/bin/rkt run \
|
exec /usr/bin/rkt run \
|
||||||
--trust-keys-from-https \
|
--trust-keys-from-https \
|
||||||
--volume assets,kind=host,source=$${BOOTKUBE_ASSETS} \
|
--volume assets,kind=host,source=/opt/bootkube/assets \
|
||||||
--mount volume=assets,target=/assets \
|
--mount volume=assets,target=/assets \
|
||||||
--volume bootstrap,kind=host,source=/etc/kubernetes \
|
--volume bootstrap,kind=host,source=/etc/kubernetes \
|
||||||
--mount volume=bootstrap,target=/etc/kubernetes \
|
--mount volume=bootstrap,target=/etc/kubernetes \
|
||||||
$${RKT_OPTS} \
|
$${RKT_OPTS} \
|
||||||
$${BOOTKUBE_ACI}:$${BOOTKUBE_VERSION} \
|
quay.io/coreos/bootkube:v0.14.0 \
|
||||||
--net=host \
|
--net=host \
|
||||||
--dns=host \
|
--dns=host \
|
||||||
--exec=/bootkube -- start --asset-dir=/assets "$@"
|
--exec=/bootkube -- start --asset-dir=/assets "$@"
|
||||||
|
@ -124,7 +124,7 @@ resource "azurerm_public_ip" "controllers" {
  name                         = "${var.cluster_name}-controller-${count.index}"
  location                     = "${azurerm_resource_group.cluster.location}"
  sku                          = "Standard"
  public_ip_address_allocation = "static"
  allocation_method            = "Static"
}

# Controller Ignition configs
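`public_ip_address_allocation` is deprecated in favor of `allocation_method` in newer azurerm provider releases, which is why the provider constraint moves to `~> 1.21` elsewhere in this change. A standalone sketch of the updated resource shape (resource names and the resource group reference are illustrative):

```hcl
resource "azurerm_public_ip" "example-ipv4" {
  name                = "example-ipv4"
  resource_group_name = "${azurerm_resource_group.cluster.name}"
  location            = "${azurerm_resource_group.cluster.location}"

  # Standard SKU public IPs must use static allocation
  sku               = "Standard"
  allocation_method = "Static"
}
```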
@ -149,9 +149,9 @@ data "template_file" "controller-configs" {
|
|||||||
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
|
||||||
etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"
|
etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"
|
||||||
|
|
||||||
kubeconfig = "${indent(10, module.bootkube.kubeconfig)}"
|
kubeconfig = "${indent(10, module.bootkube.kubeconfig-kubelet)}"
|
||||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -20,7 +20,7 @@ resource "azurerm_public_ip" "apiserver-ipv4" {
|
|||||||
name = "${var.cluster_name}-apiserver-ipv4"
|
name = "${var.cluster_name}-apiserver-ipv4"
|
||||||
location = "${var.region}"
|
location = "${var.region}"
|
||||||
sku = "Standard"
|
sku = "Standard"
|
||||||
public_ip_address_allocation = "static"
|
allocation_method = "Static"
|
||||||
}
|
}
|
||||||
|
|
||||||
# Static IPv4 address for the ingress frontend
|
# Static IPv4 address for the ingress frontend
|
||||||
@ -30,7 +30,7 @@ resource "azurerm_public_ip" "ingress-ipv4" {
|
|||||||
name = "${var.cluster_name}-ingress-ipv4"
|
name = "${var.cluster_name}-ingress-ipv4"
|
||||||
location = "${var.region}"
|
location = "${var.region}"
|
||||||
sku = "Standard"
|
sku = "Standard"
|
||||||
public_ip_address_allocation = "static"
|
allocation_method = "Static"
|
||||||
}
|
}
|
||||||
|
|
||||||
# Network Load Balancer for apiservers and ingress
|
# Network Load Balancer for apiservers and ingress
|
||||||
|
@ -1,3 +1,7 @@
|
|||||||
|
output "kubeconfig-admin" {
|
||||||
|
value = "${module.bootkube.kubeconfig-admin}"
|
||||||
|
}
|
||||||
|
|
||||||
# Outputs for Kubernetes Ingress
|
# Outputs for Kubernetes Ingress
|
||||||
|
|
||||||
output "ingress_static_ipv4" {
|
output "ingress_static_ipv4" {
|
||||||
@ -28,5 +32,5 @@ output "backend_address_pool_id" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
output "kubeconfig" {
|
output "kubeconfig" {
|
||||||
value = "${module.bootkube.kubeconfig}"
|
value = "${module.bootkube.kubeconfig-kubelet}"
|
||||||
}
|
}
|
||||||
|
@ -5,7 +5,7 @@ terraform {
|
|||||||
}
|
}
|
||||||
|
|
||||||
provider "azurerm" {
|
provider "azurerm" {
|
||||||
version = "~> 1.17"
|
version = "~> 1.21"
|
||||||
}
|
}
|
||||||
|
|
||||||
provider "local" {
|
provider "local" {
|
||||||
|
@ -115,3 +115,9 @@ variable "cluster_domain_suffix" {
|
|||||||
type = "string"
|
type = "string"
|
||||||
default = "cluster.local"
|
default = "cluster.local"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
variable "enable_reporting" {
|
||||||
|
type = "string"
|
||||||
|
description = "Enable usage or analytics reporting to upstreams (Calico)"
|
||||||
|
default = "false"
|
||||||
|
}
|
||||||
|
@ -15,7 +15,7 @@ module "workers" {
|
|||||||
priority = "${var.worker_priority}"
|
priority = "${var.worker_priority}"
|
||||||
|
|
||||||
# configuration
|
# configuration
|
||||||
kubeconfig = "${module.bootkube.kubeconfig}"
|
kubeconfig = "${module.bootkube.kubeconfig-kubelet}"
|
||||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||||
service_cidr = "${var.service_cidr}"
|
service_cidr = "${var.service_cidr}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
|
@ -51,7 +51,7 @@ systemd:
|
|||||||
--authentication-token-webhook \
|
--authentication-token-webhook \
|
||||||
--authorization-mode=Webhook \
|
--authorization-mode=Webhook \
|
||||||
--client-ca-file=/etc/kubernetes/ca.crt \
|
--client-ca-file=/etc/kubernetes/ca.crt \
|
||||||
--cluster_dns=${k8s_dns_service_ip} \
|
--cluster_dns=${cluster_dns_service_ip} \
|
||||||
--cluster_domain=${cluster_domain_suffix} \
|
--cluster_domain=${cluster_domain_suffix} \
|
||||||
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
--cni-conf-dir=/etc/kubernetes/cni/net.d \
|
||||||
--exit-on-lock-contention \
|
--exit-on-lock-contention \
|
||||||
@ -93,7 +93,7 @@ storage:
|
|||||||
contents:
|
contents:
|
||||||
inline: |
|
inline: |
|
||||||
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
|
KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
|
||||||
KUBELET_IMAGE_TAG=v1.12.2
|
KUBELET_IMAGE_TAG=v1.13.4
|
||||||
- path: /etc/sysctl.d/max-user-watches.conf
|
- path: /etc/sysctl.d/max-user-watches.conf
|
||||||
filesystem: root
|
filesystem: root
|
||||||
contents:
|
contents:
|
||||||
@ -111,7 +111,7 @@ storage:
|
|||||||
--volume config,kind=host,source=/etc/kubernetes \
|
--volume config,kind=host,source=/etc/kubernetes \
|
||||||
--mount volume=config,target=/etc/kubernetes \
|
--mount volume=config,target=/etc/kubernetes \
|
||||||
--insecure-options=image \
|
--insecure-options=image \
|
||||||
docker://k8s.gcr.io/hyperkube:v1.12.2 \
|
docker://k8s.gcr.io/hyperkube:v1.13.4 \
|
||||||
--net=host \
|
--net=host \
|
||||||
--dns=host \
|
--dns=host \
|
||||||
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
|
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname | tr '[:upper:]' '[:lower:]')
|
||||||
|
@ -67,8 +67,9 @@ resource "azurerm_virtual_machine_scale_set" "workers" {
  }

  # lifecycle
  priority            = "${var.priority}"
  upgrade_policy_mode = "Manual"
  priority            = "${var.priority}"
  eviction_policy     = "Delete"
}

# Scale up or down to maintain desired number, tolerating deallocations.
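`eviction_policy = "Delete"` only takes effect when the scale set runs low-priority VMs, which is what the Azure README's "low-priority workers" feature refers to. From a user's perspective that is selected via the module's `worker_priority` variable; a minimal sketch (cluster name and elided variables are illustrative):

```hcl
module "ramius" {
  source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes?ref=v1.13.4"

  cluster_name = "ramius"
  # ...other required variables elided...

  # run workers on cheaper, evictable low-priority VMs (default is "Regular")
  worker_priority = "Low"
}
```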
@ -107,7 +108,7 @@ data "template_file" "worker-config" {
|
|||||||
vars = {
|
vars = {
|
||||||
kubeconfig = "${indent(10, var.kubeconfig)}"
|
kubeconfig = "${indent(10, var.kubeconfig)}"
|
||||||
ssh_authorized_key = "${var.ssh_authorized_key}"
|
ssh_authorized_key = "${var.ssh_authorized_key}"
|
||||||
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
|
||||||
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
cluster_domain_suffix = "${var.cluster_domain_suffix}"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -11,9 +11,10 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
-* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
+* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
+* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
 * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs
@@ -1,6 +1,6 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
+source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"

 cluster_name = "${var.cluster_name}"
 api_servers = ["${var.k8s_domain_name}"]
@@ -12,4 +12,5 @@ module "bootkube" {
 pod_cidr = "${var.pod_cidr}"
 service_cidr = "${var.service_cidr}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
+enable_reporting = "${var.enable_reporting}"
 }
@@ -7,7 +7,7 @@ systemd:
 - name: 40-etcd-cluster.conf
 contents: |
 [Service]
-Environment="ETCD_IMAGE_TAG=v3.3.10"
+Environment="ETCD_IMAGE_TAG=v3.3.12"
 Environment="ETCD_NAME=${etcd_name}"
 Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
 Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
@@ -70,6 +70,10 @@ systemd:
 --mount volume=opt-cni-bin,target=/opt/cni/bin \
 --volume var-log,kind=host,source=/var/log \
 --mount volume=var-log,target=/var/log \
+--volume iscsiconf,kind=host,source=/etc/iscsi/ \
+--mount volume=iscsiconf,target=/etc/iscsi/ \
+--volume iscsiadm,kind=host,source=/usr/sbin/iscsiadm \
+--mount volume=iscsiadm,target=/sbin/iscsiadm \
 --insecure-options=image"
 ExecStartPre=/bin/mkdir -p /opt/cni/bin
 ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
@@ -86,7 +90,7 @@ systemd:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
@@ -124,7 +128,7 @@ storage:
 contents:
 inline: |
 KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
-KUBELET_IMAGE_TAG=v1.12.2
+KUBELET_IMAGE_TAG=v1.13.4
 - path: /etc/hostname
 filesystem: root
 mode: 0644
@@ -150,17 +154,14 @@ storage:
 set -e
 # Move experimental manifests
 [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.13.0}"
-BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
 exec /usr/bin/rkt run \
 --trust-keys-from-https \
---volume assets,kind=host,source=$BOOTKUBE_ASSETS \
+--volume assets,kind=host,source=/opt/bootkube/assets \
 --mount volume=assets,target=/assets \
 --volume bootstrap,kind=host,source=/etc/kubernetes \
 --mount volume=bootstrap,target=/etc/kubernetes \
 $$RKT_OPTS \
-$${BOOTKUBE_ACI}:$${BOOTKUBE_VERSION} \
+quay.io/coreos/bootkube:v0.14.0 \
 --net=host \
 --dns=host \
 --exec=/bootkube -- start --asset-dir=/assets "$@"
@@ -45,6 +45,10 @@ systemd:
 --mount volume=opt-cni-bin,target=/opt/cni/bin \
 --volume var-log,kind=host,source=/var/log \
 --mount volume=var-log,target=/var/log \
+--volume iscsiconf,kind=host,source=/etc/iscsi/ \
+--mount volume=iscsiconf,target=/etc/iscsi/ \
+--volume iscsiadm,kind=host,source=/usr/sbin/iscsiadm \
+--mount volume=iscsiadm,target=/sbin/iscsiadm \
 --insecure-options=image"
 ExecStartPre=/bin/mkdir -p /opt/cni/bin
 ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
@@ -59,7 +63,7 @@ systemd:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
@@ -85,7 +89,7 @@ storage:
 contents:
 inline: |
 KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
-KUBELET_IMAGE_TAG=v1.12.2
+KUBELET_IMAGE_TAG=v1.13.4
 - path: /etc/hostname
 filesystem: root
 mode: 0644
@@ -1,3 +1,3 @@
-output "kubeconfig" {
+output "kubeconfig-admin" {
-value = "${module.bootkube.kubeconfig}"
+value = "${module.bootkube.kubeconfig-admin}"
 }
@@ -163,7 +163,7 @@ data "template_file" "controller-configs" {
 domain_name = "${element(var.controller_domains, count.index)}"
 etcd_name = "${element(var.controller_names, count.index)}"
 etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
-k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
+cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 ssh_authorized_key = "${var.ssh_authorized_key}"
 }
@@ -192,7 +192,7 @@ data "template_file" "worker-configs" {

 vars {
 domain_name = "${element(var.worker_domains, count.index)}"
-k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
+cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 ssh_authorized_key = "${var.ssh_authorized_key}"
 }
@@ -18,7 +18,7 @@ resource "null_resource" "copy-controller-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -94,7 +94,7 @@ resource "null_resource" "copy-worker-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -141,3 +141,9 @@ variable "kernel_args" {
 type = "list"
 default = []
 }
+
+variable "enable_reporting" {
+type = "string"
+description = "Enable usage or analytics reporting to upstreams (Calico)"
+default = "false"
+}
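The `enable_reporting` variable introduced above is passed through to the bootkube module and defaults to `"false"`. A minimal sketch of opting in from a cluster definition, assuming an illustrative cluster name and the bare-metal Container Linux module path (other platforms gain the same variable in this change):

```tf
module "bare-metal-mercury" {
  source = "git::https://github.com/poseidon/typhoon//bare-metal/container-linux/kubernetes?ref=v1.13.4"

  # ...required cluster variables omitted for brevity...

  # Optional: allow usage/analytics reporting to upstreams (Calico); defaults to "false"
  enable_reporting = "true"
}
```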
@@ -11,8 +11,8 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
-* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
+* Single or multi-master, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
 * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

@@ -1,6 +1,6 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
+source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"

 cluster_name = "${var.cluster_name}"
 api_servers = ["${var.k8s_domain_name}"]
@@ -11,6 +11,7 @@ module "bootkube" {
 pod_cidr = "${var.pod_cidr}"
 service_cidr = "${var.service_cidr}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
+enable_reporting = "${var.enable_reporting}"

 # Fedora
 trusted_certs_dir = "/etc/pki/tls/certs"
@@ -40,7 +40,7 @@ write_files:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
@@ -84,9 +84,9 @@ runcmd:
 - [systemctl, daemon-reload]
 - [systemctl, restart, NetworkManager]
 - [hostnamectl, set-hostname, ${domain_name}]
-- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.10"
+- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.12"
-- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.12.2"
+- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.4"
-- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.13.0"
+- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0"
 - [systemctl, start, --no-block, etcd.service]
 - [systemctl, enable, kubelet.path]
 - [systemctl, start, --no-block, kubelet.path]
@@ -19,7 +19,7 @@ write_files:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
@@ -60,7 +60,7 @@ runcmd:
 - [systemctl, daemon-reload]
 - [systemctl, restart, NetworkManager]
 - [hostnamectl, set-hostname, ${domain_name}]
-- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.12.2"
+- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.4"
 - [systemctl, enable, kubelet.path]
 - [systemctl, start, --no-block, kubelet.path]
 users:
@@ -1,3 +1,3 @@
-output "kubeconfig" {
+output "kubeconfig-admin" {
-value = "${module.bootkube.kubeconfig}"
+value = "${module.bootkube.kubeconfig-admin}"
 }
@@ -58,7 +58,7 @@ data "template_file" "controller-configs" {
 domain_name = "${element(var.controller_domains, count.index)}"
 etcd_name = "${element(var.controller_names, count.index)}"
 etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
-k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
+cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 ssh_authorized_key = "${var.ssh_authorized_key}"
 }
@@ -80,7 +80,7 @@ data "template_file" "worker-configs" {

 vars {
 domain_name = "${element(var.worker_domains, count.index)}"
-k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
+cluster_dns_service_ip = "${module.bootkube.cluster_dns_service_ip}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 ssh_authorized_key = "${var.ssh_authorized_key}"
 }
@@ -18,7 +18,7 @@ resource "null_resource" "copy-controller-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -92,7 +92,7 @@ resource "null_resource" "copy-worker-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -110,3 +110,9 @@ variable "kernel_args" {
 type = "list"
 default = []
 }
+
+variable "enable_reporting" {
+type = "string"
+description = "Enable usage or analytics reporting to upstreams (Calico)"
+default = "false"
+}
@@ -11,10 +11,11 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
-* Single or multi-master, workloads isolated on workers, [flannel](https://github.com/coreos/flannel) networking
+* Single or multi-master, [flannel](https://github.com/coreos/flannel) networking
 * On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
-* Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
+* Advanced features like [snippets](https://typhoon.psdn.io/advanced/customization/#container-linux) customization
+* Ready for Ingress, Prometheus, Grafana, CSI, and other [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs

@@ -1,6 +1,6 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
+source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"

 cluster_name = "${var.cluster_name}"
 api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
@@ -11,4 +11,5 @@ module "bootkube" {
 pod_cidr = "${var.pod_cidr}"
 service_cidr = "${var.service_cidr}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
+enable_reporting = "${var.enable_reporting}"
 }
@@ -7,7 +7,7 @@ systemd:
 - name: 40-etcd-cluster.conf
 contents: |
 [Service]
-Environment="ETCD_IMAGE_TAG=v3.3.10"
+Environment="ETCD_IMAGE_TAG=v3.3.12"
 Environment="ETCD_NAME=${etcd_name}"
 Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
 Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
@@ -56,12 +56,9 @@ systemd:
 contents: |
 [Unit]
 Description=Kubelet via Hyperkube
-Requires=coreos-metadata.service
-After=coreos-metadata.service
 Wants=rpc-statd.service
 [Service]
 EnvironmentFile=/etc/kubernetes/kubelet.env
-EnvironmentFile=/run/metadata/coreos
 Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
 --volume=resolv,kind=host,source=/etc/resolv.conf \
 --mount volume=resolv,target=/etc/resolv.conf \
@@ -89,11 +86,10 @@ systemd:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
---hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
 --kubeconfig=/etc/kubernetes/kubeconfig \
 --lock-file=/var/run/lock/kubelet.lock \
 --network-plugin=cni \
@@ -129,7 +125,7 @@ storage:
 contents:
 inline: |
 KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
-KUBELET_IMAGE_TAG=v1.12.2
+KUBELET_IMAGE_TAG=v1.13.4
 - path: /etc/sysctl.d/max-user-watches.conf
 filesystem: root
 contents:
@@ -149,17 +145,14 @@ storage:
 set -e
 # Move experimental manifests
 [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
-BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
-BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.13.0}"
-BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
 exec /usr/bin/rkt run \
 --trust-keys-from-https \
---volume assets,kind=host,source=$${BOOTKUBE_ASSETS} \
+--volume assets,kind=host,source=/opt/bootkube/assets \
 --mount volume=assets,target=/assets \
 --volume bootstrap,kind=host,source=/etc/kubernetes \
 --mount volume=bootstrap,target=/etc/kubernetes \
 $${RKT_OPTS} \
-$${BOOTKUBE_ACI}:$${BOOTKUBE_VERSION} \
+quay.io/coreos/bootkube:v0.14.0 \
 --net=host \
 --dns=host \
 --exec=/bootkube -- start --asset-dir=/assets "$@"
@@ -31,12 +31,9 @@ systemd:
 contents: |
 [Unit]
 Description=Kubelet via Hyperkube
-Requires=coreos-metadata.service
-After=coreos-metadata.service
 Wants=rpc-statd.service
 [Service]
 EnvironmentFile=/etc/kubernetes/kubelet.env
-EnvironmentFile=/run/metadata/coreos
 Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
 --volume=resolv,kind=host,source=/etc/resolv.conf \
 --mount volume=resolv,target=/etc/resolv.conf \
@@ -62,11 +59,10 @@ systemd:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
---hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
 --kubeconfig=/etc/kubernetes/kubeconfig \
 --lock-file=/var/run/lock/kubelet.lock \
 --network-plugin=cni \
@@ -99,7 +95,7 @@ storage:
 contents:
 inline: |
 KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube
-KUBELET_IMAGE_TAG=v1.12.2
+KUBELET_IMAGE_TAG=v1.13.4
 - path: /etc/sysctl.d/max-user-watches.conf
 filesystem: root
 contents:
@@ -117,7 +113,7 @@ storage:
 --volume config,kind=host,source=/etc/kubernetes \
 --mount volume=config,target=/etc/kubernetes \
 --insecure-options=image \
-docker://k8s.gcr.io/hyperkube:v1.12.2 \
+docker://k8s.gcr.io/hyperkube:v1.13.4 \
 --net=host \
 --dns=host \
 --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
@@ -84,7 +84,7 @@ data "template_file" "controller-configs" {

 # etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
 etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"
-k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
+cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
 }
@@ -1,3 +1,7 @@
+output "kubeconfig-admin" {
+value = "${module.bootkube.kubeconfig-admin}"
+}
+
 output "controllers_dns" {
 value = "${digitalocean_record.controllers.0.fqdn}"
 }
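With the module output renamed from `kubeconfig` to `kubeconfig-admin` (and the provisioners above switching to `kubeconfig-kubelet`), any config that materializes the admin kubeconfig must reference the new output name. A hedged sketch using the `local_file` resource from Terraform's `local` provider; the module name and filename below are illustrative, not part of this change:

```tf
# Illustrative: write the cluster's admin kubeconfig to disk for kubectl use.
resource "local_file" "kubeconfig-nemo" {
  content  = "${module.digital-ocean-nemo.kubeconfig-admin}"
  filename = "/home/user/.secrets/clusters/nemo/auth/kubeconfig"
}
```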
@@ -10,7 +10,7 @@ resource "null_resource" "copy-controller-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -78,7 +78,7 @@ resource "null_resource" "copy-worker-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -92,3 +92,9 @@ variable "cluster_domain_suffix" {
 type = "string"
 default = "cluster.local"
 }
+
+variable "enable_reporting" {
+type = "string"
+description = "Enable usage or analytics reporting to upstreams (Calico)"
+default = "false"
+}
@@ -66,7 +66,7 @@ data "template_file" "worker-config" {
 template = "${file("${path.module}/cl/worker.yaml.tmpl")}"

 vars = {
-k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
+cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
 }
@@ -11,9 +11,9 @@ Typhoon distributes upstream Kubernetes, architectural conventions, and cluster

 ## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>

-* Kubernetes v1.12.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
+* Kubernetes v1.13.4 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
-* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
+* Single or multi-master, [flannel](https://github.com/coreos/flannel) networking
-* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
+* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled
 * Ready for Ingress, Prometheus, Grafana, and other optional [addons](https://typhoon.psdn.io/addons/overview/)

 ## Docs
@@ -1,6 +1,6 @@
 # Self-hosted Kubernetes assets (kubeconfig, manifests)
 module "bootkube" {
-source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=f39f8294c465397e622c606174e6f412ee3ca0f8"
+source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=953521dbba49eb6a39204f30a3978730eac01e11"

 cluster_name = "${var.cluster_name}"
 api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
@@ -11,6 +11,7 @@ module "bootkube" {
 pod_cidr = "${var.pod_cidr}"
 service_cidr = "${var.service_cidr}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
+enable_reporting = "${var.enable_reporting}"

 # Fedora
 trusted_certs_dir = "/etc/pki/tls/certs"
@@ -19,24 +19,9 @@ write_files:
 ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt
 ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key
 ETCD_PEER_CLIENT_CERT_AUTH=true
-- path: /etc/systemd/system/cloud-metadata.service
-content: |
-[Unit]
-Description=Cloud metadata agent
-[Service]
-Type=oneshot
-Environment=OUTPUT=/run/metadata/cloud
-ExecStart=/usr/bin/mkdir -p /run/metadata
-ExecStart=/usr/bin/bash -c 'echo "HOSTNAME_OVERRIDE=$(curl\
---url http://169.254.169.254/metadata/v1/interfaces/private/0/ipv4/address\
---retry 10)" > $${OUTPUT}'
-[Install]
-WantedBy=multi-user.target
 - path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
 content: |
 [Unit]
-Requires=cloud-metadata.service
-After=cloud-metadata.service
 Wants=rpc-statd.service
 [Service]
 ExecStartPre=/bin/mkdir -p /opt/cni/bin
@@ -55,7 +40,7 @@ write_files:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
@@ -90,11 +75,10 @@ bootcmd:
 - [modprobe, ip_vs]
 runcmd:
 - [systemctl, daemon-reload]
-- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.10"
+- "atomic install --system --name=etcd quay.io/poseidon/etcd:v3.3.12"
-- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.12.2"
+- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.4"
-- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.13.0"
+- "atomic install --system --name=bootkube quay.io/poseidon/bootkube:v0.14.0"
 - [systemctl, start, --no-block, etcd.service]
-- [systemctl, enable, cloud-metadata.service]
 - [systemctl, enable, kubelet.path]
 - [systemctl, start, --no-block, kubelet.path]
 users:
@@ -1,23 +1,8 @@
 #cloud-config
 write_files:
-- path: /etc/systemd/system/cloud-metadata.service
-content: |
-[Unit]
-Description=Cloud metadata agent
-[Service]
-Type=oneshot
-Environment=OUTPUT=/run/metadata/cloud
-ExecStart=/usr/bin/mkdir -p /run/metadata
-ExecStart=/usr/bin/bash -c 'echo "HOSTNAME_OVERRIDE=$(curl\
---url http://169.254.169.254/metadata/v1/interfaces/private/0/ipv4/address\
---retry 10)" > $${OUTPUT}'
-[Install]
-WantedBy=multi-user.target
 - path: /etc/systemd/system/kubelet.service.d/10-typhoon.conf
 content: |
 [Unit]
-Requires=cloud-metadata.service
-After=cloud-metadata.service
 Wants=rpc-statd.service
 [Service]
 ExecStartPre=/bin/mkdir -p /opt/cni/bin
@@ -34,7 +19,7 @@ write_files:
 --authentication-token-webhook \
 --authorization-mode=Webhook \
 --client-ca-file=/etc/kubernetes/ca.crt \
---cluster_dns=${k8s_dns_service_ip} \
+--cluster_dns=${cluster_dns_service_ip} \
 --cluster_domain=${cluster_domain_suffix} \
 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --exit-on-lock-contention \
@@ -66,8 +51,7 @@ bootcmd:
 - [modprobe, ip_vs]
 runcmd:
 - [systemctl, daemon-reload]
-- [systemctl, enable, cloud-metadata.service]
-- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.12.2"
+- "atomic install --system --name=kubelet quay.io/poseidon/kubelet:v1.13.4"
 - [systemctl, enable, kubelet.path]
 - [systemctl, start, --no-block, kubelet.path]
 users:
@@ -78,7 +78,7 @@ data "template_file" "controller-cloudinit" {
 etcd_initial_cluster = "${join(",", data.template_file.etcds.*.rendered)}"

 ssh_authorized_key = "${var.ssh_authorized_key}"
-k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
+cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
 }
@@ -1,3 +1,7 @@
+output "kubeconfig-admin" {
+value = "${module.bootkube.kubeconfig-admin}"
+}
+
 output "controllers_dns" {
 value = "${digitalocean_record.controllers.0.fqdn}"
 }
@@ -10,7 +10,7 @@ resource "null_resource" "copy-controller-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -76,7 +76,7 @@ resource "null_resource" "copy-worker-secrets" {
 }

 provisioner "file" {
-content = "${module.bootkube.kubeconfig}"
+content = "${module.bootkube.kubeconfig-kubelet}"
 destination = "$HOME/kubeconfig"
 }

@@ -85,3 +85,9 @@ variable "cluster_domain_suffix" {
 type = "string"
 default = "cluster.local"
 }
+
+variable "enable_reporting" {
+type = "string"
+description = "Enable usage or analytics reporting to upstreams (Calico)"
+default = "false"
+}
@@ -60,7 +60,7 @@ data "template_file" "worker-cloudinit" {

 vars = {
 ssh_authorized_key = "${var.ssh_authorized_key}"
-k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
+cluster_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
 cluster_domain_suffix = "${var.cluster_domain_suffix}"
 }
 }
@@ -14,7 +14,8 @@ kubectl port-forward grafana-POD-ID 8080 -n monitoring

 Visit [127.0.0.1:8080](http://127.0.0.1:8080) to view the bundled dashboards.

 ![Grafana etcd](/img/grafana-dashboard-kubelet.png)
 ![Grafana etcd](/img/grafana-dashboard-node.png)
 ![Grafana usage](/img/grafana-dashboard-usage.png)
+![Grafana etcd](/img/grafana-dashboard-etcd.png)

@@ -1,6 +1,6 @@
 # Heapster

-[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command and Kubernetes dashboard graphs.
+[Heapster](https://kubernetes.io/docs/user-guide/monitoring/) collects data from apiservers and kubelets and exposes it through a REST API. This API powers the `kubectl top` command.

 ## Create

@@ -16,7 +16,7 @@ Create a cluster following the AWS [tutorial](../cl/aws.md#cluster). Define a wo

 ```tf
 module "tempest-worker-pool" {
-source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.12.2"
+source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.13.4"

 providers = {
 aws = "aws.default"
@@ -67,7 +67,7 @@ The AWS internal `workers` module supports a number of [variables](https://githu
 | Name | Description | Default | Example |
 |:-----|:------------|:--------|:--------|
 | count | Number of instances | 1 | 3 |
-| instance_type | EC2 instance type | "t2.small" | "t2.medium" |
+| instance_type | EC2 instance type | "t3.small" | "t3.medium" |
 | os_image | AMI channel for a Container Linux derivative | coreos-stable | coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, flatcar-alpha |
 | disk_size | Size of the disk in GB | 40 | 100 |
 | spot_price | Spot price in USD for workers. Leave as default empty string for regular on-demand instances | "" | "0.10" |
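A hedged sketch of a worker pool that sets the optional variables from the table above; the values are illustrative and the required variables shown in the earlier example block are elided:

```tf
module "tempest-worker-pool" {
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes/workers?ref=v1.13.4"

  # ...required variables from the example above elided...

  # Optional tuning drawn from the variables table
  count         = 3
  instance_type = "t3.medium"
  os_image      = "flatcar-stable"
  disk_size     = 100
  spot_price    = "0.10"
}
```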
@@ -82,7 +82,7 @@ Create a cluster following the Azure [tutorial](../cl/azure.md#cluster). Define

 ```tf
 module "ramius-worker-pool" {
-source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.12.2"
+source = "git::https://github.com/poseidon/typhoon//azure/container-linux/kubernetes/workers?ref=v1.13.4"

 providers = {
 azurerm = "azurerm.default"
@@ -152,7 +152,7 @@ Create a cluster following the Google Cloud [tutorial](../cl/google-cloud.md#clu

 ```tf
 module "yavin-worker-pool" {
-source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.12.2"
+source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes/workers?ref=v1.13.4"

 providers = {
 google = "google.default"
@@ -187,11 +187,11 @@ Verify a managed instance group of workers joins the cluster within a few minute
 ```
 $ kubectl get nodes
 NAME STATUS AGE VERSION
-yavin-controller-0.c.example-com.internal Ready 6m v1.12.2
+yavin-controller-0.c.example-com.internal Ready 6m v1.13.4
-yavin-worker-jrbf.c.example-com.internal Ready 5m v1.12.2
+yavin-worker-jrbf.c.example-com.internal Ready 5m v1.13.4
-yavin-worker-mzdm.c.example-com.internal Ready 5m v1.12.2
+yavin-worker-mzdm.c.example-com.internal Ready 5m v1.13.4
-yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.12.2
+yavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.13.4
-yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.12.2
+yavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.13.4
 ```

 ### Variables
@@ -18,7 +18,7 @@ Fedora Atomic is a container-optimized operating system designed for large-scale

 For newcomers, Typhoon is a free (cost and freedom) Kubernetes distribution providing upstream Kubernetes, declarative configuration via [Terraform](https://www.terraform.io/intro/index.html), and support for AWS, Google Cloud, DigitalOcean, and bare-metal. Typhoon clusters use a [self-hosted](https://github.com/kubernetes-incubator/bootkube) control plane, support [Calico](https://www.projectcalico.org/blog/) and [flannel](https://coreos.com/flannel/docs/latest/) CNI networking, and enable etcd TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/), and network policy.

-Typhoon for Fedora Atomic reflects many of the same principles that created Typhoon for Container Linux. Clusters are declared using plain Terraform configs that can be versioned. In lieu of Ignition, instances are declaratively provisioned with Cloud-Init and kickstart (bare-metal only). TLS assets are generated. Hosts run only a kubelet service, other components are scheduled (i.e. self-hosted). The upstream hyperkube is used directly[^1]. And clusters are kept minimal by offering optional addons for [Ingress](https://typhoon.psdn.io/addons/ingress/), [Prometheus](https://typhoon.psdn.io/addons/prometheus/), and [Grafana](https://typhoon.psdn.io/addons/grafana/). Typhoon compliments and enhances Fedora Atomic as a choice of operating system for Kubernetes.
+Typhoon for Fedora Atomic reflects many of the same principles that created Typhoon for Container Linux. Clusters are declared using plain Terraform configs that can be versioned. In lieu of Ignition, instances are declaratively provisioned with Cloud-Init and kickstart (bare-metal only). TLS assets are generated. Hosts run only a kubelet service, other components are scheduled (i.e. self-hosted). The upstream hyperkube is used directly[^1]. And clusters are kept minimal by offering optional addons for [Ingress](/addons/ingress/), [Prometheus](/addons/prometheus/), and [Grafana](/addons/grafana/). Typhoon compliments and enhances Fedora Atomic as a choice of operating system for Kubernetes.

 Meanwhile, Fedora Atomic adds some promising new low-level technologies:

Some files were not shown because too many files have changed in this diff.