Compare commits


201 Commits

Author SHA1 Message Date
3dc755994b Add missing changelog entry for Digital Ocean fix 2018-01-20 07:52:40 -08:00
ddbfb2eee1 Set module version in tutorials docs for good practice 2018-01-19 23:16:48 -08:00
868265988b Update bootkube and terraform-render-bootkube to v0.10.0 2018-01-19 23:10:45 -08:00
6adffcb778 Update Kubernetes from v1.9.1 to v1.9.2 2018-01-19 08:40:09 -08:00
bc967ddcd0 addons: Update CLUO to fix compatibility with Kubernetes 1.9
* Update CLUO from v0.4.1 to v0.5.0
* Earlier versions of CLUO fail to drain nodes on Kubernetes 1.9
so nodes drain one at a time repeatedly and Container Linux OS
updates are not applied to nodes.
* Check current OS versions via `kubectl get nodes --show-labels`
2018-01-19 08:33:26 -08:00
ef18f19ec4 Edit digital ocean port range and ordering to suppress diff
* Change port range from keyword "all" to "1-65535", which is
equivalent but, with digitalocean provider 0.1.3, doesn't produce a diff
* Rearrange egress firewall rules to match the order the Digital Ocean API
and provider return. In current testing, this fixes the last diff
that was present on `terraform plan`.
2018-01-15 22:13:59 -08:00
f5efcc1ff8 Relax digitalocean provider version constraints
* Relax fixed 0.1.2 version constraint to "~> 0.1.2", which
allows 0.1.3, 0.1.4, etc, but would not allow 0.2.0
2018-01-15 21:04:53 -08:00
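A minimal sketch of the relaxed constraint described above, assuming a standard provider block (the exact block in the module may differ):
```tf
# Pessimistic constraint: "~> 0.1.2" allows 0.1.3, 0.1.4, etc., but not 0.2.0.
provider "digitalocean" {
  version = "~> 0.1.2"
}
```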
996651c605 Update kube-state-metrics version and RBAC cluster role
* https://github.com/kubernetes/kube-state-metrics/pull/345
* https://github.com/kubernetes/kube-state-metrics/pull/334
2018-01-15 08:33:44 -08:00
38fa7dff1a Create separate bare-metal container-linux-install profiles
* Create separate container-linux-install profiles (and
cached-container-linux-install) for each node in a cluster
* Fix contention bug on bare-metal during `terraform apply`.
With only a global install profile, terraform would create
(or retain) the profile for each cluster and try to delete
it for each cluster being deleted. As a result, in some cases
apply had to be run multiple times before terraform's representation
of constraints was satisfied (profile deleted and recreated)
* Allow Container Linux install properties to vary between
clusters, such as using a different Container Linux channel
or version for different clusters
2018-01-15 08:23:03 -08:00
bbe295a3f1 Add Terraform v0.11.x support and migration docs
* Add explicit "providers" section to modules for Terraform v0.11.x
* Retain support for Terraform v0.10.4+
* Add migration guide from Terraform v0.10.x to v0.11.x for those managing
existing clusters (action required!)
2018-01-13 15:30:08 -08:00
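For illustration, the explicit "providers" section for Terraform v0.11.x looks like the module example in the README added later in this diff (cluster variables omitted here):
```tf
module "google-cloud-yavin" {
  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"

  # Terraform v0.11.x: pass provider aliases to the module explicitly
  providers = {
    google   = "google.default"
    local    = "local.default"
    null     = "null.default"
    template = "template.default"
    tls      = "tls.default"
  }

  # ... cluster variables omitted ...
}
```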
d8db296932 Update kube-dns and use separate service account
* Update kube-dns from v1.14.7 to v1.14.8
* Use a separate kube-dns service account
* https://github.com/kubernetes/kubernetes/pull/57918
2018-01-12 10:29:30 -08:00
388ac08492 Update etcd from 3.2.13 to 3.2.14
* https://github.com/coreos/etcd/releases/tag/v3.2.14
2018-01-12 07:20:55 -08:00
527b5ca602 Update CHANGELOG.md for v1.9.1 2018-01-09 07:03:04 -08:00
ecd6a9443b Add maintenance docs with upgrade policies
* Add best practices for maintenance
* Describe blue-green replacement strategy
* Mention unsupported in-place edit and
node replacement strategies
2018-01-09 06:54:44 -08:00
2523d64f95 Fix docs to show exporting KUBECONFIG 2018-01-06 16:55:06 -08:00
fc455c8624 Remove old mention of ACIs in bootkube.service description 2018-01-06 16:20:34 -08:00
7a0a60708e Bump Container Linux version shown in docs
* Be sure docs and examples list Container Linux versions that
have been patched for Meltdown just in case someone copy-pastes
or sees them as recent versions
2018-01-06 14:58:38 -08:00
51a5f64024 Enable portmap plugin alongside Calico to fix hostPort
* https://github.com/poseidon/terraform-render-bootkube/pull/36
2018-01-06 14:01:18 -08:00
e1f2125f02 Update etcd from 3.2.0 to 3.2.13
* https://github.com/coreos/etcd/releases/tag/v3.2.13
2018-01-06 14:01:18 -08:00
9329b775f6 Update Kubernetes from v1.8.6 to v1.9.1 2018-01-06 14:01:16 -08:00
e04cce1201 Update mkdocs and material docs theme 2018-01-06 10:59:56 -08:00
201a38bd90 Update CHANGELOG.md for v1.8.6 2017-12-22 13:00:18 -08:00
fbdd946601 Update Kubernetes from v1.8.5 to v1.8.6 2017-12-21 11:20:37 -08:00
19102636a9 Add link to dashboard 315 2017-12-15 18:52:40 -08:00
21e540159b addons: Update grafana from v4.6.2 to v4.6.3
* https://github.com/grafana/grafana/releases/tag/v4.6.3
2017-12-15 16:09:14 -08:00
43e65a4d13 Update CHANGELOG.md for v1.8.5 2017-12-15 02:04:13 -08:00
e79088baa0 Add optional cluster_domain_suffix variable
* Allow kube-dns to respond to DNS queries with a custom
suffix, instead of the default 'cluster.local'
* Useful when multiple clusters exist on the same local
network and wish to query services on one another
2017-12-15 01:45:52 -08:00
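A sketch of the optional variable, assuming the Google Cloud module from the README below (the suffix value is illustrative):
```tf
module "google-cloud-yavin" {
  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"

  # ... required variables omitted ...

  # Optional: answer DNS queries for a custom suffix instead of "cluster.local"
  cluster_domain_suffix = "yavin.local"
}
```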
495e33e213 Update bootkube and terraform-render-bootkube to v0.9.1 2017-12-15 01:45:02 -08:00
63f5a26a72 Eliminate steps to move self-hosted etcd assets
* bootkube/assets/experimental/* assets corresponded to self-hosted
etcd manifests, which are no longer an option in Typhoon
2017-12-13 01:06:56 -08:00
eea79e895d Fix manifest consolidation in bootkube start wrapper
* Fix manifest existence test in /opt/bootkube/bootkube-start
to also work with more than one directory
2017-12-12 23:08:22 -08:00
99c07661c6 Fix old Container Linux versions mentioned in docs 2017-12-11 23:36:16 -08:00
521a1f0fee addons: Update heapster from v1.4.3 to v1.5.0
* Rollback addon-resizer to 1.7 to address issues in large
clusters https://github.com/kubernetes/kubernetes/pull/52536
2017-12-11 23:34:25 -08:00
7345cb6419 addons: Update nginx-ingress to 0.9.0 2017-12-11 00:48:15 -08:00
a481d71d7d addons: Update nginx-ingress to 0.9.0-beta.19
* Undo rollback f00ecde854
* Port binding regression only occurs with --enable-ssl-passthrough,
which isn't used in these examples. See
https://github.com/kubernetes/ingress-nginx/issues/1788
2017-12-11 00:44:32 -08:00
831a5c976c Add Kubernetes Dashboard warning and improve changelog 2017-12-09 22:38:27 -08:00
85e6783503 Recommend Container Linux images with Docker 17.09
* Container Linux stable and beta now provide Docker 17.09 (instead
of 1.12). Recommend images which provide 17.09.
* Older clusters (with CLUO addon) auto-update node's Container Linux version
and will begin using Docker 17.09.
2017-12-09 22:14:13 -08:00
165396d6aa Update Kubernetes from v1.8.4 to v1.8.5 2017-12-09 21:28:31 -08:00
ce49a93d5d Fix issue with etcd-member failing to resolve peers
* When restarting masters, `etcd-member.service` may fail to lookup peers if
/etc/resolv.conf hasn't been populated yet. Require the wait-for-dns.service.
2017-12-09 20:12:49 -08:00
e623439eec Fix typos in docs and CONTRIBUTING.md 2017-12-09 19:58:09 -08:00
9548572d98 Add kubelet --volume-plugin-dir flag on bare-metal
* Kubelet will search path for flexvolume plugins
2017-12-05 13:12:53 -08:00
f00ecde854 Rollback nginx-ingress on GCE to 0.9.0-beta.17
* https://github.com/kubernetes/ingress-nginx/issues/1788
2017-12-02 14:06:22 -08:00
d85300f947 Clarify only Terraform v0.10.x should be used
* It is not safe to update to Terraform v0.11.x yet
* https://github.com/hashicorp/terraform/issues/16824
2017-12-02 01:31:39 -08:00
65f006e6cc addons: Sync prometheus alerts to upstream
* https://github.com/coreos/prometheus-operator/pull/774
2017-12-01 23:24:08 -08:00
8d3817e0ae addons: Update nginx-ingress to 0.9.0-beta.19
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.9.0-beta.19
2017-12-01 22:32:33 -08:00
5f5eec1175 Update bootkube and terraform-render-bootkube to v0.9.0 2017-12-01 22:27:48 -08:00
5308fde3d3 Add Kubernetes certification badge 2017-11-29 19:26:49 -08:00
9ab61d7bf5 Add Typhoon images with and without text
* Serve images from GCS poseidon, rather than dghubble
2017-11-29 01:01:01 -08:00
6483f613c5 Update Kubernetes from v1.8.3 to v1.8.4 2017-11-28 21:52:11 -08:00
56c6bf431a Update terraform-render-bootkube for Kubernetes v1.8.4
* Update hyperkube from v1.8.3 to v1.8.4
* Remove flock from bootstrap-apiserver and kube-apiserver
* Remove unused critical-pod annotations in manifests
* Use service accounts for kube-proxy and pod-checkpointer
* Update Calico from v2.6.1 to v2.6.3
* Update flannel from v0.9.0 to v0.9.1
* Remove Calico termination grace period to prevent calico
from getting stuck for extended periods
* https://github.com/poseidon/terraform-render-bootkube/pull/29
2017-11-28 21:42:26 -08:00
63ab117205 addons: Add prometheus rules for DaemonSets
* https://github.com/coreos/prometheus-operator/pull/755
2017-11-16 23:51:21 -08:00
1cd262e712 addons: Fix prometheus K8SApiServerLatency alert rule
* https://github.com/coreos/prometheus-operator/issues/751
2017-11-16 23:37:15 -08:00
32bdda1b6c addons: Update Grafana from v4.6.1 to v4.6.2
* https://github.com/grafana/grafana/releases/tag/v4.6.2
2017-11-16 23:34:36 -08:00
07d257aa7b Add initrd kernel argument needed by UEFI clients
* https://github.com/coreos/bugs/issues/1239
2017-11-16 23:19:51 -08:00
fd96067125 Fix docs link for security issue reporting 2017-11-10 21:38:41 -08:00
9d16f5c78a Update min Google plugin and remove target pool workaround
* With google provider 1.2, target pool instances can use self_link
and zone/name formats without causing a diff on each plan
* Original workaround: 77fc14db71
2017-11-10 21:15:19 -08:00
159443bae7 addons: Add better alerting rules to Prometheus manifests
* Adapt the coreos/prometheus-operator alerting rules for Typhoon,
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests
* Add controller manager and scheduler shim services to let
prometheus discover them via service endpoints
* Fix several alert rules to use service endpoint discovery
* A few rules still don't do much, but they default to green
2017-11-10 20:57:47 -08:00
119dc859d3 addons: Update nginx-ingress to 0.9.0-beta.17
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.9.0-beta.17
2017-11-10 20:16:40 -08:00
5f6b0728c5 Update bootkube and terraform-render-bootkube to v0.8.2 2017-11-10 20:01:37 -08:00
d774c51297 Update Kubernetes from v1.8.2 to v1.8.3 2017-11-08 23:34:19 -08:00
f6a8fb363e Remove deprecated kubelet --require-kubeconfig flag
* https://github.com/kubernetes/kubernetes/pull/40050
2017-11-08 23:34:19 -08:00
f570af9418 addons: Update from Prometheus v1.8.2 to v2.0.0 2017-11-08 22:48:23 -08:00
4ec6732b98 Output the Google network name and self_link
* Allow users to add custom firewall rules for unique cases
2017-11-08 00:19:49 -08:00
ea1efb536a Remove old firewall rule for bootstrap self-hosted etcd 2017-11-08 00:15:20 -08:00
451fd86470 Improve internal firewall rules on Google Cloud
* Whitelist internal traffic between controllers and workers
* Switch to tag-based firewall policies rather than source IP
2017-11-08 00:15:06 -08:00
b1b611b22c Add docs to use one controller on Google Cloud 2017-11-07 19:51:03 -08:00
eabf00fbf1 Add missing controller dependency before bootkube start
* Require the controller module to be completed before starting
to remote exec bootkube start, otherwise it's possible the controller
nodes were created, but not the network load balancer
2017-11-07 19:12:05 -08:00
8eaa72c1ca addons: Update nginx-ingress to 0.9.0-beta.16
* Image registry changed from gcr.io to quay.io
* https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.9.0-beta.16
2017-11-06 23:15:15 -08:00
58cf82da56 Promote AWS platform from alpha to beta 2017-11-06 21:38:24 -08:00
ccc832f468 Add firewall rule to allow apiserver to proxy other controller kubelets
* Prometheus proxies through the apiserver to scrape kubelets
* In multi-controller setups, an apiserver must be able to scrape
kubelets (10250) on other controllers
2017-11-06 01:03:53 -08:00
90f8d62204 Add firewall rules to allow prometheus to reach node-exporter
* node_exporter service endpoints run on hostNetwork port 9100
* Re-evaluate after https://github.com/kubernetes-incubator/bootkube/pull/711
2017-11-06 01:03:53 -08:00
af5c413abf Focus controller ELB on load balancing apiservers
* ELB distributing load across controllers is no longer the mechanism
used to SSH to instances to distribute secrets
* Focus the ELB on load balancing across apiserver and edit the HTTP
health check to an SSL:443 check
2017-11-06 01:03:53 -08:00
168c487484 Remove mention of self-hosted etcd, it's deprecated 2017-11-06 01:03:53 -08:00
805dd772a8 Run etcd cluster on-host, across controllers on AWS
* Change controllers ASG to heterogeneous EC2 instances
* Create DNS records for each controller's private IP for etcd
* Change etcd to run on-host, across controllers (etcd-member.service)
* Reduce time to bootstrap a cluster
* Deprecate self-hosted-etcd on the AWS platform
2017-11-06 01:03:53 -08:00
c6ec6596d8 Minor cleanup for zones, docs, and outputs
* Spread across all zones, regardless of UP/DOWN state
* Remove unused outputs of private IPs
2017-11-06 00:56:26 -08:00
47a9989927 Fix null_resource ordering constraints
* Ensure etcd TLS assets and kubeconfig are copied before
any attempt is made to run bootkube start
2017-11-06 00:55:44 -08:00
10b977d54a addons: Set kube-state-metrics to have clusterIP None
* kube-state-metrics service exists to facilitate prometheus discovery
2017-11-05 17:54:09 -08:00
b7a268fc45 addons: Add prometheus alertmanager flag
* Pass -alertmanager.url to work with a user's in-cluster
alertmanager deployment, if any
2017-11-05 15:50:46 -08:00
279f36effd addons: Add grafana 4.6.1 and extend prometheus docs 2017-11-05 15:23:56 -08:00
77fc14db71 Workaround target pool issue by listing instances as zone/name
* Instances can be listed by zone/name or self_link URL, but the
provider expects them in zone/name form, which causes a diff
* https://github.com/terraform-providers/terraform-provider-google/issues/46
2017-11-05 14:07:05 -08:00
2b0296d671 Create controller instances across zones in the region
* Change controller instances to automatically span zones in a region
* Remove the `zone` required variable
2017-11-05 13:24:32 -08:00
7b38271212 Run etcd cluster on-host, across controllers on Google Cloud
* Change controllers from a managed group to individual instances
* Create discrete DNS records to each controller's private IP for etcd
* Change etcd to run on-host, across controllers (etcd-member.service)
* Reduce time to bootstrap a cluster
* Deprecate self-hosted-etcd on the Google Cloud platform
2017-11-05 11:03:35 -08:00
ae07a21e3d addons: Omit static resource requests/limits for kube-state-metrics
* Allow the addon-resizer to dynamically set resource values
* https://github.com/kubernetes/kube-state-metrics/pull/285
2017-11-04 14:41:04 -07:00
0ab1ae3210 addons: Fix typo in kube-state-metrics strategy 2017-11-04 14:39:56 -07:00
67e3d2b86e docs: GCE network bandwidth is excellent, even btw zones
* Remove performance note that the GCE vs AWS network performance
is not an equal comparison. On both platforms, workers now span the
(availability) zones of a region.
* Testing host-to-host and pod-to-pod network bandwidth between nodes
(now located in different zones) showed no reduction in bandwidth
2017-11-04 14:08:20 -07:00
a48dd9ebd8 Require google provider version ~> 1.1
* Require google provider plugin 1.1 or higher which includes fix:
https://github.com/terraform-providers/terraform-provider-google/issues/574
* Remove workaround which statically set the persistent disk name
* Original reasons for workaround in a97df839 or GH #34
2017-11-04 12:59:19 -07:00
26a291aef4 Remove controller_preemptible option on Google Cloud
* Controller preemption is not safe or covered in documentation. Delete
the option, the variable is a holdover from old experiments
* Note, worker_preemptible is still a great feature that's supported
2017-11-04 12:59:19 -07:00
251a14519f Fix typo in internal template variable name
* ssh_authorized_keys should be ssh_authorized_key to match the user
facing variable which only allows a single SSH authorized key
2017-11-04 12:59:19 -07:00
6300383b43 Change worker managed instance group to span zones in region
* Change Google Cloud module to require the `region` variable
* Workers are created in random zones within the given region
* Tolerate Google Cloud zone failures or capacity issues
* If workers are preempted (if enabled), replacement instances can
be drawn from any zone in the region, which should avoid scheduling
issues that were possible before if a single zone aggressively
preempts instances (presumably due to Google Cloud capacity)
2017-11-04 12:59:19 -07:00
e32885c9cd addons: Update prometheus from v1.8.0 to v1.8.2
* https://github.com/prometheus/prometheus/releases/tag/v1.8.2
2017-11-04 11:00:39 -07:00
fe8afdbee9 Update Typhoon logo and favicon 2017-11-04 01:20:17 -07:00
878f5a3647 Bump bootkube and terraform-render-bootkube to v0.8.1
* Use the v0.8.1 tagged terraform-render-bootkube module
* Use the v0.8.1 quay.io/coreos/bootkube image to bootstrap
2017-10-28 12:50:37 -07:00
34ec7e9862 Relax pessimistic constraints on 1.0+ providers
* The constraint ~> 1.0 means users can use 1.0.1 or 1.1, but not 2.0
* https://www.terraform.io/docs/configuration/terraform.html
2017-10-25 23:27:28 -07:00
f6c6e85f84 Require minimum Terraform and plugin versions
* Bump minimum Terraform version to v0.10.4
* Allow minor version updates for 1.0+ plugins
* Fix versions for plugins which are pre-1.0
2017-10-25 23:00:31 -07:00
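A sketch of the version requirements described above; the plugin names and the exact pre-1.0 pin are illustrative:
```tf
terraform {
  required_version = ">= 0.10.4"
}

# 1.0+ plugins: allow minor version updates
provider "google" {
  version = "~> 1.0"
}

# pre-1.0 plugins: fix the version (value illustrative)
provider "digitalocean" {
  version = "0.1.2"
}
```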
8582e19077 Expand Nginx Ingress liveness and readiness probes
* Remove dnsPolicy: ClusterFirst
* https://github.com/kubernetes/ingress-nginx/pull/1584
2017-10-25 22:29:20 -07:00
3727c40c6c Update Nginx Ingress defaultbackend from 1.0 to 1.4
* https://github.com/kubernetes/ingress-nginx/pull/1568
2017-10-25 22:16:23 -07:00
b608f9c615 addons: Use service endpoints to scrape node-exporter 2017-10-24 22:59:00 -07:00
ec1dbb853c addons: Include kube-state-metrics exporter manifests 2017-10-24 22:59:00 -07:00
d046d45769 addons: Include Prometheus and node-exporter manifests 2017-10-24 22:58:59 -07:00
a73f57fe4e Update CLUO from v0.4.0 to v0.4.1 2017-10-24 22:14:03 -07:00
60bc8957c9 Update Kubernetes from v1.8.1 to v1.8.2
* Kubernetes v1.8.2 fixes a memory leak in the v1.8.1 apiserver
* Switch to using the `gcr.io/google_containers/hyperkube` image for the
on-host kubelet and shutdown drains
* Update terraform-render-bootkube manifests generation
  * Update flannel from v0.8.0 to v0.9.0
  * Add `hairpinMode` to flannel CNI config
  * Add `--no-negcache` to kube-dns dnsmasq
2017-10-24 21:44:26 -07:00
8b78c65483 Update Google Cloud Kubernetes from v1.7.7 to v1.8.1 2017-10-20 16:09:11 -07:00
f86c00288f Add missing update-agent RBAC role to get pods
* Drain now gets pods, deletes pods, and waits for deletion
2017-10-20 01:21:46 -07:00
a57b3cf973 Update CLUO addon to v0.4.0 and RBAC ClusterRole 2017-10-20 00:40:17 -07:00
10c5487ad7 Add docs corrections for versions and log output 2017-10-20 00:39:17 -07:00
e4c479554c Update AWS, DO, BM Kubernetes from v1.7.7 to v1.8.1
* Update from bootkube v0.7.0 to v0.8.0
* Leave Google Cloud update to a followup commit
2017-10-19 21:10:04 -07:00
be113e77b4 Fix links and add Calico BGP peering notes 2017-10-17 19:10:18 -07:00
911c53e4ae Add Ubiquiti EdgeRouter documentation 2017-10-17 18:51:40 -07:00
bfa8dfc75d Conditionally set networkd content on bare-metal
* Without this change, if a cluster doesn't set the controller
or worker networkd lists, an error "element() may not be used
with an empty list" occurs.
* controller_networkds and worker_networkds are intended to be
optional and temporary, not required at all
2017-10-17 18:47:12 -07:00
43dc44623f Fix the terraform fmt of configs 2017-10-16 01:32:25 -07:00
734bc1d32a Add performance benchmark for flannel with bonded NICs 2017-10-16 01:12:13 -07:00
41e632280f Remove unused storage section ala PXE-only Matchbox templating 2017-10-16 00:42:20 -07:00
fc22f04dd6 Add temporary variables for multi-nic testing
* Accept ordered lists of controller and worker networkd configs
* Do not rely on these variables. They will be replaced with a
cleaner mechanism at a future date
2017-10-16 00:39:58 -07:00
377e14c80b Fix ingress addon docs recursive apply command 2017-10-16 00:29:04 -07:00
9ec8ec4afc Secure copy etcd TLS credentials to controllers only
* Controllers receive etcd TLS credentials
* Controllers and workers receive a kubeconfig
2017-10-14 20:48:02 -07:00
5c1ed37ff5 Add SSH key to user "debug" during disk-install phase
* Avoid adding SSH authorized key for user "core" during the disk
install, so that terraform apply cannot SSH until post-install
2017-10-14 20:37:42 -07:00
e765fb310d Allow setting custom PXE boot kernel_args on bare-metal 2017-10-14 19:39:10 -07:00
7b5ffd0085 Add Container Linux reboot-coordinator RBAC
* Add a reboot-coordinator namespace for CLUO components
* Define an RBAC ClusterRole for update-operator and update-agent
* Replace the older style where CLUO ran in kube-system with
admin privilege
2017-10-14 19:35:06 -07:00
123439c2a4 Remove or compress docs image assets 2017-10-14 19:12:22 -07:00
11453bac91 Update heapster addon from v1.4.0 to v1.4.3
* Use normal name and phase labels
2017-10-14 19:07:37 -07:00
dd0c61d1d9 Update Nginx Ingress controller addon to 0.9.0-beta.15 2017-10-14 18:30:58 -07:00
5c87529011 Demote Google Cloud from stable to beta
* See #34 postmortem and action items for context on
when stable status will be restored
2017-10-11 19:32:04 -07:00
a97df839ea google-cloud: Set disk.device_name to match API default
* Terraform provider "google" plugin releases leave the disk
device_name as "" by default. Recently the API has started to
set a default name "persistent-disk-0". Plan and apply show
all instance groups need to be recreated to "fix" the name
* Impact: Controller and worker instance groups are deleted
and recreated, deleting data on controllers and bringing
down clusters
* Fix: Explicitly set the disk device_name to persistent-disk-0 so
that terraform finds no diff needs to be applied.
* https://github.com/poseidon/typhoon/issues/34
* https://github.com/terraform-providers/terraform-provider-google/issues/574
2017-10-11 18:04:39 -07:00
a5290dac32 Update docs to show Digital Ocean with on-host etcd 2017-10-09 23:47:32 -07:00
308c7dfb6e digital-ocean: Run etcd cluster on-host, across controllers
* Run etcd peers with TLS across controller nodes
* Deprecate self-hosted-etcd on the Digital Ocean platform
* Distribute etcd TLS certificates as part of initial provisioning
* Check the status of etcd by running `systemctl status etcd-member`
2017-10-09 22:43:23 -07:00
da63c89d71 Remove mention of ct plugin in bare-metal docs 2017-10-08 23:37:41 -07:00
62d7ccfff3 Add docs on provision time and network performance 2017-10-04 00:05:43 -07:00
1bc25c1036 Update Kubernetes from v1.7.5 to v1.7.7
* Update from bootkube v0.6.2 to v0.7.0
* Use renamed terraform-render-bootkube. Renamed from
bootkube-terraform to meet Terraform Module requirements
2017-10-03 21:03:15 -07:00
2d5a4ae1ef Update kube-dns image to address dnsmasq vulnerability
* https://security.googleblog.com/2017/10/behind-masq-yet-more-dns-and-dhcp.html
2017-10-02 10:27:10 -07:00
1ab27ae1f1 Fix status of the google-cloud module to production 2017-10-01 21:41:08 -07:00
def84aa5a0 docs: Add details about security features 2017-10-01 21:38:52 -07:00
dd883988bd Update from Calico v2.5.1 to v2.6.1
* Network policy improvements
* Update cni sidecar image from v1.10.0 to v1.11.0
* Lower log level in Calico CNI config from debug to info
2017-09-30 16:16:40 -07:00
e0d8917573 Add LICENSE to top-level of each module 2017-09-28 20:41:19 -07:00
f7f983c7da docs: Add docs and addons for Nginx AWS Ingress 2017-09-28 01:09:31 -07:00
b20233e05d aws: Add Ingress ELB DNS name output as ingress_dns_name
* Expose the Ingress ELB DNS name so application DNS records can
be defined in Terraform to resolve to the Ingress ELB
2017-09-28 00:46:17 -07:00
77e387cf83 Add top-level README.md with module overview 2017-09-27 22:09:52 -07:00
795428329a google-cloud: Move controller and worker submodules under kubernetes 2017-09-27 20:50:32 -07:00
f7dd959e9c bare-metal: Stop including etcd-network-checkpointer 2017-09-27 18:25:20 -07:00
b62a6def23 Merge pull request #26 from poseidon/fix-nfs-issue
Add Wants=rpc-statd.service to Kubelet
2017-09-24 20:18:22 -07:00
1b5caef4c1 Add Wants=rpc-statd.service to Kubelet
* Mounting NFS exports as volumes from some NFS servers fails because
the kubelet isn't starting rpc-statd as expected. Describing pods
that are stuck creating shows rpc.statd is required for remote locking
* Starting rpc-statd.service resolves the issue and all NFS mounts
seem to be working.
* Recommended approach https://github.com/coreos/bugs/issues/2074
2017-09-24 18:23:55 -07:00
767efabeb2 Merge pull request #23 from poseidon/drop-bm-self-etcd
bare-metal: Remove support for experimental_self_hosted_etcd
2017-09-23 16:55:25 -07:00
68726a2773 bare-metal: Remove support for experimental_self_hosted_etcd
* Transition from discouraging self-hosted etcd for bare-metal,
to removing it as an option
* See #13 and FAQ for self-hosted etcd discussion
2017-09-23 16:49:15 -07:00
4ea85b1ac8 Merge pull request #25 from poseidon/fix-bm-bootkube
bare-metal: Update to using Kubernetes v1.7.5 assets
2017-09-23 16:31:01 -07:00
74d8b9dabe *: Update bootkube-terraform sha hash to corresponding named tag
* bootkube-terraform v0.6.2 dbfb11c6eafa08f839eac2834ca1aca35dafe965
2017-09-23 14:10:42 -07:00
777c860b1c bare-metal: Update to using Kubernetes v1.7.5 control plane manifests
* bootkube-terraform module wasn't bumped for bare-metal
2017-09-23 14:04:18 -07:00
b033a94efc Merge pull request #24 from poseidon/improve-docs
README: Add IRC link, CHANGES.md, and minor fixes
2017-09-23 14:02:25 -07:00
235c8a5222 README: Add IRC link, CHANGES.md, and minor fixes 2017-09-23 13:55:44 -07:00
69cabd9486 Merge pull request #22 from poseidon/better-templating
bare-metal: Use Terraform templating for Container Linux configs
2017-09-23 12:55:55 -07:00
bca96bb124 bare-metal: Use Terraform templating for Container Linux configs
* Template bare-metal Container Linux configs with Terraform's
(limited) template_file module. This allows rendering problems
to be identified during `terraform plan` and is favored over
using the Matchbox templating feature when the configs are
served to PXE booting nodes.
* Writes a Matchbox profile for each machine, which will be served
as-is. The effect is the same, each node gets provisioned with its
own Container Linux config.
2017-09-23 11:49:12 -07:00
cd368c123f docs: Add missing Terraform plugin section for bare-metal 2017-09-18 22:36:01 -07:00
7c733bd314 Add Nginx Ingress controller addons and docs 2017-09-18 01:48:21 -07:00
229a4c5293 Merge pull request #18 from poseidon/add-aws
Add AWS module and docs
2017-09-17 23:50:49 -07:00
47387d552a docs: Add tutorial for AWS usage 2017-09-17 23:41:43 -07:00
7c046b6206 *: Fix Terraform fmt and comments 2017-09-17 21:43:00 -07:00
d8e4ac172a Add dghubble/pegasus AWS Kubernetes Terraform module 2017-09-17 21:40:33 -07:00
663f37ed6d google-cloud: Remove unused service accounts 2017-09-14 15:47:44 -07:00
fb5f63c8be google-cloud: Update kubelet.service unit to match upstream
* Mount host /opt/cni/bin in Kubelet to use host's CNI plugins
* Switch /var/run/kubelet-pod.uuid to /var/cache/kubelet-pod.uuid
to persist between reboots and cleanup old Kubelet pods
* Organize Kubelet flags in alphabetical order
2017-09-14 15:47:44 -07:00
0d6410505d bare-metal: Update kubelet.service unit to match upstream
* Mount host /opt/cni/bin in Kubelet to use host's CNI plugins
* Switch /var/run/kubelet-pod.uuid to /var/cache/kubelet-pod.uuid
to persist between reboots and cleanup old Kubelet pods
* Organize Kubelet flags in alphabetical order
2017-09-14 11:44:02 -07:00
2a2ed372c8 digital-ocean: Update kubelet.service unit to match upstream
* Mount host /opt/cni/bin in Kubelet to use host's CNI plugins
* Switch /var/run/kubelet-pod.uuid to /var/cache/kubelet-pod.uuid
to persist between reboots and cleanup old Kubelet pods
* Organize Kubelet flags in alphabetical order
2017-09-13 20:49:23 -07:00
2ff6d602d8 digital-ocean: Distribute kubeconfig via Terraform null_resource
* Keep kubeconfig out of DigitalOcean metadata user-data
2017-09-13 20:19:52 -07:00
64e8d207b1 Change bare-metal and GCE networking default to calico
* Switch networking default from flannel to calico
2017-09-12 09:16:58 -07:00
a441f5c6e0 Update Kubernetes from v1.7.3 to v1.7.5 2017-09-08 13:56:20 -07:00
00b61a26c0 docs: Add docs on Calico networking support
* Digital Ocean firewalls don't yet support the required
IP tunneling protocol so Calico cannot be used without
disabling firewalls right now.
2017-09-05 19:01:32 -07:00
1efe39d6bc Allow MTU for bare-metal Calico to be customized
* Calico on bare-metal defaults to IP-in-IP encapsulation and MTU 1480
2017-09-05 19:01:18 -07:00
ec46bc13ae Add support for Calico networking on GCE
* Calico on GCE with IP-in-IP encapsulation and MTU 1440
* Calico on DO with IP-in-IP encapsulation and MTU 1440
* Digital Ocean firewalls don't support IPIP protocol yet
2017-09-05 18:22:14 -07:00
d48f88cfd6 Fix typo in the issue template 2017-09-04 20:56:01 -07:00
6ef326a872 bare-metal: Add support for Calico networking
* Add variable networking with "flannel" or "calico"
2017-09-01 17:52:22 -07:00
64435adbc3 Merge pull request #7 from ericchiang/fix-link
README.md: fix addons link
2017-08-29 10:51:35 -07:00
140e869278 README.md: fix addons link 2017-08-29 10:49:01 -07:00
082dedbdbd docs: Fix broken addons overview.md link 2017-08-27 21:11:24 -07:00
a2609c14c0 addons: Disable Google Analytics in CLUO 2017-08-27 21:06:49 -07:00
564c0160bf Add heapster, dashboard, and CLUO addons 2017-08-27 17:20:29 -07:00
5b2275872c Update README to match docs index page 2017-08-27 16:09:23 -07:00
2faacc6a50 Add concepts, tutorials, and faq docs
* Add bare-metal tutorial
* Add DigitalOcean tutorial
* Add Google Cloud tutorial
2017-08-27 15:21:57 -07:00
056bd8a059 google-cloud: Remove deprecated automatic_restart field
* In terraform-provider-google v0.1.3, it is no longer necessary
to supply a (duplicated) value for the instance_template field
automatic_restart
* Previously this field was set to match the scheduling
automatic_restart since the field defaulted to true and would
cause plan to always show changes were needed
2017-08-25 00:14:02 -07:00
6a574d4a01 Organize README to work with published docs 2017-08-23 00:53:21 -07:00
b29a6cd1cd digital-ocean: Fix the digital-ocean default variables.tf
* Set the controller_type default to 2gb, the minimum that will
work
2017-08-23 00:53:03 -07:00
a97bbf7128 digital-ocean: Switch droplet tag string to tag reference
* Without a reference to a Digital Ocean tag object, terraform may
try to create a firewall rule before a tag actually exists. By
referencing the actual tag objects, the dependency order is
implied
2017-08-16 20:13:18 -07:00
dc3ff174ea Update Kubernetes from v1.7.1 to v1.7.3 2017-08-16 20:12:59 -07:00
fc018ffa28 Rename project and organization 2017-08-14 19:24:04 -07:00
bac968d3eb Simplify google-cloud cluster variables
* Remove k8s_domain_name input variable, the controller DNS
record will be "${var.cluster_name}.${dns_zone}"
* Rename dns_base_zone to dns_zone
* Rename dns_base_zone_name to dns_zone_name
2017-08-13 13:06:12 -07:00
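A sketch of the renamed inputs, using the README's example values (module source as shown in the README added later in this diff):
```tf
module "google-cloud-yavin" {
  source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"

  cluster_name  = "yavin"
  dns_zone      = "example.com"    # was dns_base_zone
  dns_zone_name = "example-zone"   # was dns_base_zone_name

  # k8s_domain_name removed; controllers resolve at "${cluster_name}.${dns_zone}"
  # ... other variables omitted ...
}
```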
40bd338eab Add Github issue and pull request templates 2017-08-13 12:30:30 -07:00
e5975cf9c7 Add CONTRIBUTING.md and DCO agreement 2017-08-13 12:27:17 -07:00
e19517d3df Fix the terraform fmt of configs 2017-08-12 18:26:05 -07:00
f04411377f digital-ocean: Add cluster firewall rules
* Requires Terraform v0.10.0+
2017-08-12 18:22:18 -07:00
cafc58c610 Update module source from dghubble to purenetes 2017-08-07 19:30:41 -07:00
88ba9bd273 Update README with features, modules, and social contract
* Add customization, contributing, and non-goal sections
2017-08-06 23:49:05 -07:00
29ed9eaa33 Add MIT License 2017-08-06 23:41:16 -07:00
b303c0e986 digital-ocean: Add ssh step dependency on the 0th controller 2017-07-29 15:18:24 -07:00
097dcdf47e digital-ocean: Add kubelet hostname-override flag
* Kubelets should register nodes via their private IPv4 address,
as provided by the metadata service from Digital Ocean
* By default, the Kubelet execs `hostname` to determine the name it should
use when registering with the apiserver. On Digital Ocean, the hostname
is not routable by other instances. Digital Ocean does not run an
internal DNS service.
* Fixes issue where the apiserver can't reach the worker nodes. This
prevented kubectl logs and exec commands from working
2017-07-29 14:32:51 -07:00
efff7497eb digital-ocean: Join name.dns_zone for controller domain
* Output the DNS FQDNs, IPv4 addresses, and IPv6 addresses
2017-07-29 12:47:47 -07:00
6070ffb449 Add dghubble/pegasus Digital Ocean Kubernetes Terraform module 2017-07-29 11:36:33 -07:00
2d33b9abe2 Update diskless PXE workers to Kubernetes v1.7.1 2017-07-27 23:11:41 -07:00
da596e06bb Add bare-metal support for Container Linux with Matchbox 2017-07-24 23:24:12 -07:00
833c92b2bf Add initial README with goals, operating systems, and platforms 2017-07-24 22:12:13 -07:00
386dc49a58 Organize metal-worker-pxe under bare-metal 2017-07-24 21:40:05 -07:00
4df6bb81a8 Organize modules by platform and OS distribution 2017-07-24 19:41:36 -07:00
75f4826097 Update Kubernetes from v1.6.7 to v1.7.1 2017-07-24 18:51:39 -07:00
b8478007ca Update Kubernetes from v1.6.6 to v1.6.7 2017-07-19 11:50:21 -07:00
4175b3ba66 Add Terraform module for bare-metal PXE workers 2017-07-16 22:45:34 -07:00
eb0df49171 Update Google Cloud Kubernetes to use bootkube v0.5.0 2017-07-16 21:42:32 -07:00
ddfa5e1bea gce: Update Kubernetes to v1.6.6
* Disable locksmithd.service on hosts; the Container Linux
Update Operator will be used instead
2017-06-26 22:24:02 -07:00
159 changed files with 8687 additions and 339 deletions

.github/ISSUE_TEMPLATE.md (new file)
@@ -0,0 +1,33 @@
<!-- Fill in either the 'Bug' or 'Feature Request' section -->
## Bug
### Environment
* Platform: bare-metal, google-cloud, digital-ocean
* OS: container-linux, fedora-cloud
* Terraform: `terraform version`
* Plugins: Provider plugin versions
* Ref: Git SHA (if applicable)
### Problem
Describe the problem.
### Desired Behavior
Describe the goal.
### Steps to Reproduce
Provide clear steps to reproduce the issue unless already covered.
## Feature Request
### Feature
Describe the feature and what problem it solves.
### Tradeoffs
What are the pros and cons of this feature? How will it be exercised and maintained?

.github/PULL_REQUEST_TEMPLATE.md (new file)
@@ -0,0 +1,10 @@
High level description of the change.
* Specific change
* Specific change
## Testing
Describe your work to validate the change works.
rel: issue number (if applicable)

CHANGES.md (new file)
@@ -0,0 +1,200 @@
# Typhoon
Notable changes between versions.
## Latest
## v1.9.2
* Kubernetes [v1.9.2](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192)
* Add Terraform v0.11.x support
* Add explicit "providers" section to modules for Terraform v0.11.x
* Retain support for Terraform v0.10.4+
* Add [migration guide](https://github.com/poseidon/typhoon/blob/master/docs/topics/maintenance.md) from Terraform v0.10.x to v0.11.x (**action required!**)
* Update etcd from 3.2.13 to 3.2.14
* Update calico from 2.6.5 to 2.6.6
* Update kube-dns from v1.14.7 to v1.14.8
* Use separate service account for kube-dns
* Use kubernetes-incubator/bootkube v0.10.0
#### Addons
* Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (**important**)
* Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes ([cluo#163](https://github.com/coreos/container-linux-update-operator/issues/163))
* Update kube-state-metrics from v1.1.0 to v1.2.0
* Fix RBAC cluster role for kube-state-metrics
#### Bare-Metal
* Use per-node Container Linux install profiles ([#97](https://github.com/poseidon/typhoon/pull/97))
* Allow Container Linux channel/version to be chosen per-cluster
* Fix issue where cluster deletion could require `terraform apply` multiple times
#### Digital Ocean
* Relax `digitalocean` provider version constraint
* Fix bug with `terraform plan` always showing a firewall diff to be applied ([#3](https://github.com/poseidon/typhoon/issues/3))
## v1.9.1
* Kubernetes [v1.9.1](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v191)
* Update kube-dns from 1.14.5 to v1.14.7
* Update etcd from 3.2.0 to 3.2.13
* Update Calico from v2.6.4 to v2.6.5
* Enable portmap to fix hostPort with Calico
* Use separate service account for controller-manager
## v1.8.6
* Kubernetes [v1.8.6](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v186)
* Update Calico from v2.6.3 to v2.6.4
## v1.8.5
* Kubernetes [v1.8.5](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md#v185)
* Recommend Container Linux [images](https://coreos.com/releases/) with Docker 17.09
* Container Linux stable, beta, and alpha now provide Docker 17.09 (instead
of 1.12)
* Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
* Fix race where `etcd-member.service` could fail to resolve peers ([#69](https://github.com/poseidon/typhoon/pull/69))
* Add optional `cluster_domain_suffix` variable (#74)
* Use kubernetes-incubator/bootkube v0.9.1
#### Bare-Metal
* Add kubelet `--volume-plugin-dir` flag to allow flexvolume providers ([#61](https://github.com/poseidon/typhoon/pull/61))
#### Addons
* Discourage deploying the Kubernetes Dashboard (security)
## v1.8.4
* Kubernetes v1.8.4
* Calico related bug fixes
* Update Calico from v2.6.1 to v2.6.3
* Update flannel from v0.9.0 to v0.9.1
* Service accounts for kube-proxy and pod-checkpointer
* Use kubernetes-incubator/bootkube v0.9.0
## v1.8.3
* Kubernetes v1.8.3
* Run etcd on-host, across controllers
* Promote AWS platform to beta
* Use kubernetes-incubator/bootkube v0.8.2
#### Google Cloud
* Add required variable `region` (e.g. "us-central1")
* Reduce time to bootstrap a cluster
* Change etcd to run on-host, across controllers (etcd-member.service)
* Change controller instances to automatically span zones in the region
* Change worker managed instance group to automatically span zones in the region
* Improve internal firewall rules and use tag-based firewall policies
* Remove support for self-hosted etcd
* Remove the `zone` required variable
* Remove the `controller_preemptible` optional variable
#### AWS
* Promote AWS platform to beta
* Reduce time to bootstrap a cluster
* Change etcd to run on-host, across controllers (etcd-member.service)
* Fix firewall rules for multi-controller kubelet scraping and node-exporter
* Remove support for self-hosted etcd
#### Addons
* Add Prometheus 2.0 addon with alerting rules
* Add Grafana dashboard for observing metrics
## v1.8.2
* Kubernetes v1.8.2
* Fixes a memory leak in the v1.8.1 apiserver ([kubernetes#53485](https://github.com/kubernetes/kubernetes/issues/53485))
* Switch to using the `gcr.io/google_containers/hyperkube` image
* Update flannel from v0.8.0 to v0.9.0
* Add `hairpinMode` to flannel CNI config
* Add `--no-negcache` to kube-dns dnsmasq
* Use kubernetes-incubator/bootkube v0.8.1
## v1.8.1
* Kubernetes v1.8.1
* Use kubernetes-incubator/bootkube v0.8.0
#### Digital Ocean
* Run etcd cluster across controller nodes (etcd-member.service)
* Remove support for self-hosted etcd
* Reduce time to bootstrap a cluster
## v1.7.7
* Kubernetes v1.7.7
* Use kubernetes-incubator/bootkube v0.7.0
* Update kube-dns to 1.14.5 to fix dnsmasq [vulnerability](https://security.googleblog.com/2017/10/behind-masq-yet-more-dns-and-dhcp.html)
* Calico v2.6.1
* flannel-cni v0.3.0
* Update flannel CNI config to fix hostPort
## v1.7.5
* Kubernetes v1.7.5
* Use kubernetes-incubator/bootkube v0.6.2
* Add AWS Terraform module (alpha)
* Add support for Calico networking (bare-metal, Google Cloud, AWS)
* Change networking default from "flannel" to "calico"
#### AWS
* Add `network_mtu` to allow CNI interface MTU customization
#### Bare-Metal
* Add `network_mtu` to allow CNI interface MTU customization
* Remove support for `experimental_self_hosted_etcd`
## v1.7.3
* Kubernetes v1.7.3
* Use kubernetes-incubator/bootkube v0.6.1
#### Digital Ocean
* Add cloud firewall rules (requires Terraform v0.10)
* Change nodes tags from strings to DO tags
## v1.7.1
* Kubernetes v1.7.1
* Use kubernetes-incubator/bootkube v0.6.0
* Add Bare-Metal Terraform module (stable)
* Add Digital Ocean Terraform module (beta)
#### Google Cloud
* Remove `k8s_domain_name` variable, `cluster_name` + `dns_zone` resolves to controllers
* Rename `dns_base_zone` to `dns_zone`
* Rename `dns_base_zone_name` to `dns_zone_name`
## v1.6.7
* Kubernetes v1.6.7
* Use kubernetes-incubator/bootkube v0.5.1
## v1.6.6
* Kubernetes v1.6.6
* Use kubernetes-incubator/bootkube v0.4.5
* Disable locksmithd on hosts, in favor of [CLUO](https://github.com/coreos/container-linux-update-operator).
## v1.6.4
* Kubernetes v1.6.4
* Add Google Cloud Terraform module (stable)
## Earlier
Earlier versions, back to v1.3.0, used different designs and mechanisms.

CONTRIBUTING.md (new file)
@@ -0,0 +1,5 @@
# Contributing
## Developer Certificate of Origin
By contributing, you agree to the Linux Foundation's Developer Certificate of Origin ([DCO](DCO)). The DCO is a statement that you, the contributor, have the legal right to make your contribution and understand the contribution will be distributed as part of this project.

DCO (new file)
@@ -0,0 +1,37 @@
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

LICENSE (new file)
@@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2017 Typhoon Authors
Copyright (c) 2017 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

README.md (new file)
@@ -0,0 +1,138 @@
# Typhoon [![IRC](https://img.shields.io/badge/freenode-%23typhoon-0099ef.svg)]() <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
* Minimal, stable base Kubernetes distribution
* Declarative infrastructure and configuration
* [Free](#social-contract) (freedom and cost) and privacy-respecting
* Practical for labs, datacenters, and clouds
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Modules
Typhoon provides a Terraform Module for each supported operating system and platform.
| Platform | Operating System | Terraform Module | Status |
|---------------|------------------|------------------|--------|
| AWS | Container Linux | [aws/container-linux/kubernetes](aws/container-linux/kubernetes) | beta |
| Bare-Metal | Container Linux | [bare-metal/container-linux/kubernetes](bare-metal/container-linux/kubernetes) | stable |
| Digital Ocean | Container Linux | [digital-ocean/container-linux/kubernetes](digital-ocean/container-linux/kubernetes) | beta |
| Google Cloud | Container Linux | [google-cloud/container-linux/kubernetes](google-cloud/container-linux/kubernetes) | beta |
## Usage
* [Docs](https://typhoon.psdn.io)
* [Concepts](https://typhoon.psdn.io/concepts/)
* Tutorials
* [AWS](https://typhoon.psdn.io/aws/)
* [Bare-Metal](https://typhoon.psdn.io/bare-metal/)
* [Digital Ocean](https://typhoon.psdn.io/digital-ocean/)
* [Google-Cloud](https://typhoon.psdn.io/google-cloud/)
## Example
Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. Here's a minimal example:
```tf
module "google-cloud-yavin" {
source = "git::https://github.com/poseidon/typhoon//google-cloud/container-linux/kubernetes"
providers = {
google = "google.default"
local = "local.default"
null = "null.default"
template = "template.default"
tls = "tls.default"
}
# Google Cloud
region = "us-central1"
dns_zone = "example.com"
dns_zone_name = "example-zone"
os_image = "coreos-stable-1576-5-0-v20180105"
cluster_name = "yavin"
controller_count = 1
worker_count = 2
ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
# output assets dir
asset_dir = "/home/user/.secrets/clusters/yavin"
}
```
Fetch modules, plan the changes to be made, and apply the changes.
```sh
$ terraform init
$ terraform get --update
$ terraform plan
Plan: 37 to add, 0 to change, 0 to destroy.
$ terraform apply
Apply complete! Resources: 37 added, 0 changed, 0 destroyed.
```
In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a `yavin.example.com` DNS record to resolve to a network load balancer across controller nodes.
```sh
$ export KUBECONFIG=/home/user/.secrets/clusters/yavin/auth/kubeconfig
$ kubectl get nodes
NAME STATUS AGE VERSION
yavin-controller-0.c.example-com.internal Ready 6m v1.9.2
yavin-worker-jrbf.c.example-com.internal Ready 5m v1.9.2
yavin-worker-mzdm.c.example-com.internal Ready 5m v1.9.2
```
List the pods.
```
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-1cs8z 2/2 Running 0 6m
kube-system calico-node-d1l5b 2/2 Running 0 6m
kube-system calico-node-sp9ps 2/2 Running 0 6m
kube-system kube-apiserver-zppls 1/1 Running 0 6m
kube-system kube-controller-manager-3271970485-gh9kt 1/1 Running 0 6m
kube-system kube-controller-manager-3271970485-h90v8 1/1 Running 1 6m
kube-system kube-dns-1187388186-zj5dl 3/3 Running 0 6m
kube-system kube-proxy-117v6 1/1 Running 0 6m
kube-system kube-proxy-9886n 1/1 Running 0 6m
kube-system kube-proxy-njn47 1/1 Running 0 6m
kube-system kube-scheduler-3895335239-5x87r 1/1 Running 0 6m
kube-system kube-scheduler-3895335239-bzrrt 1/1 Running 1 6m
kube-system pod-checkpointer-l6lrt 1/1 Running 0 6m
```
## Non-Goals
Typhoon is strict about minimalism, maturity, and scope. These are not in scope:
* In-place Kubernetes Upgrades
* Adding every possible option
* OpenStack or Mesos platforms
## Help
Ask questions on the IRC #typhoon channel on [freenode.net](http://freenode.net/).
## Background
Typhoon powers the author's cloud and colocation clusters. The project has evolved through operational experience and Kubernetes changes. Typhoon is shared under a free license to allow others to use the work freely and contribute to its upkeep.
Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of free (or enterprise) Kubernetes distros is healthy.
## Social Contract
Typhoon is not a product, trial, or free-tier. It is not run by a company, does not offer support or services, and does not accept or make any money. It is not associated with any operating system or platform vendor.
Typhoon clusters will contain only [free](https://www.debian.org/intro/free) components. Cluster components will not collect data on users without their permission.
*Disclosure: The author works for CoreOS and previously wrote Matchbox and original Tectonic for bare-metal and AWS. This project is not associated with CoreOS.*

@@ -0,0 +1,12 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: reboot-coordinator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: reboot-coordinator
subjects:
- kind: ServiceAccount
  namespace: reboot-coordinator
  name: default

@@ -0,0 +1,45 @@
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: reboot-coordinator
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - get
  - update
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - delete
- apiGroups:
  - "extensions"
  resources:
  - daemonsets
  verbs:
  - get

@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: reboot-coordinator

@@ -0,0 +1,56 @@
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: container-linux-update-agent
  namespace: reboot-coordinator
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: container-linux-update-agent
    spec:
      containers:
      - name: update-agent
        image: quay.io/coreos/container-linux-update-operator:v0.5.0
        command:
        - "/bin/update-agent"
        volumeMounts:
        - mountPath: /var/run/dbus
          name: var-run-dbus
        - mountPath: /etc/coreos
          name: etc-coreos
        - mountPath: /usr/share/coreos
          name: usr-share-coreos
        - mountPath: /etc/os-release
          name: etc-os-release
        env:
        # read by update-agent as the node name to manage reboots for
        - name: UPDATE_AGENT_NODE
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
      - name: var-run-dbus
        hostPath:
          path: /var/run/dbus
      - name: etc-coreos
        hostPath:
          path: /etc/coreos
      - name: usr-share-coreos
        hostPath:
          path: /usr/share/coreos
      - name: etc-os-release
        hostPath:
          path: /etc/os-release

@@ -0,0 +1,26 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: container-linux-update-operator
  namespace: reboot-coordinator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: container-linux-update-operator
    spec:
      containers:
      - name: update-operator
        image: quay.io/coreos/container-linux-update-operator:v0.5.0
        command:
        - "/bin/update-operator"
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

@@ -0,0 +1,32 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: kubernetes-dashboard
        phase: prod
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
        ports:
        - name: http
          containerPort: 9090
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 100Mi
        livenessProbe:
          httpGet:
            path: /
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30

@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: ClusterIP
  selector:
    name: kubernetes-dashboard
    phase: prod
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 9090

@@ -0,0 +1,46 @@
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      name: grafana
      phase: prod
  template:
    metadata:
      labels:
        name: grafana
        phase: prod
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:4.6.3
        env:
        - name: GF_SERVER_HTTP_PORT
          value: "8080"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        ports:
        - name: http
          containerPort: 8080
        resources:
          requests:
            memory: 100Mi
            cpu: 100m
          limits:
            memory: 200Mi
            cpu: 200m
      volumes:
      - name: grafana-storage
        emptyDir: {}

@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    name: grafana
    phase: prod
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080

@@ -0,0 +1,61 @@
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      name: heapster
      phase: prod
  template:
    metadata:
      labels:
        name: heapster
        phase: prod
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster-amd64:v1.5.0
        command:
        - /heapster
        - --source=kubernetes.summary_api:''
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8082
            scheme: HTTP
          initialDelaySeconds: 180
          timeoutSeconds: 5
      - name: heapster-nanny
        image: gcr.io/google_containers/addon-resizer:1.7
        command:
        - /pod_nanny
        - --cpu=80m
        - --extra-cpu=0.5m
        - --memory=140Mi
        - --extra-memory=4Mi
        - --threshold=5
        - --deployment=heapster
        - --container=heapster
        - --poll-period=300000
        - --estimator=exponential
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        resources:
          limits:
            cpu: 50m
            memory: 90Mi
          requests:
            cpu: 50m
            memory: 90Mi

@@ -0,0 +1,12 @@
apiVersion: v1
kind: Service
metadata:
  name: heapster
  namespace: kube-system
spec:
  type: ClusterIP
  selector:
    name: heapster
  ports:
  - port: 80
    targetPort: 8082

@@ -0,0 +1,36 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: default-backend
  namespace: ingress
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: default-backend
        phase: prod
    spec:
      containers:
      - name: default-backend
        # Any image is permissible as long as:
        # 1. It serves a 404 page at /
        # 2. It serves 200 on a /healthz endpoint
        image: gcr.io/google_containers/defaultbackend:1.4
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
      terminationGracePeriodSeconds: 60

@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: default-backend
  namespace: ingress
spec:
  type: ClusterIP
  selector:
    name: default-backend
    phase: prod
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080

@@ -0,0 +1,67 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress
spec:
  replicas: 2
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        name: nginx-ingress-controller
        phase: prod
    spec:
      nodeSelector:
        node-role.kubernetes.io/node: ""
      hostNetwork: true
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-backend
        - --ingress-class=public
        # use downward API
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        - name: health
          containerPort: 10254
          hostPort: 10254
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
      restartPolicy: Always
      terminationGracePeriodSeconds: 60

View File

@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: ingress

View File

@ -0,0 +1,12 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress
subjects:
- kind: ServiceAccount
namespace: ingress
name: default

View File

@ -0,0 +1,51 @@
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: ingress
rules:
- apiGroups:
- ""
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- ""
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- "extensions"
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "extensions"
resources:
- ingresses/status
verbs:
- update

View File

@ -0,0 +1,13 @@
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
namespace: ingress
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress
subjects:
- kind: ServiceAccount
namespace: ingress
name: default

View File

@ -0,0 +1,41 @@
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
namespace: ingress
rules:
- apiGroups:
- ""
resources:
- configmaps
- pods
- secrets
verbs:
- get
- apiGroups:
- ""
resources:
- configmaps
resourceNames:
# Defaults to "<election-id>-<ingress-class>"
# Here: "<ingress-controller-leader>-<nginx>"
# This has to be adapted if you change either parameter
# when launching the nginx-ingress-controller.
- "ingress-controller-leader-public"
verbs:
- get
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- apiGroups:
- ""
resources:
- endpoints
verbs:
- get
- create
- update

View File

@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
name: nginx-ingress-controller
namespace: ingress
spec:
type: ClusterIP
selector:
name: nginx-ingress-controller
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80
- name: https
protocol: TCP
port: 443
targetPort: 443
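The controller above is started with --ingress-class=public, so it only acts on Ingress resources that request that class. As a minimal sketch (the hostname is hypothetical; the grafana backend assumes the Service in the monitoring namespace defined earlier), an Ingress served by this controller could look like:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    # match the controller's --ingress-class=public flag
    kubernetes.io/ingress.class: "public"
spec:
  rules:
  - host: grafana.example.com  # hypothetical DNS name pointing at worker nodes
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 80

Because the controller uses hostNetwork with host ports 80 and 443 on nodes labeled node-role.kubernetes.io/node, the hostname should resolve to those workers (or to a load balancer in front of them).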

View File

@ -0,0 +1,67 @@
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: nginx-ingress-controller
namespace: ingress
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
name: nginx-ingress-controller
phase: prod
spec:
nodeSelector:
node-role.kubernetes.io/node: ""
hostNetwork: true
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public
# use downward API
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- name: http
containerPort: 80
hostPort: 80
- name: https
containerPort: 443
hostPort: 443
- name: health
containerPort: 10254
hostPort: 10254
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
restartPolicy: Always
terminationGracePeriodSeconds: 60

View File

@ -0,0 +1,36 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
template:
metadata:
labels:
name: default-backend
phase: prod
spec:
containers:
- name: default-backend
# Any image is permissible as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: gcr.io/google_containers/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: ingress

View File

@ -0,0 +1,12 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress
subjects:
- kind: ServiceAccount
namespace: ingress
name: default

View File

@ -0,0 +1,51 @@
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: ingress
rules:
- apiGroups:
- ""
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- ""
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- "extensions"
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "extensions"
resources:
- ingresses/status
verbs:
- update

View File

@ -0,0 +1,13 @@
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
namespace: ingress
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress
subjects:
- kind: ServiceAccount
namespace: ingress
name: default

View File

@ -0,0 +1,41 @@
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
namespace: ingress
rules:
- apiGroups:
- ""
resources:
- configmaps
- pods
- secrets
verbs:
- get
- apiGroups:
- ""
resources:
- configmaps
resourceNames:
# Defaults to "<election-id>-<ingress-class>"
# Here: "<ingress-controller-leader>-<nginx>"
# This has to be adapted if you change either parameter
# when launching the nginx-ingress-controller.
- "ingress-controller-leader-public"
verbs:
- get
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- apiGroups:
- ""
resources:
- endpoints
verbs:
- get
- create
- update

View File

@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
name: nginx-ingress-controller
namespace: ingress
spec:
type: ClusterIP
selector:
name: nginx-ingress-controller
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80
- name: https
protocol: TCP
port: 443
targetPort: 443

View File

@ -0,0 +1,36 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: default-backend
namespace: ingress
spec:
replicas: 1
template:
metadata:
labels:
name: default-backend
phase: prod
spec:
containers:
- name: default-backend
# Any image is permissible as long as:
# 1. It serves a 404 page at /
# 2. It serves 200 on a /healthz endpoint
image: gcr.io/google_containers/defaultbackend:1.4
ports:
- containerPort: 8080
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
terminationGracePeriodSeconds: 60

View File

@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
name: default-backend
namespace: ingress
spec:
type: ClusterIP
selector:
name: default-backend
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 8080

View File

@ -0,0 +1,67 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx-ingress-controller
namespace: ingress
spec:
replicas: 2
strategy:
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
name: nginx-ingress-controller
phase: prod
spec:
nodeSelector:
node-role.kubernetes.io/node: ""
hostNetwork: true
containers:
- name: nginx-ingress-controller
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/default-backend
- --ingress-class=public
# use downward API
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- name: http
containerPort: 80
hostPort: 80
- name: https
containerPort: 443
hostPort: 443
- name: health
containerPort: 10254
hostPort: 10254
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
restartPolicy: Always
terminationGracePeriodSeconds: 60

View File

@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: ingress

View File

@ -0,0 +1,12 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress
subjects:
- kind: ServiceAccount
namespace: ingress
name: default

View File

@ -0,0 +1,51 @@
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: ingress
rules:
- apiGroups:
- ""
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- ""
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- "extensions"
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "extensions"
resources:
- ingresses/status
verbs:
- update

View File

@ -0,0 +1,13 @@
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
namespace: ingress
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress
subjects:
- kind: ServiceAccount
namespace: ingress
name: default

View File

@ -0,0 +1,41 @@
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: ingress
namespace: ingress
rules:
- apiGroups:
- ""
resources:
- configmaps
- pods
- secrets
verbs:
- get
- apiGroups:
- ""
resources:
- configmaps
resourceNames:
# Defaults to "<election-id>-<ingress-class>"
# Here: "<ingress-controller-leader>-<nginx>"
# This has to be adapted if you change either parameter
# when launching the nginx-ingress-controller.
- "ingress-controller-leader-public"
verbs:
- get
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- apiGroups:
- ""
resources:
- endpoints
verbs:
- get
- create
- update

View File

@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
name: nginx-ingress-controller
namespace: ingress
spec:
type: ClusterIP
selector:
name: nginx-ingress-controller
phase: prod
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80
- name: https
protocol: TCP
port: 443
targetPort: 443

View File

@ -0,0 +1,226 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yaml: |-
# Global config
global:
scrape_interval: 15s
# AlertManager
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
# Scrape configs for running Prometheus on a Kubernetes cluster.
# This uses separate scrape configs for cluster components (i.e. API server, node)
# and services to allow each to use different authentication configs.
#
# Kubernetes labels will be added as Prometheus labels on metrics via the
# `labelmap` relabeling action.
scrape_configs:
# Scrape config for API servers.
#
# Kubernetes exposes API servers as endpoints to the default/kubernetes
# service so this uses `endpoints` role and uses relabelling to only keep
# the endpoints associated with the default/kubernetes service using the
# default named port `https`. This works for single API server deployments as
# well as HA API server deployments.
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# Using endpoints to discover kube-apiserver targets finds the pod IP
# (the host IP, since the apiserver uses the host network) which is not used in
# the server certificate.
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
# Keep only the default/kubernetes service endpoints for the https port. This
# will add targets for each API server which Kubernetes adds an endpoint to
# the default/kubernetes service.
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10255/metrics).
#
# Rather than connecting directly to the node, the scrape is proxied through the
# Kubernetes apiserver. This means it will work if Prometheus is running out of
# cluster, or can't connect to nodes for some other reason (e.g. because of
# firewalling).
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for Kubelet cAdvisor. Explore metrics from a node by
# scraping kubelet (127.0.0.1:10255/metrics/cadvisor).
#
# This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# Rather than connecting directly to the node, the scrape is proxied through the
# Kubernetes apiserver. This means it will work if Prometheus is running out of
# cluster, or can't connect to nodes for some other reason (e.g. because of
# firewalling).
- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
# pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
# Rule files
rule_files:
- "/etc/prometheus/rules/*.rules"
- "/etc/prometheus/rules/*.yaml"
- "/etc/prometheus/rules/*.yml"

View File

@ -0,0 +1,43 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: prometheus
namespace: monitoring
spec:
replicas: 1
strategy:
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
name: prometheus
phase: prod
spec:
containers:
- name: prometheus
image: quay.io/prometheus/prometheus:v2.0.0
args:
- '--config.file=/etc/prometheus/prometheus.yaml'
ports:
- name: web
containerPort: 9090
volumeMounts:
- name: config
mountPath: /etc/prometheus
- name: rules
mountPath: /etc/prometheus/rules
- name: data
mountPath: /var/lib/prometheus
dnsPolicy: ClusterFirst
restartPolicy: Always
terminationGracePeriodSeconds: 30
volumes:
- name: config
configMap:
name: prometheus-config
- name: rules
configMap:
name: prometheus-rules
- name: data
emptyDir: {}

View File

@ -0,0 +1,18 @@
apiVersion: v1
kind: Service
metadata:
name: kube-controller-manager
namespace: kube-system
annotations:
prometheus.io/scrape: 'true'
spec:
type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None
selector:
k8s-app: kube-controller-manager
ports:
- name: metrics
protocol: TCP
port: 10252
targetPort: 10252

View File

@ -0,0 +1,18 @@
apiVersion: v1
kind: Service
metadata:
name: kube-scheduler
namespace: kube-system
annotations:
prometheus.io/scrape: 'true'
spec:
type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None
selector:
k8s-app: kube-scheduler
ports:
- name: metrics
protocol: TCP
port: 10251
targetPort: 10251

View File

@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-state-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: monitoring

View File

@ -0,0 +1,38 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-state-metrics
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]

View File

@ -0,0 +1,61 @@
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: kube-state-metrics
namespace: monitoring
spec:
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
name: kube-state-metrics
phase: prod
template:
metadata:
labels:
name: kube-state-metrics
phase: prod
spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v1.2.0
ports:
- name: metrics
containerPort: 8080
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
timeoutSeconds: 5
- name: addon-resizer
image: gcr.io/google_containers/addon-resizer:1.0
resources:
limits:
cpu: 100m
memory: 30Mi
requests:
cpu: 100m
memory: 30Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
command:
- /pod_nanny
- --container=kube-state-metrics
- --cpu=100m
- --extra-cpu=1m
- --memory=100Mi
- --extra-memory=2Mi
- --threshold=5
- --deployment=kube-state-metrics

View File

@ -0,0 +1,13 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kube-state-metrics
namespace: monitoring
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: monitoring

View File

@ -0,0 +1,15 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: kube-state-metrics-resizer
namespace: monitoring
rules:
- apiGroups: [""]
resources:
- pods
verbs: ["get"]
- apiGroups: ["extensions"]
resources:
- deployments
resourceNames: ["kube-state-metrics"]
verbs: ["get", "update"]

View File

@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-state-metrics
namespace: monitoring

View File

@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
name: kube-state-metrics
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
spec:
type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None
selector:
name: kube-state-metrics
phase: prod
ports:
- name: metrics
protocol: TCP
port: 80
targetPort: 8080

View File

@ -0,0 +1,57 @@
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitoring
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
name: node-exporter
phase: prod
template:
metadata:
labels:
name: node-exporter
phase: prod
spec:
hostNetwork: true
hostPID: true
containers:
- name: node-exporter
image: quay.io/prometheus/node-exporter:v0.15.0
args:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
ports:
- name: metrics
containerPort: 9100
hostPort: 9100
resources:
requests:
memory: 30Mi
cpu: 100m
limits:
memory: 50Mi
cpu: 200m
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys

View File

@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
name: node-exporter
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
spec:
type: ClusterIP
# service is created to allow prometheus to scrape endpoints
clusterIP: None
selector:
name: node-exporter
phase: prod
ports:
- name: metrics
protocol: TCP
port: 80
targetPort: 9100

View File

@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: monitoring

View File

@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: default
namespace: monitoring

View File

@ -0,0 +1,15 @@
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]

View File

@ -0,0 +1,546 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitoring
data:
# Rules adapted from those provided by coreos/prometheus-operator and SoundCloud
alertmanager.rules.yaml: |+
groups:
- name: alertmanager.rules
rules:
- alert: AlertmanagerConfigInconsistent
expr: count_values("config_hash", alertmanager_config_hash) BY (service) / ON(service)
GROUP_LEFT() label_replace(prometheus_operator_alertmanager_spec_replicas, "service",
"alertmanager-$1", "alertmanager", "(.*)") != 1
for: 5m
labels:
severity: critical
annotations:
description: The configuration of the instances of the Alertmanager cluster
`{{$labels.service}}` is out of sync.
- alert: AlertmanagerDownOrMissing
expr: label_replace(prometheus_operator_alertmanager_spec_replicas, "job", "alertmanager-$1",
"alertmanager", "(.*)") / ON(job) GROUP_RIGHT() sum(up) BY (job) != 1
for: 5m
labels:
severity: warning
annotations:
description: An unexpected number of Alertmanagers are scraped or Alertmanagers
have disappeared from discovery.
- alert: AlertmanagerFailedReload
expr: alertmanager_config_last_reload_successful == 0
for: 10m
labels:
severity: warning
annotations:
description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace
}}/{{ $labels.pod}}.
etcd3.rules.yaml: |+
groups:
- name: ./etcd3.rules
rules:
- alert: InsufficientMembers
expr: count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
for: 3m
labels:
severity: critical
annotations:
description: If one more etcd member goes down, the cluster will be unavailable
summary: etcd cluster insufficient members
- alert: NoLeader
expr: etcd_server_has_leader{job="etcd"} == 0
for: 1m
labels:
severity: critical
annotations:
description: etcd member {{ $labels.instance }} has no leader
summary: etcd member has no leader
- alert: HighNumberOfLeaderChanges
expr: increase(etcd_server_leader_changes_seen_total{job="etcd"}[1h]) > 3
labels:
severity: warning
annotations:
description: etcd instance {{ $labels.instance }} has seen {{ $value }} leader
changes within the last hour
summary: a high number of leader changes within the etcd cluster are happening
- alert: HighNumberOfFailedGRPCRequests
expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
/ sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.01
for: 10m
labels:
severity: warning
annotations:
description: '{{ $value }}% of requests for {{ $labels.grpc_method }} failed
on etcd instance {{ $labels.instance }}'
summary: a high number of gRPC requests are failing
- alert: HighNumberOfFailedGRPCRequests
expr: sum(rate(etcd_grpc_requests_failed_total{job="etcd"}[5m])) BY (grpc_method)
/ sum(rate(etcd_grpc_total{job="etcd"}[5m])) BY (grpc_method) > 0.05
for: 5m
labels:
severity: critical
annotations:
description: '{{ $value }}% of requests for {{ $labels.grpc_method }} failed
on etcd instance {{ $labels.instance }}'
summary: a high number of gRPC requests are failing
- alert: GRPCRequestsSlow
expr: histogram_quantile(0.99, rate(etcd_grpc_unary_requests_duration_seconds_bucket[5m]))
> 0.15
for: 10m
labels:
severity: critical
annotations:
description: on etcd instance {{ $labels.instance }} gRPC requests to {{ $labels.grpc_method
}} are slow
summary: slow gRPC requests
- alert: HighNumberOfFailedHTTPRequests
expr: sum(rate(etcd_http_failed_total{job="etcd"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job="etcd"}[5m]))
BY (method) > 0.01
for: 10m
labels:
severity: warning
annotations:
description: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd
instance {{ $labels.instance }}'
summary: a high number of HTTP requests are failing
- alert: HighNumberOfFailedHTTPRequests
expr: sum(rate(etcd_http_failed_total{job="etcd"}[5m])) BY (method) / sum(rate(etcd_http_received_total{job="etcd"}[5m]))
BY (method) > 0.05
for: 5m
labels:
severity: critical
annotations:
description: '{{ $value }}% of requests for {{ $labels.method }} failed on etcd
instance {{ $labels.instance }}'
summary: a high number of HTTP requests are failing
- alert: HTTPRequestsSlow
expr: histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m]))
> 0.15
for: 10m
labels:
severity: warning
annotations:
description: on etcd instance {{ $labels.instance }} HTTP requests to {{ $labels.method
}} are slow
summary: slow HTTP requests
- alert: EtcdMemberCommunicationSlow
expr: histogram_quantile(0.99, rate(etcd_network_member_round_trip_time_seconds_bucket[5m]))
> 0.15
for: 10m
labels:
severity: warning
annotations:
description: etcd instance {{ $labels.instance }} member communication with
{{ $labels.To }} is slow
summary: etcd member communication is slow
- alert: HighNumberOfFailedProposals
expr: increase(etcd_server_proposals_failed_total{job="etcd"}[1h]) > 5
labels:
severity: warning
annotations:
description: etcd instance {{ $labels.instance }} has seen {{ $value }} proposal
failures within the last hour
summary: a high number of proposals within the etcd cluster are failing
- alert: HighFsyncDurations
expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
> 0.5
for: 10m
labels:
severity: warning
annotations:
description: etcd instance {{ $labels.instance }} fsync durations are high
summary: high fsync durations
- alert: HighCommitDurations
expr: histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))
> 0.25
for: 10m
labels:
severity: warning
annotations:
description: etcd instance {{ $labels.instance }} commit durations are high
summary: high commit durations
general.rules.yaml: |+
groups:
- name: general.rules
rules:
- alert: TargetDown
expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
for: 10m
labels:
severity: warning
annotations:
description: '{{ $value }}% of {{ $labels.job }} targets are down.'
summary: Targets are down
- record: fd_utilization
expr: process_open_fds / process_max_fds
- alert: FdExhaustionClose
expr: predict_linear(fd_utilization[1h], 3600 * 4) > 1
for: 10m
labels:
severity: warning
annotations:
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
will exhaust its file/socket descriptors within the next 4 hours'
summary: file descriptors soon exhausted
- alert: FdExhaustionClose
expr: predict_linear(fd_utilization[10m], 3600) > 1
for: 10m
labels:
severity: critical
annotations:
description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance
will exhaust its file/socket descriptors within the next hour'
summary: file descriptors soon exhausted
kube-controller-manager.rules.yaml: |+
groups:
- name: kube-controller-manager.rules
rules:
- alert: K8SControllerManagerDown
expr: absent(up{kubernetes_name="kube-controller-manager"} == 1)
for: 5m
labels:
severity: critical
annotations:
description: There is no running K8S controller manager. Deployments and replication
controllers are not making progress.
summary: Controller manager is down
kube-scheduler.rules.yaml: |+
groups:
- name: kube-scheduler.rules
rules:
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
expr: histogram_quantile(0.99, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.99"
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
expr: histogram_quantile(0.9, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.9"
- record: cluster:scheduler_e2e_scheduling_latency_seconds:quantile
expr: histogram_quantile(0.5, sum(scheduler_e2e_scheduling_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.5"
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
expr: histogram_quantile(0.99, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.99"
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
expr: histogram_quantile(0.9, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.9"
- record: cluster:scheduler_scheduling_algorithm_latency_seconds:quantile
expr: histogram_quantile(0.5, sum(scheduler_scheduling_algorithm_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.5"
- record: cluster:scheduler_binding_latency_seconds:quantile
expr: histogram_quantile(0.99, sum(scheduler_binding_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.99"
- record: cluster:scheduler_binding_latency_seconds:quantile
expr: histogram_quantile(0.9, sum(scheduler_binding_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.9"
- record: cluster:scheduler_binding_latency_seconds:quantile
expr: histogram_quantile(0.5, sum(scheduler_binding_latency_microseconds_bucket)
BY (le, cluster)) / 1e+06
labels:
quantile: "0.5"
- alert: K8SSchedulerDown
expr: absent(up{kubernetes_name="kube-scheduler"} == 1)
for: 5m
labels:
severity: critical
annotations:
description: There is no running K8S scheduler. New pods are not being assigned
to nodes.
summary: Scheduler is down
kube-state-metrics.rules.yaml: |+
groups:
- name: kube-state-metrics.rules
rules:
- alert: DeploymentGenerationMismatch
expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation
for: 15m
labels:
severity: warning
annotations:
description: Observed deployment generation does not match the expected one for
deployment {{$labels.namespaces}}/{{$labels.deployment}}
- alert: DeploymentReplicasNotUpdated
expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)
or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))
unless (kube_deployment_spec_paused == 1)
for: 15m
labels:
severity: warning
annotations:
description: Replicas are not updated and available for deployment {{$labels.namespaces}}/{{$labels.deployment}}
- alert: DaemonSetRolloutStuck
expr: kube_daemonset_status_current_number_ready / kube_daemonset_status_desired_number_scheduled
* 100 < 100
for: 15m
labels:
severity: warning
annotations:
description: Only {{$value}}% of desired pods are scheduled and ready for daemon
set {{$labels.namespaces}}/{{$labels.daemonset}}
- alert: K8SDaemonSetsNotScheduled
expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
> 0
for: 10m
labels:
severity: warning
annotations:
description: A number of daemonsets are not scheduled.
summary: Daemonsets are not scheduled correctly
- alert: DaemonSetsMissScheduled
expr: kube_daemonset_status_number_misscheduled > 0
for: 10m
labels:
severity: warning
annotations:
description: A number of daemonsets are running where they are not supposed
to run.
summary: Daemonsets are not scheduled correctly
- alert: PodFrequentlyRestarting
expr: increase(kube_pod_container_status_restarts[1h]) > 5
for: 10m
labels:
severity: warning
annotations:
description: Pod {{$labels.namespaces}}/{{$labels.pod}} was restarted {{$value}}
times within the last hour
kubelet.rules.yaml: |+
groups:
- name: kubelet.rules
rules:
- alert: K8SNodeNotReady
expr: kube_node_status_condition{condition="Ready",status="true"} == 0
for: 1h
labels:
severity: warning
annotations:
description: The Kubelet on {{ $labels.node }} has not checked in with the API,
or has set itself to NotReady, for more than an hour
summary: Node status is NotReady
- alert: K8SManyNodesNotReady
expr: count(kube_node_status_condition{condition="Ready",status="true"} == 0)
> 1 and (count(kube_node_status_condition{condition="Ready",status="true"} ==
0) / count(kube_node_status_condition{condition="Ready",status="true"})) > 0.2
for: 1m
labels:
severity: critical
annotations:
description: '{{ $value }}% of Kubernetes nodes are not ready'
- alert: K8SKubeletDown
expr: count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}) * 100 > 3
for: 1h
labels:
severity: warning
annotations:
description: Prometheus failed to scrape {{ $value }}% of kubelets.
- alert: K8SKubeletDown
expr: (absent(up{job="kubernetes-nodes"} == 1) or count(up{job="kubernetes-nodes"} == 0) / count(up{job="kubernetes-nodes"}))
* 100 > 1
for: 1h
labels:
severity: critical
annotations:
description: Prometheus failed to scrape {{ $value }}% of kubelets, or all Kubelets
have disappeared from service discovery.
summary: Many Kubelets cannot be scraped
- alert: K8SKubeletTooManyPods
expr: kubelet_running_pod_count > 100
for: 10m
labels:
severity: warning
annotations:
description: Kubelet {{$labels.instance}} is running {{$value}} pods, close
to the limit of 110
summary: Kubelet is close to pod limit
kubernetes.rules.yaml: |+
groups:
- name: kubernetes.rules
rules:
- record: pod_name:container_memory_usage_bytes:sum
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
(pod_name)
- record: pod_name:container_spec_cpu_shares:sum
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) BY (pod_name)
- record: pod_name:container_cpu_usage:sum
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
BY (pod_name)
- record: pod_name:container_fs_usage_bytes:sum
expr: sum(container_fs_usage_bytes{container_name!="POD",pod_name!=""}) BY (pod_name)
- record: namespace:container_memory_usage_bytes:sum
expr: sum(container_memory_usage_bytes{container_name!=""}) BY (namespace)
- record: namespace:container_spec_cpu_shares:sum
expr: sum(container_spec_cpu_shares{container_name!=""}) BY (namespace)
- record: namespace:container_cpu_usage:sum
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD"}[5m]))
BY (namespace)
- record: cluster:memory_usage:ratio
expr: sum(container_memory_usage_bytes{container_name!="POD",pod_name!=""}) BY
(cluster) / sum(machine_memory_bytes) BY (cluster)
- record: cluster:container_spec_cpu_shares:ratio
expr: sum(container_spec_cpu_shares{container_name!="POD",pod_name!=""}) / 1000
/ sum(machine_cpu_cores)
- record: cluster:container_cpu_usage:ratio
expr: sum(rate(container_cpu_usage_seconds_total{container_name!="POD",pod_name!=""}[5m]))
/ sum(machine_cpu_cores)
- record: apiserver_latency_seconds:quantile
expr: histogram_quantile(0.99, rate(apiserver_request_latencies_bucket[5m])) /
1e+06
labels:
quantile: "0.99"
- record: apiserver_latency_seconds:quantile
expr: histogram_quantile(0.9, rate(apiserver_request_latencies_bucket[5m])) /
1e+06
labels:
quantile: "0.9"
- record: apiserver_latency_seconds:quantile
expr: histogram_quantile(0.5, rate(apiserver_request_latencies_bucket[5m])) /
1e+06
labels:
quantile: "0.5"
- alert: APIServerLatencyHigh
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
> 1
for: 10m
labels:
severity: warning
annotations:
description: the API server has a 99th percentile latency of {{ $value }} seconds
for {{$labels.verb}} {{$labels.resource}}
- alert: APIServerLatencyHigh
expr: apiserver_latency_seconds:quantile{quantile="0.99",subresource!="log",verb!~"^(?:WATCH|WATCHLIST|PROXY|CONNECT)$"}
> 4
for: 10m
labels:
severity: critical
annotations:
description: the API server has a 99th percentile latency of {{ $value }} seconds
for {{$labels.verb}} {{$labels.resource}}
- alert: APIServerErrorsHigh
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
* 100 > 2
for: 10m
labels:
severity: warning
annotations:
description: API server returns errors for {{ $value }}% of requests
- alert: APIServerErrorsHigh
expr: rate(apiserver_request_count{code=~"^(?:5..)$"}[5m]) / rate(apiserver_request_count[5m])
* 100 > 5
for: 10m
labels:
severity: critical
annotations:
description: API server returns errors for {{ $value }}% of requests
- alert: K8SApiserverDown
expr: absent(up{job="kubernetes-apiservers"} == 1)
for: 20m
labels:
severity: critical
annotations:
description: No API servers are reachable or all have disappeared from service
discovery
node.rules.yaml: |+
groups:
- name: node.rules
rules:
- record: instance:node_cpu:rate:sum
expr: sum(rate(node_cpu{mode!="idle",mode!="iowait",mode!~"^(?:guest.*)$"}[3m]))
BY (instance)
- record: instance:node_filesystem_usage:sum
expr: sum((node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}))
BY (instance)
- record: instance:node_network_receive_bytes:rate:sum
expr: sum(rate(node_network_receive_bytes[3m])) BY (instance)
- record: instance:node_network_transmit_bytes:rate:sum
expr: sum(rate(node_network_transmit_bytes[3m])) BY (instance)
- record: instance:node_cpu:ratio
expr: sum(rate(node_cpu{mode!="idle"}[5m])) WITHOUT (cpu, mode) / ON(instance)
GROUP_LEFT() count(sum(node_cpu) BY (instance, cpu)) BY (instance)
- record: cluster:node_cpu:sum_rate5m
expr: sum(rate(node_cpu{mode!="idle"}[5m]))
- record: cluster:node_cpu:ratio
expr: cluster:node_cpu:sum_rate5m / count(sum(node_cpu) BY (instance, cpu))
- alert: NodeExporterDown
expr: absent(up{kubernetes_name="node-exporter"} == 1)
for: 10m
labels:
severity: warning
annotations:
description: Prometheus could not scrape a node-exporter for more than 10m,
or node-exporters have disappeared from discovery
- alert: NodeDiskRunningFull
expr: predict_linear(node_filesystem_free[6h], 3600 * 24) < 0
for: 30m
labels:
severity: warning
annotations:
description: device {{$labels.device}} on node {{$labels.instance}} is running
full within the next 24 hours (mounted at {{$labels.mountpoint}})
- alert: NodeDiskRunningFull
expr: predict_linear(node_filesystem_free[30m], 3600 * 2) < 0
for: 10m
labels:
severity: critical
annotations:
description: device {{$labels.device}} on node {{$labels.instance}} is running
full within the next 2 hours (mounted at {{$labels.mountpoint}})
prometheus.rules.yaml: |+
groups:
- name: prometheus.rules
rules:
- alert: PrometheusConfigReloadFailed
expr: prometheus_config_last_reload_successful == 0
for: 10m
labels:
severity: warning
annotations:
description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}
- alert: PrometheusNotificationQueueRunningFull
expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity
for: 10m
labels:
severity: warning
annotations:
description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{
$labels.pod}}
- alert: PrometheusErrorSendingAlerts
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
> 0.01
for: 10m
labels:
severity: warning
annotations:
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
- alert: PrometheusErrorSendingAlerts
expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])
> 0.03
for: 10m
labels:
severity: critical
annotations:
description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{
$labels.pod}} to Alertmanager {{$labels.Alertmanager}}
- alert: PrometheusNotConnectedToAlertmanagers
expr: prometheus_notifications_alertmanagers_discovered < 1
for: 10m
labels:
severity: warning
annotations:
description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected
to any Alertmanagers
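The Prometheus deployment mounts this ConfigMap at /etc/prometheus/rules, and the rule_files globs in prometheus-config match *.rules, *.yaml, and *.yml, so extra rules can be shipped as additional data keys. A minimal sketch (the alert name and wording are illustrative) of one more key under data:

  custom.rules.yaml: |+
    groups:
    - name: custom.rules
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          description: '{{ $labels.job }} target {{ $labels.instance }} has been unreachable for 5 minutes'

After updating the ConfigMap, restart the prometheus pod (or reload its configuration) for the new group to take effect.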

View File

@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: monitoring
spec:
type: ClusterIP
selector:
name: prometheus
phase: prod
ports:
- name: web
protocol: TCP
port: 80
targetPort: 9090

View File

@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2017 Typhoon Authors
Copyright (c) 2017 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,22 @@
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
* Minimal, stable base Kubernetes distribution
* Declarative infrastructure and configuration
* Free (freedom and cost) and privacy-respecting
* Practical for labs, datacenters, and clouds
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs
Please see the [official docs](https://typhoon.psdn.io) and the AWS [tutorial](https://typhoon.psdn.io/aws/).

View File

@ -0,0 +1,19 @@
data "aws_ami" "coreos" {
most_recent = true
owners = ["595879546273"]
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
filter {
name = "name"
values = ["CoreOS-${var.os_channel}-*"]
}
}

View File

@ -0,0 +1,14 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.10.0"
cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
etcd_servers = ["${aws_route53_record.etcds.*.fqdn}"]
asset_dir = "${var.asset_dir}"
networking = "${var.networking}"
network_mtu = "${var.network_mtu}"
pod_cidr = "${var.pod_cidr}"
service_cidr = "${var.service_cidr}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
}

View File

@ -1,15 +1,33 @@
---
systemd:
units:
- name: etcd-member.service
enable: true
dropins:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.2.14"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt"
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt"
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key"
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
- name: docker.service
enable: true
- name: locksmithd.service
dropins:
- name: 40-etcd-lock.conf
contents: |
[Service]
Environment="REBOOT_STRATEGY=etcd-lock"
Environment="LOCKSMITHD_ENDPOINT=http://${k8s_etcd_service_ip}:2379"
mask: true
- name: wait-for-dns.service
enable: true
contents: |
@ -23,43 +41,48 @@ systemd:
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
enable: true
contents: |
[Unit]
Description=Kubelet via Hyperkube ACI
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/run/kubelet-pod.uuid \
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log"
--mount volume=var-log,target=/var/log \
--insecure-options=image"
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--kubeconfig=/etc/kubernetes/kubeconfig \
--require-kubeconfig \
--client-ca-file=/etc/kubernetes/ca.crt \
--anonymous-auth=false \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--network-plugin=cni \
--lock-file=/var/run/lock/kubelet.lock \
--exit-on-lock-contention \
--pod-manifest-path=/etc/kubernetes/manifests \
--allow-privileged \
--node-labels=node-role.kubernetes.io/master \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--anonymous-auth=false \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${k8s_dns_service_ip} \
--cluster_domain=cluster.local
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/master \
--pod-manifest-path=/etc/kubernetes/manifests \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
@ -105,8 +128,8 @@ storage:
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=quay.io/coreos/hyperkube
KUBELET_IMAGE_TAG=v1.6.4_coreos.0
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
KUBELET_IMAGE_TAG=v1.9.2
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
@ -125,10 +148,9 @@ storage:
# Wrapper for bootkube start
set -e
# Move experimental manifests
[ -d /opt/bootkube/assets/experimental/manifests ] && mv /opt/bootkube/assets/experimental/manifests/* /opt/bootkube/assets/manifests && rm -r /opt/bootkube/assets/experimental/manifests
[ -d /opt/bootkube/assets/experimental/bootstrap-manifests ] && mv /opt/bootkube/assets/experimental/bootstrap-manifests/* /opt/bootkube/assets/bootstrap-manifests && rm -r /opt/bootkube/assets/experimental/bootstrap-manifests
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.4.4}"
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
exec /usr/bin/rkt run \
--trust-keys-from-https \
@ -145,4 +167,4 @@ passwd:
users:
- name: core
ssh_authorized_keys:
- "${ssh_authorized_keys}"
- "${ssh_authorized_key}"

View File

@ -4,12 +4,7 @@ systemd:
- name: docker.service
enable: true
- name: locksmithd.service
dropins:
- name: 40-etcd-lock.conf
contents: |
[Service]
Environment="REBOOT_STRATEGY=etcd-lock"
Environment="LOCKSMITHD_ENDPOINT=http://${k8s_etcd_service_ip}:2379"
mask: true
- name: wait-for-dns.service
enable: true
contents: |
@ -27,38 +22,42 @@ systemd:
enable: true
contents: |
[Unit]
Description=Kubelet via Hyperkube ACI
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/run/kubelet-pod.uuid \
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log"
--mount volume=var-log,target=/var/log \
--insecure-options=image"
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--kubeconfig=/etc/kubernetes/kubeconfig \
--require-kubeconfig \
--client-ca-file=/etc/kubernetes/ca.crt \
--anonymous-auth=false \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--network-plugin=cni \
--lock-file=/var/run/lock/kubelet.lock \
--exit-on-lock-contention \
--pod-manifest-path=/etc/kubernetes/manifests \
--allow-privileged \
--node-labels=node-role.kubernetes.io/node \
--anonymous-auth=false \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${k8s_dns_service_ip} \
--cluster_domain=cluster.local
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/run/kubelet-pod.uuid
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/node \
--pod-manifest-path=/etc/kubernetes/manifests
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=5
[Install]
@ -103,8 +102,8 @@ storage:
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=quay.io/coreos/hyperkube
KUBELET_IMAGE_TAG=v1.6.4_coreos.0
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
KUBELET_IMAGE_TAG=v1.9.2
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
@ -121,7 +120,8 @@ storage:
--trust-keys-from-https \
--volume config,kind=host,source=/etc/kubernetes \
--mount volume=config,target=/etc/kubernetes \
quay.io/coreos/hyperkube:v1.6.4_coreos.0 \
--insecure-options=image \
docker://gcr.io/google_containers/hyperkube:v1.9.2 \
--net=host \
--dns=host \
--exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

View File

@ -0,0 +1,262 @@
# Discrete DNS records for each controller's private IPv4 for etcd usage
resource "aws_route53_record" "etcds" {
count = "${var.controller_count}"
# DNS Zone where record should be created
zone_id = "${var.dns_zone_id}"
name = "${format("%s-etcd%d.%s.", var.cluster_name, count.index, var.dns_zone)}"
type = "A"
ttl = 300
# private IPv4 address for etcd
records = ["${element(aws_instance.controllers.*.private_ip, count.index)}"]
}
# Controller instances
resource "aws_instance" "controllers" {
count = "${var.controller_count}"
tags = {
Name = "${var.cluster_name}-controller-${count.index}"
}
instance_type = "${var.controller_type}"
ami = "${data.aws_ami.coreos.image_id}"
user_data = "${element(data.ct_config.controller_ign.*.rendered, count.index)}"
# storage
root_block_device {
volume_type = "standard"
volume_size = "${var.disk_size}"
}
# network
associate_public_ip_address = true
subnet_id = "${element(aws_subnet.public.*.id, count.index)}"
vpc_security_group_ids = ["${aws_security_group.controller.id}"]
}
# Controller Container Linux Config
data "template_file" "controller_config" {
count = "${var.controller_count}"
template = "${file("${path.module}/cl/controller.yaml.tmpl")}"
vars = {
# Cannot use cyclic dependencies on controllers or their DNS records
etcd_name = "etcd${count.index}"
etcd_domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
# etcd0=https://cluster-etcd0.example.com,etcd1=https://cluster-etcd1.example.com,...
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", null_resource.repeat.*.triggers.name, null_resource.repeat.*.triggers.domain))}"
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
ssh_authorized_key = "${var.ssh_authorized_key}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
kubeconfig_server = "${module.bootkube.server}"
}
}
# Horrible hack to generate a Terraform list of a desired length without dependencies.
# Ideal ${repeat("etcd", 3) -> ["etcd", "etcd", "etcd"]}
resource null_resource "repeat" {
count = "${var.controller_count}"
triggers {
name = "etcd${count.index}"
domain = "${var.cluster_name}-etcd${count.index}.${var.dns_zone}"
}
}
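The join/formatlist expression above can be sanity-checked in isolation. A standalone sketch with hypothetical values (two controllers, DNS zone example.com), mirroring the expected string in the comment above:

output "etcd_initial_cluster_example" {
  # illustration only, not part of the module
  value = "${join(",", formatlist("%s=https://%s:2380", list("etcd0", "etcd1"), list("cluster-etcd0.example.com", "cluster-etcd1.example.com")))}"
  # => "etcd0=https://cluster-etcd0.example.com:2380,etcd1=https://cluster-etcd1.example.com:2380"
}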
data "ct_config" "controller_ign" {
count = "${var.controller_count}"
content = "${element(data.template_file.controller_config.*.rendered, count.index)}"
pretty_print = false
}
# Security Group (instance firewall)
resource "aws_security_group" "controller" {
name = "${var.cluster_name}-controller"
description = "${var.cluster_name} controller security group"
vpc_id = "${aws_vpc.network.id}"
tags = "${map("Name", "${var.cluster_name}-controller")}"
}
resource "aws_security_group_rule" "controller-icmp" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "icmp"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "controller-ssh" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 22
to_port = 22
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "controller-apiserver" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 443
to_port = 443
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "controller-etcd" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 2379
to_port = 2380
self = true
}
resource "aws_security_group_rule" "controller-flannel" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
source_security_group_id = "${aws_security_group.worker.id}"
}
resource "aws_security_group_rule" "controller-flannel-self" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
self = true
}
resource "aws_security_group_rule" "controller-node-exporter" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 9100
to_port = 9100
source_security_group_id = "${aws_security_group.worker.id}"
}
resource "aws_security_group_rule" "controller-kubelet-self" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 10250
to_port = 10250
self = true
}
resource "aws_security_group_rule" "controller-kubelet-read" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 10255
to_port = 10255
source_security_group_id = "${aws_security_group.worker.id}"
}
resource "aws_security_group_rule" "controller-kubelet-read-self" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 10255
to_port = 10255
self = true
}
resource "aws_security_group_rule" "controller-bgp" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 179
to_port = 179
source_security_group_id = "${aws_security_group.worker.id}"
}
resource "aws_security_group_rule" "controller-bgp-self" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = "tcp"
from_port = 179
to_port = 179
self = true
}
resource "aws_security_group_rule" "controller-ipip" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = 4
from_port = 0
to_port = 0
source_security_group_id = "${aws_security_group.worker.id}"
}
resource "aws_security_group_rule" "controller-ipip-self" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = 4
from_port = 0
to_port = 0
self = true
}
resource "aws_security_group_rule" "controller-ipip-legacy" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = 94
from_port = 0
to_port = 0
source_security_group_id = "${aws_security_group.worker.id}"
}
resource "aws_security_group_rule" "controller-ipip-legacy-self" {
security_group_id = "${aws_security_group.controller.id}"
type = "ingress"
protocol = 94
from_port = 0
to_port = 0
self = true
}
resource "aws_security_group_rule" "controller-egress" {
security_group_id = "${aws_security_group.controller.id}"
type = "egress"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}

View File

@ -0,0 +1,43 @@
# kube-apiserver Network Load Balancer DNS Record
resource "aws_route53_record" "apiserver" {
zone_id = "${var.dns_zone_id}"
name = "${format("%s.%s.", var.cluster_name, var.dns_zone)}"
type = "A"
# AWS recommends their special "alias" records for ELBs
alias {
name = "${aws_elb.apiserver.dns_name}"
zone_id = "${aws_elb.apiserver.zone_id}"
evaluate_target_health = true
}
}
# Controller Network Load Balancer
resource "aws_elb" "apiserver" {
name = "${var.cluster_name}-apiserver"
subnets = ["${aws_subnet.public.*.id}"]
security_groups = ["${aws_security_group.controller.id}"]
listener {
lb_port = 443
lb_protocol = "tcp"
instance_port = 443
instance_protocol = "tcp"
}
instances = ["${aws_instance.controllers.*.id}"]
# Kube-apiserver SSL health check
health_check {
target = "SSL:443"
healthy_threshold = 2
unhealthy_threshold = 4
timeout = 5
interval = 6
}
idle_timeout = 3600
connection_draining = true
connection_draining_timeout = 300
}

View File

@ -0,0 +1,32 @@
# Ingress Network Load Balancer
resource "aws_elb" "ingress" {
name = "${var.cluster_name}-ingress"
subnets = ["${aws_subnet.public.*.id}"]
security_groups = ["${aws_security_group.worker.id}"]
listener {
lb_port = 80
lb_protocol = "tcp"
instance_port = 80
instance_protocol = "tcp"
}
listener {
lb_port = 443
lb_protocol = "tcp"
instance_port = 443
instance_protocol = "tcp"
}
# Ingress Controller HTTP health check
health_check {
target = "HTTP:10254/healthz"
healthy_threshold = 2
unhealthy_threshold = 4
timeout = 5
interval = 6
}
connection_draining = true
connection_draining_timeout = 300
}

View File

@ -0,0 +1,57 @@
data "aws_availability_zones" "all" {}
# Network VPC, gateway, and routes
resource "aws_vpc" "network" {
cidr_block = "${var.host_cidr}"
assign_generated_ipv6_cidr_block = true
enable_dns_support = true
enable_dns_hostnames = true
tags = "${map("Name", "${var.cluster_name}")}"
}
resource "aws_internet_gateway" "gateway" {
vpc_id = "${aws_vpc.network.id}"
tags = "${map("Name", "${var.cluster_name}")}"
}
resource "aws_route_table" "default" {
vpc_id = "${aws_vpc.network.id}"
route {
cidr_block = "0.0.0.0/0"
gateway_id = "${aws_internet_gateway.gateway.id}"
}
route {
ipv6_cidr_block = "::/0"
gateway_id = "${aws_internet_gateway.gateway.id}"
}
tags = "${map("Name", "${var.cluster_name}")}"
}
# Subnets (one per availability zone)
resource "aws_subnet" "public" {
count = "${length(data.aws_availability_zones.all.names)}"
vpc_id = "${aws_vpc.network.id}"
availability_zone = "${data.aws_availability_zones.all.names[count.index]}"
cidr_block = "${cidrsubnet(var.host_cidr, 4, count.index)}"
ipv6_cidr_block = "${cidrsubnet(aws_vpc.network.ipv6_cidr_block, 8, count.index)}"
map_public_ip_on_launch = true
assign_ipv6_address_on_creation = true
tags = "${map("Name", "${var.cluster_name}-public-${count.index}")}"
}
resource "aws_route_table_association" "public" {
count = "${length(data.aws_availability_zones.all.names)}"
route_table_id = "${aws_route_table.default.id}"
subnet_id = "${element(aws_subnet.public.*.id, count.index)}"
}
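For reference, a standalone sketch (illustrative values only) of how cidrsubnet slices the default host_cidr into one /20 per availability zone:

output "subnet_cidrs_example" {
  # illustration only: first three AZ subnets carved from the default host_cidr
  value = "${list(cidrsubnet("10.0.0.0/16", 4, 0), cidrsubnet("10.0.0.0/16", 4, 1), cidrsubnet("10.0.0.0/16", 4, 2))}"
  # => ["10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20"]
}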

View File

@ -0,0 +1,4 @@
output "ingress_dns_name" {
value = "${aws_elb.ingress.dns_name}"
description = "DNS name of the ELB for distributing traffic to Ingress controllers"
}

View File

@ -0,0 +1,25 @@
# Terraform version and plugin versions
terraform {
required_version = ">= 0.10.4"
}
provider "aws" {
version = "~> 1.0"
}
provider "local" {
version = "~> 1.0"
}
provider "null" {
version = "~> 1.0"
}
provider "template" {
version = "~> 1.0"
}
provider "tls" {
version = "~> 1.0"
}
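One way a root configuration might satisfy these constraints is sketched below; the region and credential handling are assumptions, and the remaining providers (local, null, template, tls) need no configuration beyond being installed by terraform init.

provider "aws" {
  version = "~> 1.0"
  region  = "us-east-1"
  # Credentials are read from the usual AWS environment variables or the
  # shared credentials file; none are hard-coded here.
}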

View File

@ -0,0 +1,92 @@
# Secure copy etcd TLS assets and kubeconfig to controllers. Activates kubelet.service
resource "null_resource" "copy-secrets" {
count = "${var.controller_count}"
connection {
type = "ssh"
host = "${element(aws_instance.controllers.*.public_ip, count.index)}"
user = "core"
timeout = "15m"
}
provisioner "file" {
content = "${module.bootkube.kubeconfig}"
destination = "$HOME/kubeconfig"
}
provisioner "file" {
content = "${module.bootkube.etcd_ca_cert}"
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_client_cert}"
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_client_key}"
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = "${module.bootkube.etcd_server_cert}"
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_server_key}"
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = "${module.bootkube.etcd_peer_cert}"
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_peer_key}"
destination = "$HOME/etcd-peer.key"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
]
}
}
# Secure copy bootkube assets to ONE controller and start bootkube to perform
# one-time self-hosted cluster bootstrapping.
resource "null_resource" "bootkube-start" {
depends_on = ["module.bootkube", "null_resource.copy-secrets", "aws_route53_record.apiserver"]
connection {
type = "ssh"
host = "${aws_instance.controllers.0.public_ip}"
user = "core"
timeout = "15m"
}
provisioner "file" {
source = "${var.asset_dir}"
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mv /home/core/assets /opt/bootkube",
"sudo systemctl start bootkube",
]
}
}

View File

@ -0,0 +1,102 @@
variable "cluster_name" {
type = "string"
description = "Cluster name"
}
variable "dns_zone" {
type = "string"
description = "AWS DNS Zone (e.g. aws.dghubble.io)"
}
variable "dns_zone_id" {
type = "string"
description = "AWS DNS Zone ID (e.g. Z3PAABBCFAKEC0)"
}
variable "ssh_authorized_key" {
type = "string"
description = "SSH public key for user 'core'"
}
variable "os_channel" {
type = "string"
default = "stable"
description = "Container Linux AMI channel (stable, beta, alpha)"
}
variable "disk_size" {
type = "string"
default = "40"
description = "The size of the disk in Gigabytes"
}
variable "host_cidr" {
description = "CIDR IPv4 range to assign to EC2 nodes"
type = "string"
default = "10.0.0.0/16"
}
variable "controller_count" {
type = "string"
default = "1"
description = "Number of controllers"
}
variable "controller_type" {
type = "string"
default = "t2.small"
description = "Controller EC2 instance type"
}
variable "worker_count" {
type = "string"
default = "1"
description = "Number of workers"
}
variable "worker_type" {
type = "string"
default = "t2.small"
description = "Worker EC2 instance type"
}
# bootkube assets
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = "string"
}
variable "networking" {
description = "Choice of networking provider (calico or flannel)"
type = "string"
default = "calico"
}
variable "network_mtu" {
description = "CNI interface MTU (applies to calico only). Use 8981 if using instances types with Jumbo frames."
type = "string"
default = "1480"
}
variable "pod_cidr" {
description = "CIDR IPv4 range to assign Kubernetes pods"
type = "string"
default = "10.2.0.0/16"
}
variable "service_cidr" {
description = <<EOD
CIDR IPv4 range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for kube-dns.
EOD
type = "string"
default = "10.3.0.0/16"
}
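Those reservations can be checked with Terraform's cidrhost under the default CIDR; a sketch for illustration:

output "service_ip_example" {
  # illustration only: 1st and 10th IPs of the default service_cidr
  value = "${list(cidrhost("10.3.0.0/16", 1), cidrhost("10.3.0.0/16", 10))}"
  # => ["10.3.0.1", "10.3.0.10"] (kube-apiserver and kube-dns service IPs)
}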
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = "string"
default = "cluster.local"
}

View File

@ -0,0 +1,275 @@
# Workers AutoScaling Group
resource "aws_autoscaling_group" "workers" {
name = "${var.cluster_name}-worker ${aws_launch_configuration.worker.name}"
load_balancers = ["${aws_elb.ingress.id}"]
# count
desired_capacity = "${var.worker_count}"
min_size = "${var.worker_count}"
max_size = "${var.worker_count + 2}"
default_cooldown = 30
health_check_grace_period = 30
# network
vpc_zone_identifier = ["${aws_subnet.public.*.id}"]
# template
launch_configuration = "${aws_launch_configuration.worker.name}"
lifecycle {
# override the default destroy and replace update behavior
create_before_destroy = true
ignore_changes = ["image_id"]
}
tags = [{
key = "Name"
value = "${var.cluster_name}-worker"
propagate_at_launch = true
}]
}
# Worker template
resource "aws_launch_configuration" "worker" {
image_id = "${data.aws_ami.coreos.image_id}"
instance_type = "${var.worker_type}"
user_data = "${data.ct_config.worker_ign.rendered}"
# storage
root_block_device {
volume_type = "standard"
volume_size = "${var.disk_size}"
}
# network
security_groups = ["${aws_security_group.worker.id}"]
lifecycle {
// Override the default destroy and replace update behavior
create_before_destroy = true
}
}
# Worker Container Linux Config
data "template_file" "worker_config" {
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
vars = {
k8s_dns_service_ip = "${cidrhost(var.service_cidr, 10)}"
k8s_etcd_service_ip = "${cidrhost(var.service_cidr, 15)}"
ssh_authorized_key = "${var.ssh_authorized_key}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
kubeconfig_ca_cert = "${module.bootkube.ca_cert}"
kubeconfig_kubelet_cert = "${module.bootkube.kubelet_cert}"
kubeconfig_kubelet_key = "${module.bootkube.kubelet_key}"
kubeconfig_server = "${module.bootkube.server}"
}
}
data "ct_config" "worker_ign" {
content = "${data.template_file.worker_config.rendered}"
pretty_print = false
}
# Security Group (instance firewall)
resource "aws_security_group" "worker" {
name = "${var.cluster_name}-worker"
description = "${var.cluster_name} worker security group"
vpc_id = "${aws_vpc.network.id}"
tags = "${map("Name", "${var.cluster_name}-worker")}"
}
resource "aws_security_group_rule" "worker-icmp" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "icmp"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "worker-ssh" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 22
to_port = 22
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "worker-http" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 80
to_port = 80
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "worker-https" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 443
to_port = 443
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "worker-flannel" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
source_security_group_id = "${aws_security_group.controller.id}"
}
resource "aws_security_group_rule" "worker-flannel-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "udp"
from_port = 8472
to_port = 8472
self = true
}
resource "aws_security_group_rule" "worker-node-exporter" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 9100
to_port = 9100
self = true
}
resource "aws_security_group_rule" "worker-kubelet" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 10250
to_port = 10250
source_security_group_id = "${aws_security_group.controller.id}"
}
resource "aws_security_group_rule" "worker-kubelet-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 10250
to_port = 10250
self = true
}
resource "aws_security_group_rule" "worker-kubelet-read" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 10255
to_port = 10255
source_security_group_id = "${aws_security_group.controller.id}"
}
resource "aws_security_group_rule" "worker-kubelet-read-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 10255
to_port = 10255
self = true
}
resource "aws_security_group_rule" "ingress-health-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 10254
to_port = 10254
self = true
}
resource "aws_security_group_rule" "worker-bgp" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 179
to_port = 179
source_security_group_id = "${aws_security_group.controller.id}"
}
resource "aws_security_group_rule" "worker-bgp-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = "tcp"
from_port = 179
to_port = 179
self = true
}
resource "aws_security_group_rule" "worker-ipip" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = 4
from_port = 0
to_port = 0
source_security_group_id = "${aws_security_group.controller.id}"
}
resource "aws_security_group_rule" "worker-ipip-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = 4
from_port = 0
to_port = 0
self = true
}
resource "aws_security_group_rule" "worker-ipip-legacy" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = 94
from_port = 0
to_port = 0
source_security_group_id = "${aws_security_group.controller.id}"
}
resource "aws_security_group_rule" "worker-ipip-legacy-self" {
security_group_id = "${aws_security_group.worker.id}"
type = "ingress"
protocol = 94
from_port = 0
to_port = 0
self = true
}
resource "aws_security_group_rule" "worker-egress" {
security_group_id = "${aws_security_group.worker.id}"
type = "egress"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}

View File

@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2017 Typhoon Authors
Copyright (c) 2017 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,22 @@
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
* Minimal, stable base Kubernetes distribution
* Declarative infrastructure and configuration
* Free (freedom and cost) and privacy-respecting
* Practical for labs, datacenters, and clouds
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs
Please see the [official docs](https://typhoon.psdn.io) and the bare-metal [tutorial](https://typhoon.psdn.io/bare-metal/).

View File

@ -0,0 +1,14 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.10.0"
cluster_name = "${var.cluster_name}"
api_servers = ["${var.k8s_domain_name}"]
etcd_servers = ["${var.controller_domains}"]
asset_dir = "${var.asset_dir}"
networking = "${var.networking}"
network_mtu = "${var.network_mtu}"
pod_cidr = "${var.pod_cidr}"
service_cidr = "${var.service_cidr}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
}

View File

@ -0,0 +1,42 @@
---
systemd:
units:
- name: installer.service
enable: true
contents: |
[Unit]
Requires=network-online.target
After=network-online.target
[Service]
Type=simple
ExecStart=/opt/installer
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /opt/installer
filesystem: root
mode: 0500
contents:
inline: |
#!/bin/bash -ex
curl --retry 10 "${ignition_endpoint}?{{.request.raw_query}}&os=installed" -o ignition.json
coreos-install \
-d ${install_disk} \
-C ${container_linux_channel} \
-V ${container_linux_version} \
-o "${container_linux_oem}" \
${baseurl_flag} \
-i ignition.json
udevadm settle
systemctl reboot
passwd:
users:
# Avoid using standard name "core" so terraform apply cannot SSH until post-install.
- name: debug
create:
groups:
- sudo
- docker
ssh_authorized_keys:
- {{.ssh_authorized_key}}

View File

@ -0,0 +1,166 @@
---
systemd:
units:
- name: etcd-member.service
enable: true
dropins:
- name: 40-etcd-cluster.conf
contents: |
[Service]
Environment="ETCD_IMAGE_TAG=v3.2.14"
Environment="ETCD_NAME=${etcd_name}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${domain_name}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${domain_name}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt"
Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt"
Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key"
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: kubelet.path
enable: true
contents: |
[Unit]
Description=Watch for kubeconfig
[Path]
PathExists=/etc/kubernetes/kubeconfig
[Install]
WantedBy=multi-user.target
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
- name: kubelet.service
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--allow-privileged \
--anonymous-auth=false \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${k8s_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--hostname-override=${domain_name} \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/master \
--pod-manifest-path=/etc/kubernetes/manifests \
--register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
- name: bootkube.service
contents: |
[Unit]
Description=Bootstrap a Kubernetes control plane with a temp api-server
ConditionPathExists=!/opt/bootkube/init_bootkube.done
[Service]
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/opt/bootkube
ExecStart=/opt/bootkube/bootkube-start
ExecStartPost=/bin/touch /opt/bootkube/init_bootkube.done
storage:
files:
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
KUBELET_IMAGE_TAG=v1.9.2
- path: /etc/hostname
filesystem: root
mode: 0644
contents:
inline:
${domain_name}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
- path: /opt/bootkube/bootkube-start
filesystem: root
mode: 0544
user:
id: 500
group:
id: 500
contents:
inline: |
#!/bin/bash
# Wrapper for bootkube start
set -e
# Move experimental manifests
[ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
exec /usr/bin/rkt run \
--trust-keys-from-https \
--volume assets,kind=host,source=$BOOTKUBE_ASSETS \
--mount volume=assets,target=/assets \
--volume bootstrap,kind=host,source=/etc/kubernetes \
--mount volume=bootstrap,target=/etc/kubernetes \
$$RKT_OPTS \
$${BOOTKUBE_ACI}:$${BOOTKUBE_VERSION} \
--net=host \
--dns=host \
--exec=/bootkube -- start --asset-dir=/assets "$@"
networkd:
${networkd_content}
passwd:
users:
- name: core
ssh_authorized_keys:
- ${ssh_authorized_key}

View File

@ -0,0 +1,104 @@
---
systemd:
units:
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: kubelet.path
enable: true
contents: |
[Unit]
Description=Watch for kubeconfig
[Path]
PathExists=/etc/kubernetes/kubeconfig
[Install]
WantedBy=multi-user.target
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--allow-privileged \
--anonymous-auth=false \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns=${k8s_dns_service_ip} \
--cluster_domain=${cluster_domain_suffix} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--hostname-override=${domain_name} \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/node \
--pod-manifest-path=/etc/kubernetes/manifests \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
storage:
files:
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
KUBELET_IMAGE_TAG=v1.9.2
- path: /etc/hostname
filesystem: root
mode: 0644
contents:
inline:
${domain_name}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
networkd:
${networkd_content}
passwd:
users:
- name: core
ssh_authorized_keys:
- ${ssh_authorized_key}

View File

@ -0,0 +1,37 @@
// Install Container Linux to disk
resource "matchbox_group" "container-linux-install" {
count = "${length(var.controller_names) + length(var.worker_names)}"
name = "${format("container-linux-install-%s", element(concat(var.controller_names, var.worker_names), count.index))}"
profile = "${var.cached_install == "true" ? element(matchbox_profile.cached-container-linux-install.*.name, count.index) : element(matchbox_profile.container-linux-install.*.name, count.index)}"
selector {
mac = "${element(concat(var.controller_macs, var.worker_macs), count.index)}"
}
metadata {
ssh_authorized_key = "${var.ssh_authorized_key}"
}
}
resource "matchbox_group" "controller" {
count = "${length(var.controller_names)}"
name = "${format("%s-%s", var.cluster_name, element(var.controller_names, count.index))}"
profile = "${element(matchbox_profile.controllers.*.name, count.index)}"
selector {
mac = "${element(var.controller_macs, count.index)}"
os = "installed"
}
}
resource "matchbox_group" "worker" {
count = "${length(var.worker_names)}"
name = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}"
profile = "${element(matchbox_profile.workers.*.name, count.index)}"
selector {
mac = "${element(var.worker_macs, count.index)}"
os = "installed"
}
}

View File

@ -0,0 +1,3 @@
output "kubeconfig" {
value = "${module.bootkube.kubeconfig}"
}

View File

@ -0,0 +1,128 @@
// Container Linux Install profile (from release.core-os.net)
resource "matchbox_profile" "container-linux-install" {
count = "${length(var.controller_names) + length(var.worker_names)}"
name = "${format("%s-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
kernel = "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
initrd = [
"http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe_image.cpio.gz",
]
args = [
"initrd=coreos_production_pxe_image.cpio.gz",
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.first_boot=yes",
"console=tty0",
"console=ttyS0",
"${var.kernel_args}",
]
container_linux_config = "${element(data.template_file.container-linux-install-configs.*.rendered, count.index)}"
}
data "template_file" "container-linux-install-configs" {
count = "${length(var.controller_names) + length(var.worker_names)}"
template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"
vars {
container_linux_channel = "${var.container_linux_channel}"
container_linux_version = "${var.container_linux_version}"
ignition_endpoint = "${format("%s/ignition", var.matchbox_http_endpoint)}"
install_disk = "${var.install_disk}"
container_linux_oem = "${var.container_linux_oem}"
# only cached-container-linux profile adds -b baseurl
baseurl_flag = ""
}
}
// Container Linux Install profile (from matchbox /assets cache)
// Note: Admin must have downloaded container_linux_version into matchbox assets.
resource "matchbox_profile" "cached-container-linux-install" {
count = "${length(var.controller_names) + length(var.worker_names)}"
name = "${format("%s-cached-container-linux-install-%s", var.cluster_name, element(concat(var.controller_names, var.worker_names), count.index))}"
kernel = "/assets/coreos/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
initrd = [
"/assets/coreos/${var.container_linux_version}/coreos_production_pxe_image.cpio.gz",
]
args = [
"initrd=coreos_production_pxe_image.cpio.gz",
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.first_boot=yes",
"console=tty0",
"console=ttyS0",
"${var.kernel_args}",
]
container_linux_config = "${element(data.template_file.cached-container-linux-install-configs.*.rendered, count.index)}"
}
data "template_file" "cached-container-linux-install-configs" {
count = "${length(var.controller_names) + length(var.worker_names)}"
template = "${file("${path.module}/cl/container-linux-install.yaml.tmpl")}"
vars {
container_linux_channel = "${var.container_linux_channel}"
container_linux_version = "${var.container_linux_version}"
ignition_endpoint = "${format("%s/ignition", var.matchbox_http_endpoint)}"
install_disk = "${var.install_disk}"
container_linux_oem = "${var.container_linux_oem}"
# profile uses -b baseurl to install from matchbox cache
baseurl_flag = "-b ${var.matchbox_http_endpoint}/assets/coreos"
}
}
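To make the cached install path concrete, a sketch with hypothetical values (any channel/version the admin has mirrored into matchbox /assets works):

container_linux_channel = "stable"
container_linux_version = "1576.5.0"
cached_install          = "true"
matchbox_http_endpoint  = "http://matchbox.example.com:8080"

With these, the profile PXE-boots /assets/coreos/1576.5.0/coreos_production_pxe.vmlinuz and the install script receives baseurl_flag "-b http://matchbox.example.com:8080/assets/coreos", so coreos-install fetches the image from the matchbox cache rather than release.core-os.net.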
// Kubernetes Controller profiles
resource "matchbox_profile" "controllers" {
count = "${length(var.controller_names)}"
name = "${format("%s-controller-%s", var.cluster_name, element(var.controller_names, count.index))}"
container_linux_config = "${element(data.template_file.controller-configs.*.rendered, count.index)}"
}
data "template_file" "controller-configs" {
count = "${length(var.controller_names)}"
template = "${file("${path.module}/cl/controller.yaml.tmpl")}"
vars {
domain_name = "${element(var.controller_domains, count.index)}"
etcd_name = "${element(var.controller_names, count.index)}"
etcd_initial_cluster = "${join(",", formatlist("%s=https://%s:2380", var.controller_names, var.controller_domains))}"
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
ssh_authorized_key = "${var.ssh_authorized_key}"
# Terraform evaluates both sides regardless and element cannot be used on 0 length lists
networkd_content = "${length(var.controller_networkds) == 0 ? "" : element(concat(var.controller_networkds, list("")), count.index)}"
}
}
// Kubernetes Worker profiles
resource "matchbox_profile" "workers" {
count = "${length(var.worker_names)}"
name = "${format("%s-worker-%s", var.cluster_name, element(var.worker_names, count.index))}"
container_linux_config = "${element(data.template_file.worker-configs.*.rendered, count.index)}"
}
data "template_file" "worker-configs" {
count = "${length(var.worker_names)}"
template = "${file("${path.module}/cl/worker.yaml.tmpl")}"
vars {
domain_name = "${element(var.worker_domains, count.index)}"
k8s_dns_service_ip = "${module.bootkube.kube_dns_service_ip}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
ssh_authorized_key = "${var.ssh_authorized_key}"
# Terraform evaluates both sides regardless and element cannot be used on 0 length lists
networkd_content = "${length(var.worker_networkds) == 0 ? "" : element(concat(var.worker_networkds, list("")), count.index)}"
}
}

View File

@ -0,0 +1,21 @@
# Terraform version and plugin versions
terraform {
required_version = ">= 0.10.4"
}
provider "local" {
version = "~> 1.0"
}
provider "null" {
version = "~> 1.0"
}
provider "template" {
version = "~> 1.0"
}
provider "tls" {
version = "~> 1.0"
}

View File

@ -0,0 +1,118 @@
# Secure copy etcd TLS assets and kubeconfig to controllers. Activates kubelet.service
resource "null_resource" "copy-etcd-secrets" {
count = "${length(var.controller_names)}"
connection {
type = "ssh"
host = "${element(var.controller_domains, count.index)}"
user = "core"
timeout = "60m"
}
provisioner "file" {
content = "${module.bootkube.kubeconfig}"
destination = "$HOME/kubeconfig"
}
provisioner "file" {
content = "${module.bootkube.etcd_ca_cert}"
destination = "$HOME/etcd-client-ca.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_client_cert}"
destination = "$HOME/etcd-client.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_client_key}"
destination = "$HOME/etcd-client.key"
}
provisioner "file" {
content = "${module.bootkube.etcd_server_cert}"
destination = "$HOME/etcd-server.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_server_key}"
destination = "$HOME/etcd-server.key"
}
provisioner "file" {
content = "${module.bootkube.etcd_peer_cert}"
destination = "$HOME/etcd-peer.crt"
}
provisioner "file" {
content = "${module.bootkube.etcd_peer_key}"
destination = "$HOME/etcd-peer.key"
}
provisioner "remote-exec" {
inline = [
"sudo mkdir -p /etc/ssl/etcd/etcd",
"sudo mv etcd-client* /etc/ssl/etcd/",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/server-ca.crt",
"sudo mv etcd-server.crt /etc/ssl/etcd/etcd/server.crt",
"sudo mv etcd-server.key /etc/ssl/etcd/etcd/server.key",
"sudo cp /etc/ssl/etcd/etcd-client-ca.crt /etc/ssl/etcd/etcd/peer-ca.crt",
"sudo mv etcd-peer.crt /etc/ssl/etcd/etcd/peer.crt",
"sudo mv etcd-peer.key /etc/ssl/etcd/etcd/peer.key",
"sudo chown -R etcd:etcd /etc/ssl/etcd",
"sudo chmod -R 500 /etc/ssl/etcd",
"sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
]
}
}
# Secure copy kubeconfig to all workers. Activates kubelet.service
resource "null_resource" "copy-kubeconfig" {
count = "${length(var.worker_names)}"
connection {
type = "ssh"
host = "${element(var.worker_domains, count.index)}"
user = "core"
timeout = "60m"
}
provisioner "file" {
content = "${module.bootkube.kubeconfig}"
destination = "$HOME/kubeconfig"
}
provisioner "remote-exec" {
inline = [
"sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
]
}
}
# Secure copy bootkube assets to ONE controller and start bootkube to perform
# one-time self-hosted cluster bootstrapping.
resource "null_resource" "bootkube-start" {
# Without depends_on, this remote-exec may start before the kubeconfig copy.
# Terraform only does one task at a time, so it would try to bootstrap
# while no Kubelets are running.
depends_on = ["null_resource.copy-etcd-secrets", "null_resource.copy-kubeconfig"]
connection {
type = "ssh"
host = "${element(var.controller_domains, 0)}"
user = "core"
timeout = "30m"
}
provisioner "file" {
source = "${var.asset_dir}"
destination = "$HOME/assets"
}
provisioner "remote-exec" {
inline = [
"sudo mv /home/core/assets /opt/bootkube",
"sudo systemctl start bootkube",
]
}
}

View File

@ -0,0 +1,137 @@
variable "matchbox_http_endpoint" {
type = "string"
description = "Matchbox HTTP read-only endpoint (e.g. http://matchbox.example.com:8080)"
}
variable "container_linux_channel" {
type = "string"
description = "Container Linux channel corresponding to the container_linux_version"
}
variable "container_linux_version" {
type = "string"
description = "Container Linux version of the kernel/initrd to PXE or the image to install"
}
variable "cluster_name" {
type = "string"
description = "Cluster name"
}
variable "ssh_authorized_key" {
type = "string"
description = "SSH public key to set as an authorized_key on machines"
}
# Machines
# Terraform's crude "type system" does not properly support lists of maps, so we do this.
variable "controller_names" {
type = "list"
}
variable "controller_macs" {
type = "list"
}
variable "controller_domains" {
type = "list"
}
variable "worker_names" {
type = "list"
}
variable "worker_macs" {
type = "list"
}
variable "worker_domains" {
type = "list"
}
# bootkube assets
variable "k8s_domain_name" {
description = "Controller DNS name which resolves to a controller instance. Workers and kubeconfig's will communicate with this endpoint (e.g. cluster.example.com)"
type = "string"
}
variable "asset_dir" {
description = "Path to a directory where generated assets should be placed (contains secrets)"
type = "string"
}
variable "networking" {
description = "Choice of networking provider (flannel or calico)"
type = "string"
default = "calico"
}
variable "network_mtu" {
description = "CNI interface MTU (applies to calico only)"
type = "string"
default = "1480"
}
variable "pod_cidr" {
description = "CIDR IP range to assign Kubernetes pods"
type = "string"
default = "10.2.0.0/16"
}
variable "service_cidr" {
description = <<EOD
CIDR IP range to assign Kubernetes services.
The 1st IP will be reserved for kube_apiserver, the 10th IP will be reserved for kube-dns.
EOD
type = "string"
default = "10.3.0.0/16"
}
# optional
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = "string"
default = "cluster.local"
}
variable "cached_install" {
type = "string"
default = "false"
description = "Whether Container Linux should PXE boot and install from matchbox /assets cache. Note that the admin must have downloaded the container_linux_version into matchbox assets."
}
variable "install_disk" {
type = "string"
default = "/dev/sda"
description = "Disk device to which the install profiles should install Container Linux (e.g. /dev/sda)"
}
variable "container_linux_oem" {
type = "string"
default = ""
description = "Specify an OEM image id to use as base for the installation (e.g. ami, vmware_raw, xen) or leave blank for the default image"
}
variable "kernel_args" {
description = "Additional kernel arguments to provide at PXE boot."
type = "list"
default = []
}
# unofficial, undocumented, unsupported, temporary
variable "controller_networkds" {
type = "list"
description = "Controller Container Linux config networkd section"
default = []
}
variable "worker_networkds" {
type = "list"
description = "Worker Container Linux config networkd section"
default = []
}
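For illustration, the parallel machine lists above must line up index-for-index per node. A sketch of the values a cluster might pass (hostnames, MACs, and domains are made up):

controller_names   = ["node1"]
controller_macs    = ["52:54:00:a1:9c:ae"]
controller_domains = ["node1.example.com"]
worker_names       = ["node2", "node3"]
worker_macs        = ["52:54:00:b2:2f:86", "52:54:00:c3:61:77"]
worker_domains     = ["node2.example.com", "node3.example.com"]
k8s_domain_name    = "node1.example.com"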

View File

@ -0,0 +1,117 @@
---
systemd:
units:
- name: docker.service
enable: true
- name: locksmithd.service
mask: true
- name: kubelet.path
enable: true
contents: |
[Unit]
Description=Watch for kubeconfig
[Path]
PathExists=/etc/kubernetes/kubeconfig
[Install]
WantedBy=multi-user.target
- name: wait-for-dns.service
enable: true
contents: |
[Unit]
Description=Wait for DNS entries
Wants=systemd-resolved.service
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
- name: kubelet.service
contents: |
[Unit]
Description=Kubelet via Hyperkube
Wants=rpc-statd.service
[Service]
EnvironmentFile=/etc/kubernetes/kubelet.env
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
--volume=resolv,kind=host,source=/etc/resolv.conf \
--mount volume=resolv,target=/etc/resolv.conf \
--volume var-lib-cni,kind=host,source=/var/lib/cni \
--mount volume=var-lib-cni,target=/var/lib/cni \
--volume opt-cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=opt-cni-bin,target=/opt/cni/bin \
--volume var-log,kind=host,source=/var/log \
--mount volume=var-log,target=/var/log \
--insecure-options=image"
ExecStartPre=/bin/mkdir -p /opt/cni/bin
ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
ExecStartPre=/bin/mkdir -p /var/lib/cni
ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volumeplugins
ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
ExecStart=/usr/lib/coreos/kubelet-wrapper \
--allow-privileged \
--anonymous-auth=false \
--client-ca-file=/etc/kubernetes/ca.crt \
--cluster_dns={{.k8s_dns_service_ip}} \
--cluster_domain={{.cluster_domain_suffix}} \
--cni-conf-dir=/etc/kubernetes/cni/net.d \
--exit-on-lock-contention \
--hostname-override={{.domain_name}} \
--kubeconfig=/etc/kubernetes/kubeconfig \
--lock-file=/var/run/lock/kubelet.lock \
--network-plugin=cni \
--node-labels=node-role.kubernetes.io/node \
--pod-manifest-path=/etc/kubernetes/manifests \
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
storage:
{{ if index . "pxe" }}
disks:
- device: /dev/sda
wipe_table: true
partitions:
- label: ROOT
filesystems:
- name: root
mount:
device: "/dev/sda1"
format: "ext4"
create:
force: true
options:
- "-LROOT"
{{end}}
files:
- path: /etc/kubernetes/kubelet.env
filesystem: root
mode: 0644
contents:
inline: |
KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
KUBELET_IMAGE_TAG=v1.9.2
- path: /etc/hostname
filesystem: root
mode: 0644
contents:
inline:
{{.domain_name}}
- path: /etc/sysctl.d/max-user-watches.conf
filesystem: root
contents:
inline: |
fs.inotify.max_user_watches=16184
passwd:
users:
- name: core
ssh_authorized_keys:
- {{.ssh_authorized_key}}

View File

@ -0,0 +1,22 @@
resource "matchbox_group" "workers" {
count = "${length(var.worker_names)}"
name = "${format("%s-%s", var.cluster_name, element(var.worker_names, count.index))}"
profile = "${matchbox_profile.bootkube-worker-pxe.name}"
selector {
mac = "${element(var.worker_macs, count.index)}"
}
metadata {
pxe = "true"
domain_name = "${element(var.worker_domains, count.index)}"
etcd_endpoints = "${join(",", formatlist("%s:2379", var.controller_domains))}"
# TODO
etcd_on_host = "true"
k8s_etcd_service_ip = "10.3.0.15"
k8s_dns_service_ip = "${var.kube_dns_service_ip}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
ssh_authorized_key = "${var.ssh_authorized_key}"
}
}

View File

@ -0,0 +1,20 @@
// Container Linux Install profile (from release.core-os.net)
resource "matchbox_profile" "bootkube-worker-pxe" {
name = "bootkube-worker-pxe"
kernel = "http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe.vmlinuz"
initrd = [
"http://${var.container_linux_channel}.release.core-os.net/amd64-usr/${var.container_linux_version}/coreos_production_pxe_image.cpio.gz",
]
args = [
"initrd=coreos_production_pxe_image.cpio.gz",
"coreos.config.url=${var.matchbox_http_endpoint}/ignition?uuid=$${uuid}&mac=$${mac:hexhyp}",
"coreos.first_boot=yes",
"console=tty0",
"console=ttyS0",
"${var.kernel_args}",
]
container_linux_config = "${file("${path.module}/cl/bootkube-worker.yaml.tmpl")}"
}

View File

@ -0,0 +1,22 @@
# Secure copy kubeconfig to all nodes to activate kubelet.service
resource "null_resource" "copy-kubeconfig" {
count = "${length(var.worker_names)}"
connection {
type = "ssh"
host = "${element(var.worker_domains, count.index)}"
user = "core"
timeout = "60m"
}
provisioner "file" {
content = "${var.kubeconfig}"
destination = "$HOME/kubeconfig"
}
provisioner "remote-exec" {
inline = [
"sudo mv /home/core/kubeconfig /etc/kubernetes/kubeconfig",
]
}
}

View File

@ -0,0 +1,72 @@
variable "cluster_name" {
description = "Cluster name"
type = "string"
}
variable "matchbox_http_endpoint" {
type = "string"
description = "Matchbox HTTP read-only endpoint (e.g. http://matchbox.example.com:8080)"
}
variable "container_linux_channel" {
type = "string"
description = "Container Linux channel corresponding to the container_linux_version"
}
variable "container_linux_version" {
type = "string"
description = "Container Linux version of the kernel/initrd to PXE or the image to install"
}
variable "ssh_authorized_key" {
type = "string"
description = "SSH public key to set as an authorized key"
}
# machines
# Terraform's crude "type system" does not properly support lists of maps, so we do this.
variable "controller_domains" {
type = "list"
}
variable "worker_names" {
type = "list"
}
variable "worker_macs" {
type = "list"
}
variable "worker_domains" {
type = "list"
}
# bootkube
variable "kubeconfig" {
type = "string"
}
variable "kube_dns_service_ip" {
description = "Kubernetes service IP for kube-dns (must be within server_cidr)"
type = "string"
default = "10.3.0.10"
}
# optional
variable "kernel_args" {
description = "Additional kernel arguments to provide at PXE boot."
type = "list"
default = [
"root=/dev/sda1",
]
}
variable "cluster_domain_suffix" {
description = "Queries for domains with the suffix will be answered by kube-dns. Default is cluster.local (e.g. foo.default.svc.cluster.local) "
type = "string"
default = "cluster.local"
}

View File

@ -0,0 +1,23 @@
The MIT License (MIT)
Copyright (c) 2017 Typhoon Authors
Copyright (c) 2017 Dalton Hubble
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@ -0,0 +1,22 @@
# Typhoon <img align="right" src="https://storage.googleapis.com/poseidon/typhoon-logo.png">
Typhoon is a minimal and free Kubernetes distribution.
* Minimal, stable base Kubernetes distribution
* Declarative infrastructure and configuration
* Free (freedom and cost) and privacy-respecting
* Practical for labs, datacenters, and clouds
Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.
## Features <a href="https://www.cncf.io/certification/software-conformance/"><img align="right" src="https://storage.googleapis.com/poseidon/certified-kubernetes.png"></a>
* Kubernetes v1.9.2 (upstream, via [kubernetes-incubator/bootkube](https://github.com/kubernetes-incubator/bootkube))
* Single or multi-master, workloads isolated on workers, [Calico](https://www.projectcalico.org/) or [flannel](https://github.com/coreos/flannel) networking
* On-cluster etcd with TLS, [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)-enabled, [network policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
* Ready for Ingress, Dashboards, Metrics, and other optional [addons](https://typhoon.psdn.io/addons/overview/)
## Docs
Please see the [official docs](https://typhoon.psdn.io) and the Digital Ocean [tutorial](https://typhoon.psdn.io/digital-ocean/).

View File

@ -0,0 +1,14 @@
# Self-hosted Kubernetes assets (kubeconfig, manifests)
module "bootkube" {
source = "git::https://github.com/poseidon/terraform-render-bootkube.git?ref=v0.10.0"
cluster_name = "${var.cluster_name}"
api_servers = ["${format("%s.%s", var.cluster_name, var.dns_zone)}"]
etcd_servers = "${digitalocean_record.etcds.*.fqdn}"
asset_dir = "${var.asset_dir}"
networking = "${var.networking}"
network_mtu = 1440
pod_cidr = "${var.pod_cidr}"
service_cidr = "${var.service_cidr}"
cluster_domain_suffix = "${var.cluster_domain_suffix}"
}
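
For orientation, etcd_servers above is fed by the splat digitalocean_record.etcds.*.fqdn. A hypothetical sketch of the kind of DNS record resource that splat could refer to (attribute choices here are assumptions, not the module's actual dns.tf):

resource "digitalocean_record" "etcds" {
  count  = "${var.controller_count}"  # assumption: one record per controller
  domain = "${var.dns_zone}"
  name   = "${var.cluster_name}-etcd${count.index}"
  type   = "A"
  ttl    = 300
  value  = "${element(digitalocean_droplet.controllers.*.ipv4_address_private, count.index)}"
}

Each digitalocean_record exports an fqdn attribute, so the splat yields the list of etcd peer domain names the bootkube module expects.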

View File

@ -0,0 +1,156 @@
---
systemd:
  units:
    - name: etcd-member.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
            Environment="ETCD_IMAGE_TAG=v3.2.14"
            Environment="ETCD_NAME=${etcd_name}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${etcd_domain}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${etcd_domain}:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
            Environment="ETCD_INITIAL_CLUSTER=${etcd_initial_cluster}"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
            Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
            Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/server-ca.crt"
            Environment="ETCD_CERT_FILE=/etc/ssl/certs/etcd/server.crt"
            Environment="ETCD_KEY_FILE=/etc/ssl/certs/etcd/server.key"
            Environment="ETCD_CLIENT_CERT_AUTH=true"
            Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/etcd/peer-ca.crt"
            Environment="ETCD_PEER_CERT_FILE=/etc/ssl/certs/etcd/peer.crt"
            Environment="ETCD_PEER_KEY_FILE=/etc/ssl/certs/etcd/peer.key"
            Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
    - name: docker.service
      enable: true
    - name: locksmithd.service
      mask: true
    - name: kubelet.path
      enable: true
      contents: |
        [Unit]
        Description=Watch for kubeconfig
        [Path]
        PathExists=/etc/kubernetes/kubeconfig
        [Install]
        WantedBy=multi-user.target
    - name: wait-for-dns.service
      enable: true
      contents: |
        [Unit]
        Description=Wait for DNS entries
        Wants=systemd-resolved.service
        Before=kubelet.service
        [Service]
        Type=oneshot
        RemainAfterExit=true
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
        RequiredBy=etcd-member.service
    - name: kubelet.service
      contents: |
        [Unit]
        Description=Kubelet via Hyperkube
        Requires=coreos-metadata.service
        After=coreos-metadata.service
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
        EnvironmentFile=/run/metadata/coreos
        Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
          --volume=resolv,kind=host,source=/etc/resolv.conf \
          --mount volume=resolv,target=/etc/resolv.conf \
          --volume var-lib-cni,kind=host,source=/var/lib/cni \
          --mount volume=var-lib-cni,target=/var/lib/cni \
          --volume opt-cni-bin,kind=host,source=/opt/cni/bin \
          --mount volume=opt-cni-bin,target=/opt/cni/bin \
          --volume var-log,kind=host,source=/var/log \
          --mount volume=var-log,target=/var/log \
          --insecure-options=image"
        ExecStartPre=/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
          --allow-privileged \
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
          --kubeconfig=/etc/kubernetes/kubeconfig \
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/master \
          --pod-manifest-path=/etc/kubernetes/manifests \
          --register-with-taints=node-role.kubernetes.io/master=:NoSchedule
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=10
        [Install]
        WantedBy=multi-user.target
    - name: bootkube.service
      contents: |
        [Unit]
        Description=Bootstrap a Kubernetes cluster
        ConditionPathExists=!/opt/bootkube/init_bootkube.done
        [Service]
        Type=oneshot
        RemainAfterExit=true
        WorkingDirectory=/opt/bootkube
        ExecStart=/opt/bootkube/bootkube-start
        ExecStartPost=/bin/touch /opt/bootkube/init_bootkube.done
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /etc/kubernetes/kubelet.env
      filesystem: root
      mode: 0644
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
          KUBELET_IMAGE_TAG=v1.9.2
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
        inline: |
          fs.inotify.max_user_watches=16184
    - path: /opt/bootkube/bootkube-start
      filesystem: root
      mode: 0544
      user:
        id: 500
      group:
        id: 500
      contents:
        inline: |
          #!/bin/bash
          # Wrapper for bootkube start
          set -e
          # Move experimental manifests
          [ -n "$(ls /opt/bootkube/assets/manifests-*/* 2>/dev/null)" ] && mv /opt/bootkube/assets/manifests-*/* /opt/bootkube/assets/manifests && rm -rf /opt/bootkube/assets/manifests-*
          BOOTKUBE_ACI="$${BOOTKUBE_ACI:-quay.io/coreos/bootkube}"
          BOOTKUBE_VERSION="$${BOOTKUBE_VERSION:-v0.10.0}"
          BOOTKUBE_ASSETS="$${BOOTKUBE_ASSETS:-/opt/bootkube/assets}"
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
            --volume assets,kind=host,source=$${BOOTKUBE_ASSETS} \
            --mount volume=assets,target=/assets \
            --volume bootstrap,kind=host,source=/etc/kubernetes \
            --mount volume=bootstrap,target=/etc/kubernetes \
            $${RKT_OPTS} \
            $${BOOTKUBE_ACI}:$${BOOTKUBE_VERSION} \
            --net=host \
            --dns=host \
            --exec=/bootkube -- start --asset-dir=/assets "$@"
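
For orientation, a hedged sketch (not part of this diff) of how a Container Linux Config template like the one above might be rendered with Terraform's template_file data source; the file path and example values are assumptions. Escaped $${...} sequences (e.g. $${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0}) pass through rendering as literal ${...} so systemd and shell expand them on the host.

data "template_file" "controller-config" {
  # assumption: template path relative to the module
  template = "${file("${path.module}/cl/controller.yaml.tmpl")}"

  vars = {
    # example values only
    etcd_name             = "etcd0"
    etcd_domain           = "nemo-etcd0.example.com"
    etcd_initial_cluster  = "etcd0=https://nemo-etcd0.example.com:2380"
    k8s_dns_service_ip    = "10.3.0.10"
    cluster_domain_suffix = "cluster.local"
  }
}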

View File

@ -0,0 +1,118 @@
---
systemd:
  units:
    - name: docker.service
      enable: true
    - name: locksmithd.service
      mask: true
    - name: kubelet.path
      enable: true
      contents: |
        [Unit]
        Description=Watch for kubeconfig
        [Path]
        PathExists=/etc/kubernetes/kubeconfig
        [Install]
        WantedBy=multi-user.target
    - name: wait-for-dns.service
      enable: true
      contents: |
        [Unit]
        Description=Wait for DNS entries
        Wants=systemd-resolved.service
        Before=kubelet.service
        [Service]
        Type=oneshot
        RemainAfterExit=true
        ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
        [Install]
        RequiredBy=kubelet.service
    - name: kubelet.service
      contents: |
        [Unit]
        Description=Kubelet via Hyperkube
        Requires=coreos-metadata.service
        After=coreos-metadata.service
        Wants=rpc-statd.service
        [Service]
        EnvironmentFile=/etc/kubernetes/kubelet.env
        EnvironmentFile=/run/metadata/coreos
        Environment="RKT_RUN_ARGS=--uuid-file-save=/var/cache/kubelet-pod.uuid \
          --volume=resolv,kind=host,source=/etc/resolv.conf \
          --mount volume=resolv,target=/etc/resolv.conf \
          --volume var-lib-cni,kind=host,source=/var/lib/cni \
          --mount volume=var-lib-cni,target=/var/lib/cni \
          --volume opt-cni-bin,kind=host,source=/opt/cni/bin \
          --mount volume=opt-cni-bin,target=/opt/cni/bin \
          --volume var-log,kind=host,source=/var/log \
          --mount volume=var-log,target=/var/log \
          --insecure-options=image"
        ExecStartPre=/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/cni/net.d
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/checkpoint-secrets
        ExecStartPre=/bin/mkdir -p /etc/kubernetes/inactive-manifests
        ExecStartPre=/bin/mkdir -p /var/lib/cni
        ExecStartPre=/usr/bin/bash -c "grep 'certificate-authority-data' /etc/kubernetes/kubeconfig | awk '{print $2}' | base64 -d > /etc/kubernetes/ca.crt"
        ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/cache/kubelet-pod.uuid
        ExecStart=/usr/lib/coreos/kubelet-wrapper \
          --allow-privileged \
          --anonymous-auth=false \
          --client-ca-file=/etc/kubernetes/ca.crt \
          --cluster_dns=${k8s_dns_service_ip} \
          --cluster_domain=${cluster_domain_suffix} \
          --cni-conf-dir=/etc/kubernetes/cni/net.d \
          --exit-on-lock-contention \
          --hostname-override=$${COREOS_DIGITALOCEAN_IPV4_PRIVATE_0} \
          --kubeconfig=/etc/kubernetes/kubeconfig \
          --lock-file=/var/run/lock/kubelet.lock \
          --network-plugin=cni \
          --node-labels=node-role.kubernetes.io/node \
          --pod-manifest-path=/etc/kubernetes/manifests
        ExecStop=-/usr/bin/rkt stop --uuid-file=/var/cache/kubelet-pod.uuid
        Restart=always
        RestartSec=5
        [Install]
        WantedBy=multi-user.target
    - name: delete-node.service
      enable: true
      contents: |
        [Unit]
        Description=Waiting to delete Kubernetes node on shutdown
        [Service]
        Type=oneshot
        RemainAfterExit=true
        ExecStart=/bin/true
        ExecStop=/etc/kubernetes/delete-node
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /etc/kubernetes/kubelet.env
      filesystem: root
      mode: 0644
      contents:
        inline: |
          KUBELET_IMAGE_URL=docker://gcr.io/google_containers/hyperkube
          KUBELET_IMAGE_TAG=v1.9.2
    - path: /etc/sysctl.d/max-user-watches.conf
      filesystem: root
      contents:
        inline: |
          fs.inotify.max_user_watches=16184
    - path: /etc/kubernetes/delete-node
      filesystem: root
      mode: 0744
      contents:
        inline: |
          #!/bin/bash
          set -e
          exec /usr/bin/rkt run \
            --trust-keys-from-https \
            --volume config,kind=host,source=/etc/kubernetes \
            --mount volume=config,target=/etc/kubernetes \
            --insecure-options=image \
            docker://gcr.io/google_containers/hyperkube:v1.9.2 \
            --net=host \
            --dns=host \
            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)

Some files were not shown because too many files have changed in this diff.