In Kubernetes there are two different machine types:
Control plane (called "Masters")
Workers
Control plane machines are responsible for running the main Kubernetes components. These include:
etcd
etcd is a distributed key-value store. It is where all data from the cluster is persisted.
API server
The API server is responsible for serving all the API endpoints that are used by the different Kubernetes components and clients.
Main controllers
The main controllers are responsible for most of the core functionality from Kubernetes. When you create a Deployment, a controller will be responsible for creating the ReplicaSet, and in the same way, Pods will be created out of the ReplicaSet by a controller as well.
Scheduler
The scheduler is the component that assigns Pods to the different nodes based on a number of constraints, and it is aware of individual and collective resource requirements.
Both the control plane and worker nodes run an agent called kubelet and a container runtime (CRI-O). The kubelet is responsible for talking to the container runtime and for managing the lifecycle of the Pods assigned to its machine. The container runtime is the component that creates the containers themselves.
The default deployment aims for a "High Availability" (HA) Kubernetes service. In order to achieve HA it is required to run several control plane nodes.
Not just any number greater than 1 is optimal for HA, and the choice directly impacts the fault tolerance. The reason for this is the etcd distributed key-value store: etcd needs a majority (quorum) of its members to be available in order to accept writes, which results in the following failure tolerances:
| Cluster size | Failure Tolerance |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 1 |
| 5 | 2 |
| 6 | 2 |
| 7 | 3 |
| 8 | 3 |
| 9 | 4 |
Given that etcd runs only on the control plane nodes, having 2 control plane nodes provides an HA solution that is fault tolerant for the Kubernetes components but not for the etcd cluster. If one of those two control plane nodes goes down, the cluster storage becomes unavailable and the cluster is unable to accept new changes (already running workloads are not affected, but the cluster cannot react to any new changes from that point on).
In order to provide a fault-tolerant HA environment you must have an odd number of control plane nodes.
A minimum of 3 master nodes is required in order to tolerate the complete loss of one control plane node.
Control planes are only part of the whole picture. Most components talk to the API server, and the API server must be exposed on all master nodes so that it can communicate with the clients and the rest of the cluster components.
The most reasonable way to achieve fault-tolerant exposure of the API servers is a load balancer that points to all the control plane nodes. The clients and Kubernetes components talk to the load balancer, which performs health checks against all the control plane nodes and maintains an active pool of backends.
If only one load balancer is deployed, it becomes a single point of failure: such an environment cannot be considered highly available or fault tolerant. A complete HA solution therefore requires more than one load balancer.
The smallest possible deployment comes without a load balancer and with the minimum number of nodes required to be considered a cluster. This deployment type is in no way suitable for production use and has no fault tolerance whatsoever.
One master machine
One worker machine
Although not recommended, it is also possible to create a POC or testing environment with a single master machine.
The default minimal HA scenario requires 5 nodes:
2 Load Balancers
3 Masters
plus the number of worker nodes necessary to host your workloads.
Requires:
Persistent IP addresses on all nodes.
NTP server provided on the host network.
DNS entry that resolves to the load balancer VIP.
LDAP server or OIDC provider (Active Directory, GitLab, GitHub, etc.)
(Optional) "Infrastructure node"
LDAP server if LDAP integration is desired and your organization does not have an LDAP server.
Local RMT server to synchronize RPMs.
Local mirror of the SUSE container registry (registry.suse.com)
Local mirror of the SUSE helm chart repository.
In air-gapped environments, the "Infrastructure node" is mandatory, because a local RMT server mirroring the SUSE CaaS Platform repositories, a mirror of the SUSE container registry, and a mirror of the SUSE helm chart repository are all required.
Certificates are stored under /etc/kubernetes/pki on the control plane nodes.
| Path | Valid for | Common Name | Description |
|---|---|---|---|
| ca.crt | 1 year | kubernetes | Kubernetes global CA |
| etcd/ca.crt | 1 year | etcd-ca | etcd global CA |
| Path | Key type | Key length (bits) |
|---|---|---|
| ca.key | RSA | 2048 |
| etcd/ca.key | RSA | 2048 |
| Path | Valid for | CN | Parent CA | O (Subject) | Kind | Extra SANs |
|---|---|---|---|---|---|---|
| apiserver-kubelet-client.crt | 1 year | kube-apiserver-kubelet-client | kubernetes | system:masters | client | - |
| etcd/healthcheck-client.crt | 1 year | kube-etcd-healthcheck-client | etcd-ca | system:masters | client | - |
| etcd/server.crt | 1 year | master | etcd-ca | - | server | Hostname, IP address, localhost, 127.0.0.1, 0:0:0:0:0:0:0:1 |
| etcd/peer.crt | 1 year | master | etcd-ca | - | server, client | Hostname, IP address, localhost, 127.0.0.1, 0:0:0:0:0:0:0:1 |
| apiserver.crt | 1 year | kube-apiserver | kubernetes | - | server | Hostname, IP address, Control Plane Address, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local |
| apiserver-etcd-client.crt | 1 year | kube-apiserver-etcd-client | etcd-ca | system:masters | client | - |
| Path | Key type | Key length (bits) |
|---|---|---|
| apiserver.key | RSA | 2048 |
| apiserver-kubelet-client.key | RSA | 2048 |
| apiserver-etcd-client.key | RSA | 2048 |
| etcd/server.key | RSA | 2048 |
| etcd/healthcheck-client.key | RSA | 2048 |
| etcd/peer.key | RSA | 2048 |
Please refer to the SUSE CaaS Platform Administration Guide for more information on the rotation of certificates.
The CA certificate for the cluster is stored under /etc/kubernetes/pki on the worker nodes.
When a new worker machine joins the cluster, the kubelet performs a TLS bootstrap: it requests a certificate from the cluster using a bootstrap token, the request is automatically approved by the cluster, and a certificate is issued. The new worker downloads this certificate and writes it to disk; from then on the kubelet uses this client certificate to contact the apiserver.
The CA certificate is downloaded from the cluster (it is published in the cluster-info ConfigMap inside the kube-public namespace). Since this information is public, there is no restriction on downloading the CA certificate.
This certificate is saved under /etc/kubernetes/pki/ca.crt.
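For illustration, a kubeadm-based cluster publishes cluster-info as a ConfigMap that embeds a minimal kubeconfig containing the CA data. The following is an abridged sketch of its shape (signature entries are omitted and the server address is a placeholder):
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-info
  namespace: kube-public
data:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        # base64-encoded CA certificate, the same data that ends up in /etc/kubernetes/pki/ca.crt
        certificate-authority-data: <base64 encoded ca.crt>
        server: https://<CONTROL_PLANE_ADDRESS>:<PORT>
      name: ""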
| Path | Valid for | Common Name | Description |
|---|---|---|---|
| /var/lib/kubelet/pki/kubelet.crt | 1 year | worker-ca@random-id | Kubelet CA |
| Path | Key type | Key length (bits) |
|---|---|---|
| kubelet.key | RSA | 2048 |
Certificates are stored under /var/lib/kubelet/pki on the worker nodes.
| Path | Valid for | CN | Parent CA | O (Subject) | Kind | Extra SANs | Notes |
|---|---|---|---|---|---|---|---|
| kubelet-client-current.pem | 1 year | system:node:worker | kubernetes | system:nodes | client | - | Symlink to kubelet-client-timestamp.pem |
| kubelet.crt | 1 year | worker@random-id | worker-ca@random-id | - | server | Hostname | - |
| Path | Key type | Key length (bits) |
|---|---|---|
| kubelet.key | RSA | 2048 |
Please refer to the SUSE CaaS Platform Administration Guide for more information on the rotation of certificates.
Cluster lifecycle is managed using skuba. It enables you to
manage nodes in your cluster:
Bootstrap a new cluster
Join new machines to the cluster
Master nodes
Worker nodes
Remove nodes from the cluster
Master nodes
Worker nodes
Creating a cluster definition is the first step to initialize your cluster. You can execute this task as follows:
~/clusters$ skuba cluster init --control-plane 10.86.3.149 <CLUSTER_NAME>
[init] configuration files written to /home/my-user/clusters/<CLUSTER_NAME>
This operation happens strictly offline and will generate a folder structure like the following:
~/clusters > tree <CLUSTER_NAME>/
<CLUSTER_NAME>/
├── addons
│ ├── cilium
│ │ ├── base
│ │ │ └── cilium.yaml
│ │ └── patches
│ ├── cri
│ │ └── default_flags
│ ├── dex
│ │ ├── base
│ │ │ └── dex.yaml
│ │ └── patches
│ ├── gangway
│ │ ├── base
│ │ │ └── gangway.yaml
│ │ └── patches
│ ├── kured
│ │ ├── base
│ │ │ └── kured.yaml
│ │ └── patches
│ └── psp
│ ├── base
│ │ └── psp.yaml
│ └── patches
├── kubeadm-init.conf
└── kubeadm-join.conf.d
├── master.conf.template
└── worker.conf.template
18 directories, 9 files
At this point, you can inspect all generated files, and if desired you can experiment by providing your own custom settings using declarative management of Kubernetes objects with Kustomize.
To provide custom settings in the form of a strategic merge patch or a JSON 6902 patch, go to the patches directory of the respective addon, for example addons/dex/patches, and create a custom settings file there, for example addons/dex/patches/custom.yaml.
Read https://github.com/kubernetes-sigs/kustomize/blob/master/docs/glossary.md#patchstrategicmerge and https://github.com/kubernetes-sigs/kustomize/blob/master/docs/glossary.md#patchjson6902 to get more information. For example, the following strategic merge patch overrides a few fields of the oidc-dex Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oidc-dex
  namespace: kube-system
spec:
  replicas: 1
  revisionHistoryLimit: 5
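For comparison, a JSON 6902 patch expressing the same replica override would look roughly like the following (an illustrative sketch only; refer to the Kustomize documentation linked above for how such a patch and its target are declared):
# hypothetical JSON 6902 patch, shown in its YAML form
- op: replace
  path: /spec/replicas
  value: 1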
From within your cluster definition folder, you can bootstrap your first master machine:
~/clusters/<CLUSTER_NAME>$ skuba node bootstrap --user sles --sudo --target <IP_ADDRESS/FQDN> <NODENAME>
This operation will read the kubeadm-init.conf file inside your
cluster definition, will forcefully set certain settings to the ones
required by SUSE CaaS Platform and will bootstrap the node remotely.
Prior to bootstrap it’s possible for you to tweak the configuration that will be used to create the cluster. You can:
Tweak the default Pod Security Policies or create extra ones. If you
place extra Pod Security Policies in the addons/psp/base folder, those
will be created as well when the bootstrap is completed. You can
also modify the default ones and/or remove them.
Inspect the kubeadm-init.conf and set extra configuration settings
supported by kubeadm. The latest supported version is v1beta1.
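For illustration, extra settings in kubeadm-init.conf use the kubeadm v1beta1 configuration format. A minimal sketch of such a tweak, to be merged into the ClusterConfiguration section of the existing file rather than replacing it, could look like the following (the API server flag shown is only an example, not a SUSE CaaS Platform requirement):
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # illustrative extra kube-apiserver flag: keep audit logs for 30 days
    audit-log-maxage: "30"
Similarly, an extra Pod Security Policy placed in addons/psp/base is a regular PodSecurityPolicy manifest. A minimal, purely illustrative example (the name and rules below are chosen for demonstration and are not part of the default policies):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-restricted
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - configMap
  - secret
  - emptyDir
  - persistentVolumeClaim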
After this operation has completed, several modifications will have happened in your cluster definition folder:
The kubeadm-init.conf file will contain the final, complete contents used to bootstrap the node, so you can inspect the exact configuration that was used.
An admin.conf file will be created in your cluster definition folder. This is a kubeconfig file that has complete access to the cluster and uses client certificates for authenticating against the cluster.
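For reference, admin.conf is a standard kubeconfig; its general shape is roughly the following (an abridged sketch: the kubernetes-admin user name is the kubeadm default and, like the placeholders, is an assumption here). It can be used directly, for example by pointing the KUBECONFIG environment variable at it.
apiVersion: v1
kind: Config
clusters:
- name: <CLUSTER_NAME>
  cluster:
    certificate-authority-data: <base64 encoded CA certificate>
    server: https://<LOAD_BALANCER_OR_CONTROL_PLANE_ADDRESS>:<PORT>
contexts:
- name: kubernetes-admin@<CLUSTER_NAME>
  context:
    cluster: <CLUSTER_NAME>
    user: kubernetes-admin
current-context: kubernetes-admin@<CLUSTER_NAME>
users:
- name: kubernetes-admin
  user:
    client-certificate-data: <base64 encoded client certificate>
    client-key-data: <base64 encoded client key>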
Adding new master nodes to the cluster can be achieved by executing
the following skuba command:
~/clusters/<CLUSTER_NAME>$ skuba node join --role master --user sles --sudo --target <IP_ADDRESS/FQDN> <NODENAME>
This operation will try to read the kubeadm-join.conf.d/<IP_ADDRESS/FQDN>.conf
file if it exists. This allows you to set specific settings for this
new node prior to joining it (a similar procedure to how
kubeadm-init.conf file behaves when bootstrapping). If this file
does not exist, skuba will read the kubeadm-join.conf.d/master.conf.template
instead and will create this file automatically when skuba node
join is called.
This operation will increase the etcd member count by one, so it’s
recommended to always keep an odd number of master nodes because as
described in previous sections an even number of nodes does not
improve fault tolerance.
Adding new worker nodes to the cluster can be achieved by executing
the following skuba command:
~/clusters/<CLUSTER_NAME>$ skuba node join --role worker --user sles --sudo --target <IP_ADDRESS/FQDN> <NODENAME>
This operation will try to read the kubeadm-join.conf.d/<IP_ADDRESS/FQDN>.conf
file if it exists. This allows you to set specific settings for this
new node prior to joining it (a similar procedure to how
kubeadm-init.conf file behaves when bootstrapping). If this file
does not exist, skuba will read the kubeadm-join.conf.d/worker.conf.template
instead and will create this file automatically when skuba node
join is called.
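For illustration, and assuming the same kubeadm v1beta1 format as kubeadm-init.conf, a minimal per-node override could look like the following sketch (the kubelet argument is only an example of a node-specific setting; discovery and the other required join fields are filled in by skuba from the template, as described above):
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    # illustrative node-specific kubelet flag
    node-labels: "workload=general"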
Removing nodes from the cluster requires you to execute skuba from a folder containing an admin.conf file, because this operation is performed exclusively through the Kubernetes API; no access to the node being removed (or to any other node, for that matter) is required.
To remove a node, execute the following command:
~/clusters/<CLUSTER_NAME>$ skuba node remove <NODENAME>
If the node to be removed is a master, specific actions will be executed automatically, such as removing its etcd member from the cluster. Note that this node cannot be added back to this cluster, or to any other skuba-initiated Kubernetes cluster, without being reinstalled first.