We currently support deployment on the following platforms:
SUSE OpenStack Cloud 8
VMware ESXi 6.7
KVM
Bare Metal x86_64
Amazon Web Services (technology preview)
SUSE CaaS Platform itself is based on SLE 15 SP2.
The steps for obtaining the correct installation image for each platform type are detailed in the respective platform deployment instructions.
SUSE CaaS Platform consists of a number of (virtual) machines that run as a cluster.
You will need at least two machines:
1 master node
1 worker node
SUSE CaaS Platform 4.5.2 supports deployments with a single or multiple master nodes. Production environments must be deployed with multiple master nodes for resilience.
All communication to the cluster is done through a load balancer talking to the respective nodes. For that reason, any fault tolerant environment must provide at least two load balancers for incoming communication.
The minimum viable fault tolerant production environment configuration consists of:
3 master nodes
2 worker nodes
All cluster nodes must be dedicated (virtual) machines reserved for the purpose of running SUSE CaaS Platform.
Fault tolerant load balancing solution
(for example SUSE Linux Enterprise High Availability Extension with pacemaker and haproxy)
1 management workstation
In order to deploy and control a SUSE CaaS Platform cluster you will need at least one machine capable of running skuba. This is typically a regular desktop workstation or laptop running SLE 15 SP2 or later.
The skuba CLI package is available from the SUSE CaaS Platform module.
You will need a valid SUSE Linux Enterprise and SUSE CaaS Platform subscription to install this tool on the workstation.
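As an illustration only, on a registered SLE 15 SP2 machine this typically boils down to activating the SUSE CaaS Platform extension and installing the package; the product identifier shown here is an assumption and may differ in your environment:
SUSEConnect -p caasp/4.5/x86_64 -r <CAASP_REGISTRATION_CODE>
zypper in skuba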
It is vital that the management workstation runs an NTP client and that time synchronization is configured against the same NTP servers which you will later use to synchronize the cluster nodes.
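A minimal sketch of such a setup, assuming chrony (the default time service on SLE 15) and a hypothetical internal time server ntp1.example.com:
zypper in -y chrony
echo "server ntp1.example.com iburst" >> /etc/chrony.conf
systemctl enable --now chronyd
chronyc sources
The last command lets you verify that the configured time servers are actually reachable and selected.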
The storage sizes in the following lists are absolute minimum configurations.
Sizing of the storage for worker nodes depends largely on the expected amount of container images, their size and change rate.
The basic operating system for all nodes might also include snapshots (when using btrfs) that can quickly fill up existing space.
We recommend provisioning a separate storage partition for container images on each (worker) node that can be adjusted in size when needed.
Storage for /var/lib/containers on the worker nodes should be approximately 50GB in addition to the base OS storage.
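A minimal sketch of provisioning such a partition on a worker node, assuming a hypothetical dedicated disk /dev/vdb:
mkfs.xfs /dev/vdb
mkdir -p /var/lib/containers
echo "/dev/vdb /var/lib/containers xfs defaults 0 0" >> /etc/fstab
mount /var/lib/containers
Using a dedicated volume like this allows the container image storage to be grown later without touching the root filesystem.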
Up to 5 worker nodes (minimum):
Storage: 50 GB+
(v)CPU: 2
RAM: 4 GB
Network: Minimum 1Gb/s (faster is preferred)
Up to 10 worker nodes:
Storage: 50 GB+
(v)CPU: 2
RAM: 8 GB
Network: Minimum 1Gb/s (faster is preferred)
Up to 100 worker nodes:
Storage: 50 GB+
(v)CPU: 4
RAM: 16 GB
Network: Minimum 1Gb/s (faster is preferred)
Up to 250 worker nodes:
Storage: 50 GB+
(v)CPU: 8
RAM: 16 GB
Network: Minimum 1Gb/s (faster is preferred)
Using a minimum of 2 (v)CPUs is a hard requirement; deploying a cluster with fewer processing units is not possible.
The worker nodes must have sufficient memory, CPU and disk space for the Pods/containers/applications that are planned to be hosted on these workers.
A worker node requires the following resources:
CPU cores: 1.250
RAM: 1.2 GB
Based on these values, the minimal configuration of a worker node is:
Storage: Depending on workloads, minimum 20-30 GB to hold the base OS and required packages. Mount additional storage volumes as needed.
(v)CPU: 2
RAM: 2 GB
Network: Minimum 1Gb/s (faster is preferred)
Calculate the number of required (v)CPUs by adding up the base requirements, the estimated additional essential cluster components (logging agent, monitoring agent, configuration management, etc.) and the estimated CPU workloads:
1.250 (base requirements) + 0.250 (estimated additional cluster components) + estimated workload CPU requirements
Calculate the size of the RAM using a similar formula:
1.2 GB (base requirements) + 500 MB (estimated additional cluster components) + estimated workload RAM requirements
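For example, for a worker node that is expected to run workloads needing 2 (v)CPUs and 6 GB of RAM (hypothetical figures), the calculation would be:
1.250 + 0.250 + 2 = 3.5 (v)CPUs, rounded up to 4
1.2 GB + 500 MB + 6 GB = 7.7 GB RAM, rounded up to 8 GB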
These values are provided as a guide to work in most cases. They may vary based on the type of the running workloads.
For master nodes you must ensure storage performance of at least 50 sequential IOPS, rising to 500 sequential IOPS for heavily loaded clusters, with disk bandwidth depending on your cluster size. It is highly recommended to use SSDs.
"Typically 50 sequential IOPS (for example, a 7200 RPM disk) is required. For heavily loaded clusters, 500 sequential IOPS (for example, a typical local SSD or a high performance virtualized block device) is recommended."
"Typically 10MB/s will recover 100MB data within 15 seconds. For large clusters, 100MB/s or higher is suggested for recovering 1GB data within 15 seconds."
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#disks
This is extremely important to ensure proper functioning of etcd, a critical component.
You can validate these requirements beforehand by using fio. This tool simulates etcd I/O (input/output) and lets you determine from the output statistics whether or not the storage is suitable.
Install the tool:
zypper in -y fio
Run the test:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-etcd-dir --size=22m --bs=2300 --name=test-etcd-io
Replace test-etcd-dir with a directory located on the same disk that will hold the incoming etcd data under /var/lib/etcd.
From the outputs, the interesting part is fsync/fdatasync/sync_file_range where the values are expressed in microseconds (usec). A disk is considered sufficient when the value of the 99.00th percentile is below 10000usec (10ms).
Be careful though: this benchmark is for etcd only and does not take other disk usage into consideration. This means that a value only slightly under 10ms should be treated with caution, as other workloads will have an impact on the disks.
If the storage is very slow, the values may be reported directly in milliseconds.
Let’s see two different examples:
[...]
fsync/fdatasync/sync_file_range:
sync (usec): min=251, max=1894, avg=377.78, stdev=69.89
sync percentiles (usec):
| 1.00th=[ 273], 5.00th=[ 285], 10.00th=[ 297], 20.00th=[ 330],
| 30.00th=[ 343], 40.00th=[ 355], 50.00th=[ 367], 60.00th=[ 379],
| 70.00th=[ 400], 80.00th=[ 424], 90.00th=[ 465], 95.00th=[ 506],
| 99.00th=[ 594], 99.50th=[ 635], 99.90th=[ 725], 99.95th=[ 742],
| 99.99th=[ 1188]
[...]
Here we get a 99.00th percentile value of 594usec (about 0.6ms), so the storage meets the requirements.
[...]
fsync/fdatasync/sync_file_range:
sync (msec): min=10, max=124, avg=17.62, stdev= 3.38
sync percentiles (usec):
| 1.00th=[11731], 5.00th=[11994], 10.00th=[12911], 20.00th=[16712],
| 30.00th=[17695], 40.00th=[17695], 50.00th=[17695], 60.00th=[17957],
| 70.00th=[17957], 80.00th=[17957], 90.00th=[19530], 95.00th=[22676],
| 99.00th=[28705], 99.50th=[30016], 99.90th=[41681], 99.95th=[59507],
| 99.99th=[89654]
[...]
Here we get a 99.00th percentile value of 28705usec (about 29ms), so the storage clearly does not meet the requirements.
The management workstation needs at least the following networking permissions:
SSH access to all machines in the cluster
Access to the Kubernetes API server on port 6443 (exposed by the load balancer), which will in turn talk to any master in the cluster
Access to Dex on the configured NodePort (port 32000, exposed by the load balancer) so that kubectl can request a new token using the refresh token when the OIDC token has expired
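A quick way to sanity-check these permissions from the management workstation is sketched below; the user name, node address and load balancer address are hypothetical placeholders:
ssh sles@192.168.0.10 true
curl -k https://lb.example.com:6443/healthz
curl -k https://lb.example.com:32000/
Any TLS response from the last two commands (even an authentication error) confirms that the ports are reachable through the load balancer.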
It is good security practice not to expose the Kubernetes API server on the public internet. Use network firewalls that only allow access from trusted subnets.
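As a sketch only, assuming firewalld and a hypothetical trusted administration subnet 10.1.0.0/24, limiting API server access could look like this on the load balancer host:
firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="10.1.0.0/24" port port="6443" protocol="tcp" accept'
firewall-cmd --reload
Adapt the zone and subnet to your own network layout, or implement equivalent rules in whatever firewall solution you use.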
The service subnet and pod subnet must not overlap.
Please plan the size of the networks generously for your expected workloads before bootstrapping.
The default pod subnet is 10.244.0.0/16. It allows for 65536 IP addresses overall.
By default, each node is assigned a /24 CIDR (254 usable IP addresses per node).
The default node allocation of /24 means a hard cluster node limit of 256 since this is the number of /24 ranges that fit in a /16 range.
Depending on the size of the nodes you are planning to use (in terms of resources), or on the number of nodes you are planning to have, the per-node CIDR can be adjusted to be bigger, but the cluster will then accommodate fewer nodes overall. For example, assigning a /23 per node doubles the usable pod addresses per node to 510 but halves the maximum number of nodes to 128.
If you are planning to use more or less pods per node or have a higher number of nodes, you can adjust these settings to match your requirements. Please make sure that the networks are suitably sized to adjust to future changes in the cluster.
You can also adjust the service subnet size; it must not overlap with the pod CIDR and it should be big enough to accommodate all services.
For more advanced network requirements please refer to: https://docs.cilium.io/en/v1.6/concepts/ipam/#address-management
| Node | Port | Protocol | Accessibility | Description |
|---|---|---|---|---|
| All nodes | 22 | TCP | Internal | SSH (required in public clouds) |
| | 4240 | TCP | Internal | Cilium health check |
| | 8472 | UDP | Internal | Cilium VXLAN |
| | 10250 | TCP | Internal | Kubelet (API server → kubelet communication) |
| | 10256 | TCP | Internal | kube-proxy health check |
| | 30000 - 32767 | TCP + UDP | Internal | Range of ports used by Kubernetes when allocating services of type NodePort |
| | 32000 | TCP | External | Dex (OpenID Connect) |
| | 32001 | TCP | External | Gangway (RBAC Authenticate) |
| Masters | 2379 | TCP | Internal | etcd (client communication) |
| | 2380 | TCP | Internal | etcd (server-to-server traffic) |
| | 6443 | TCP | Internal / External | Kubernetes API server |
Using IPv6 addresses is currently not supported.
All nodes must be assigned static IPv4 addresses, which must not be changed manually afterwards.
Plan carefully for required IP ranges and future scenarios as it is not possible to reconfigure the IP ranges once the deployment is complete.
The Kubernetes networking model requires that your nodes have IP forwarding enabled in the kernel.
skuba checks this value when installing your cluster and installs a rule in /etc/sysctl.d/90-skuba-net-ipv4-ip-forward.conf to make it persistent.
Other software can potentially install rules with higher priority that override this value, causing machines to not behave as expected after rebooting.
You can manually check if this is enabled using the following command:
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
net.ipv4.ip_forward must be set to 1. Additionally, you can check in what order persisted rules are processed by running sysctl --system -a.
Besides the SUSE-provided packages and containers, SUSE CaaS Platform is typically used with third-party containers and charts.
The following SUSE provided resources must be available:
| URL | Name | Purpose |
|---|---|---|
| scc.suse.com | SUSE Customer Center | Allow registration and license activation |
| registry.suse.com | SUSE container registry | Provide container images |
| *.cloudfront.net | Cloudfront | CDN/distribution backend for registry.suse.com |
| kubernetes-charts.suse.com | SUSE helm charts repository | Provide helm charts |
| updates.suse.com | SUSE package update channel | Provide package updates |
If you wish to use upstream / third-party resources, please also allow the following:
| URL | Name | Purpose |
|---|---|---|
| k8s.gcr.io | Google Container Registry | Provide container images |
| kubernetes-charts.storage.googleapis.com | Google Helm charts repository | Provide helm charts |
| docker.io | Docker Container Registry | Provide container images |
| quay.io | Red Hat Container Registry | Provide container images |
Please note that not all installation scenarios will need all of these resources.
If you are deploying into an air gap scenario, you must ensure that the resources required from these locations are present and available on your internal mirror server.
Please make sure that all your Kubernetes components can communicate with each other. This might require the configuration of routing when using multiple network adapters per node.
Refer to: https://v1-18.docs.kubernetes.io/docs/setup/independent/install-kubeadm/#check-network-adapters.
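As a hedged example, if cluster-internal traffic is supposed to use a second adapter eth1 towards a hypothetical subnet 192.168.100.0/24, a node may need an explicit route such as:
ip route add 192.168.100.0/24 dev eth1
Make sure such routes are also configured persistently in the network configuration of the node so they survive reboots.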
Configure firewall and other network security to allow communication on the default ports required by Kubernetes: https://v1-18.docs.kubernetes.io/docs/setup/independent/install-kubeadm/#check-required-ports
All master nodes of the cluster must have a minimum 1Gb/s network connection to fulfill the requirements for etcd.
"1GbE is sufficient for common etcd deployments. For large etcd clusters, a 10GbE network will reduce mean time to recovery."
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#network
Do not grant unauthorized personnel access to the kubeconfig file or to any workstation configured with it. In the current state, it grants full administrative access to the cluster.
Authentication is done via the kubeconfig file generated during deployment. This file will grant full access to the cluster and all workloads. Apply best practices for access control to workstations configured to administer the SUSE CaaS Platform cluster.
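At the very least, keep the file readable only by the administrative user on the workstation; the path below is just an example location:
chmod 600 ~/.kube/config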
SUSE CaaS Platform leverages Kubernetes role-based access control (RBAC) and needs an external authentication server such as LDAP, Active Directory, or similar to validate the user's identity and to grant different user roles or cluster role permissions.
Some add-on services need to be highly available. These services require enough cluster nodes to be available to run replicas of their services.
When the cluster is deployed with enough nodes to satisfy the replica sizes, the replicas will be distributed evenly across the cluster.
For clusters deployed with fewer nodes than the default replica sizes, services will still try to find a suitable node to run on. However, it is likely that several replicas will end up running on the same nodes, defeating the purpose of high availability.
You can check the deployment replica size after node bootstrap. The number of cluster nodes should be equal to or greater than the DESIRED replica size.
kubectl get rs -n kube-system
After deployment, if the number of healthy nodes falls below the number required for fulfilling the replica sizing, service replicas will remain in the Pending state until either the unhealthy node recovers or a new node is joined to the cluster.
The following describes two methods for replica management if you wish to work with a cluster below the default replica size requirement.
One method is to update the overall number of replicas created by a service. Please consult the documentation of the respective service for the replica limits required for proper high availability. If the replica number is too high for the cluster, you must increase the cluster size to provide more resources.
Update the deployment replica size before joining nodes.
You can use the same steps to increase the replica size again if more resources become available later on.
kubectl -n kube-system scale --replicas=<DESIRED_REPLICAS> deployment <NAME>
Join new nodes.
When multiple replicas are running on the same node, you will want to redistribute them manually to ensure proper high availability.
Find the pod for re-distribution.
Check the NAME and NODE columns for duplicated pods.
kubectl -n kube-system get pod -o wide
Delete the duplicated pod. This will trigger the creation of a replacement pod.
kubectl -n kube-system delete pod <POD_NAME>