To deploy SUSE CaaS Platform you need a workstation running SUSE Linux Enterprise Server 15 SP2 or an equivalent openSUSE release. This workstation is called the "Management machine". Important files are generated on this machine and must be maintained there, but it is not a member of the SUSE CaaS Platform cluster.
To successfully deploy SUSE CaaS Platform, you must have SSH keys loaded into an SSH agent; the installation tools skuba and terraform require this.
The use of ssh-agent comes with some security implications that you should take into consideration.
The pitfalls of using ssh-agent
To avoid these risks, either start the agent with ssh-agent -t <TIMEOUT> and specify a time after which it self-terminates, or terminate the agent yourself before logging out by running ssh-agent -k.
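For example (a minimal sketch; the one-hour timeout is only an illustration):
eval "$(ssh-agent -t 1h)"
ssh-agent -k
The first command starts an agent that self-terminates after one hour; the second terminates a running agent manually.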
To log in to the created cluster nodes from the Management machine, you need to configure an SSH key pair.
This key pair must be trusted by the user account you will use to log in to each cluster node; that user is called "sles" by default.
In order to use the installation tools terraform and skuba, this trusted keypair must be loaded into the SSH agent.
If you do not have an existing ssh keypair to use, run:
ssh-keygen -t ecdsa
The ssh-agent or a compatible program is sometimes started automatically by graphical desktop
environments. If that is not the case, run:
eval "$(ssh-agent)"
This will start the agent and set environment variables used for agent
communication within the current session. This has to be the same terminal session
that you run the skuba commands in. A new terminal usually requires a new ssh-agent.
In some desktop environments the ssh-agent will also automatically load the SSH keys.
To add an SSH key manually, use the ssh-add command:
ssh-add <PATH_TO_KEY>
If you are adding the SSH key manually, specify the full path.
For example: /home/sles/.ssh/id_rsa
You can load multiple keys into your agent using the ssh-add <PATH_TO_KEY> command.
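For example, to load two keys into the running agent (the file names below are only illustrative):
ssh-add /home/sles/.ssh/id_ecdsa
ssh-add /home/sles/.ssh/id_rsa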
Keys should be protected with a passphrase as a security measure. The ssh-add command will prompt for the passphrase, and the agent then caches the decrypted key material for a configurable lifetime. The -t lifetime option to ssh-add specifies a maximum time to cache the specific key. See man ssh-add for more information.
The SSH key is decrypted when loaded into the key agent. Though the key itself is not accessible from the agent, anyone with access to the agent's control socket file can use the private key contents to impersonate the key owner. By default, socket access is limited to the user who launched the agent. Nonetheless, it is good security practice to specify an expiration time for the decrypted key using the -t option.
For example: ssh-add -t 1h30m $HOME/.ssh/id_ecdsa would expire the decrypted key in 1.5 hours.
Alternatively, ssh-agent can also be launched with -t to specify a default timeout.
For example: eval $( ssh-agent -t 120s ) would default to a two minute (120 second)
timeout for keys added. If timeouts are specified for both programs, the timeout from
ssh-add is used.
See man ssh-agent and man ssh-add for more information.
Skuba will try all the identities loaded into the ssh-agent until one of them grants access to the node, or until the SSH server's maximum authentication attempts are exhausted.
This could lead to undesired messages in SSH or other security/authentication logs on your local machine.
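To limit the number of attempted identities, you can review what is currently loaded and remove keys you do not need (the key path below is just a placeholder):
ssh-add -l
ssh-add -d /home/sles/.ssh/unneeded_key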
It is also possible to forward the authentication agent connection from one host to another, which can be useful if you intend to run skuba on a "jump host" and do not want to copy your private key to that node.
This can be achieved using the ssh -A command. Please refer to the man page
of ssh to learn about the security implications of using this feature.
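For example, assuming a jump host reachable as <JUMP_HOST> and the default sles user, agent forwarding can be enabled for a session like this:
ssh -A sles@<JUMP_HOST>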
The registration code for SUSE CaaS Platform 4 also contains the activation permissions for the underlying SUSE Linux Enterprise operating system. You can use your SUSE CaaS Platform registration code to activate the SUSE Linux Enterprise Server 15 SP2 subscription during installation.
You need a subscription registration code to use SUSE CaaS Platform. You can retrieve your registration code from SUSE Customer Center.
Log in to https://scc.suse.com
Navigate to
Select the tab from the menu bar at the top
Search for "CaaS Platform"
Select the version you wish to deploy (should be the highest available version)
Click on the Link in the column
The registration code should be displayed as the first line under "Subscription Information"
If you cannot find SUSE CaaS Platform in the list of subscriptions, please contact your local administrator responsible for software subscriptions or SUSE support.
During deployment of the cluster nodes, each machine will be assigned a unique ID in the /etc/machine-id file by Terraform or AutoYaST.
If you are using any (semi-)manual methods of deployments that involve cloning of machines and deploying from templates,
you must make sure to delete this file before creating the template.
If two nodes are deployed with the same machine-id, they will not be correctly recognized by skuba.
If you are not using Terraform or AutoYaST, you must regenerate the machine IDs manually.
During the template preparation you will have removed the machine ID from the template image. This ID is required for proper functionality in the cluster and must be (re-)generated on each machine.
Log in to each virtual machine created from the template and run:
rm /etc/machine-id
dbus-uuidgen --ensure
systemd-machine-id-setup
systemctl restart systemd-journald
This will regenerate the machine id values for DBUS (/var/lib/dbus/machine-id) and systemd (/etc/machine-id) and restart the logging service to make use of the new IDs.
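As a quick sanity check (not part of the official procedure), you can display the regenerated IDs on each machine and confirm they are unique across your nodes:
cat /etc/machine-id /var/lib/dbus/machine-id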
For any deployment type you will need skuba and Terraform. These packages are
available from the SUSE CaaS Platform package sources. They are provided as an installation
"pattern" that will install dependencies and other required packages in one simple step.
Access to the packages requires the SUSE CaaS Platform, Containers and Public Cloud extension modules.
Enable the modules during the operating system installation or activate them using SUSE Connect.
sudo SUSEConnect -r <CAASP_REGISTRATION_CODE> 1
sudo SUSEConnect -p sle-module-containers/15.2/x86_64 2
sudo SUSEConnect -p sle-module-public-cloud/15.2/x86_64 3
sudo SUSEConnect -p caasp/4.5/x86_64 -r <CAASP_REGISTRATION_CODE> 4
1. Activate SUSE Linux Enterprise Server.
2. Add the free Containers module.
3. Add the free Public Cloud module.
4. Add the SUSE CaaS Platform extension with your registration code.
Install the required tools:
sudo zypper in -t pattern SUSE-CaaSP-Management
This will install the skuba command line tool and Terraform, as well as various default configurations and examples.
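To confirm that the tools are available on the Management machine, you can print their versions:
skuba version
terraform version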
Sometimes you need a proxy server to be able to connect to the SUSE Customer Center. If you have not already configured a system-wide proxy, you can temporarily do so for the duration of the current shell session like this:
Export the environment variable http_proxy:
export http_proxy=http://<PROXY_IP_FQDN>:<PROXY_PORT>
Replace <PROXY_IP_FQDN> with the IP address or fully qualified domain name (FQDN) of the proxy server and <PROXY_PORT> with its port.
If you use a proxy server with basic authentication, create the file $HOME/.curlrc
with the following content:
--proxy-user "<USER>:<PASSWORD>"
Replace <USER> and <PASSWORD> with the credentials of an allowed user for the proxy server, and consider limiting access to the file (chmod 0600).
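For example, the file can be created and locked down like this (a sketch; the credentials are placeholders):
cat > $HOME/.curlrc <<EOF
--proxy-user "<USER>:<PASSWORD>"
EOF
chmod 0600 $HOME/.curlrc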
Setting up a load balancer is mandatory in any production environment.
SUSE CaaS Platform requires a load balancer to distribute workload between the deployed master nodes of the cluster. A failure-tolerant SUSE CaaS Platform cluster will always use more than one control plane node as well as more than one load balancer, so there isn’t a single point of failure.
There are many ways to configure a load balancer. This documentation does not aim to describe all possible configurations; please apply your organization's load balancing best practices.
For SUSE OpenStack Cloud, the Terraform configurations shipped with this version will automatically deploy a suitable load balancer for the cluster.
For bare metal, KVM, or VMware, you must configure a load balancer manually and allow it access to all master nodes created during Section 3.5, “Bootstrapping the Cluster”.
The load balancer should be configured before the actual deployment. It is needed during the cluster bootstrap, and also during upgrades. To simplify configuration, you can reserve the IPs needed for the cluster nodes and pre-configure these in the load balancer.
The load balancer needs access to port 6443 on the apiserver (all master nodes)
in the cluster. It also needs access to Gangway port 32001 and Dex port 32000
on all master and worker nodes in the cluster for RBAC authentication.
We recommend performing regular HTTPS health checks on each master node /healthz
endpoint to verify that the node is responsive. This is particularly important during
upgrades, when a master node restarts the apiserver. During this rather short time
window, all requests have to go to another master node’s apiserver. The master node
that is being upgraded will have to be marked INACTIVE on the load balancer pool
at least during the restart of the apiserver. We provide reasonable defaults for this in our default SUSE OpenStack Cloud load balancer Terraform configuration.
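For example, a manual health check against a single master node could look like this (the IP address is a placeholder; -k skips verification of the cluster's self-signed certificate):
curl -k https://<MASTER_NODE_IP>:6443/healthz
A healthy apiserver typically responds with ok.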
The following contains examples for possible load balancer configurations based on SUSE Linux Enterprise Server 15 SP2 and nginx or HAProxy.
For TCP load balancing, we can use the ngx_stream_module (available since nginx version 1.9.0). In this mode, nginx will simply forward the TCP packets to the master nodes.
The default mechanism is round-robin so each request will be distributed to a different server.
The open source version of Nginx referred to in this guide only allows the use of
passive health checks. nginx will mark a node as unresponsive only after
a failed request. The original request is lost and not forwarded to an available
alternative server.
This load balancer configuration is therefore only suitable for testing and proof-of-concept (POC) environments.
For production environments, we recommend the use of SUSE Linux Enterprise High Availability Extension 15 and HAProxy (see the HAProxy example below).
Register SLES and enable the "Server Applications" module:
SUSEConnect -r CAASP_REGISTRATION_CODE
SUSEConnect --product sle-module-server-applications/15.2/x86_64
Install Nginx:
zypper in nginx
Write the configuration in /etc/nginx/nginx.conf:
user nginx;
worker_processes auto;
load_module /usr/lib64/nginx/modules/ngx_stream_module.so;
error_log /var/log/nginx/error.log;
error_log /var/log/nginx/error.log notice;
error_log /var/log/nginx/error.log info;
events {
worker_connections 1024;
use epoll;
}
stream {
log_format proxy '$remote_addr [$time_local] '
'$protocol $status $bytes_sent $bytes_received '
'$session_time "$upstream_addr"';
error_log /var/log/nginx/k8s-masters-lb-error.log;
access_log /var/log/nginx/k8s-masters-lb-access.log proxy;
upstream k8s-masters {
#hash $remote_addr consistent; 1
server master00:6443 weight=1 max_fails=2 fail_timeout=5s; 2
server master01:6443 weight=1 max_fails=2 fail_timeout=5s;
server master02:6443 weight=1 max_fails=2 fail_timeout=5s;
}
server {
listen 6443;
proxy_connect_timeout 5s;
proxy_timeout 30s;
proxy_pass k8s-masters;
}
upstream dex-backends {
#hash $remote_addr consistent; 3
server master00:32000 weight=1 max_fails=2 fail_timeout=5s; 4
server master01:32000 weight=1 max_fails=2 fail_timeout=5s;
server master02:32000 weight=1 max_fails=2 fail_timeout=5s;
}
server {
listen 32000;
proxy_connect_timeout 5s;
proxy_timeout 30s;
proxy_pass dex-backends; 5
}
upstream gangway-backends {
#hash $remote_addr consistent; 6
server master00:32001 weight=1 max_fails=2 fail_timeout=5s; 7
server master01:32001 weight=1 max_fails=2 fail_timeout=5s;
server master02:32001 weight=1 max_fails=2 fail_timeout=5s;
}
server {
listen 32001;
proxy_connect_timeout 5s;
proxy_timeout 30s;
proxy_pass gangway-backends; 8
}
}
1, 3, 6: To enable session persistence, uncomment the hash line so that the same client is always redirected to the same server (unless that server becomes unavailable).
2, 4, 7: Replace the individual master00, master01 and master02 entries with the host names or IP addresses of your actual master nodes (one entry per node) in the server lines.
5: Dex port 32000.
8: Gangway port 32001.
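Before starting the service, you can optionally verify the configuration syntax (nginx must already be installed). As root, run:
nginx -t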
Configure firewalld to open up ports 6443, 32000 and 32001. As root, run:
firewall-cmd --zone=public --permanent --add-port=6443/tcp
firewall-cmd --zone=public --permanent --add-port=32000/tcp
firewall-cmd --zone=public --permanent --add-port=32001/tcp
firewall-cmd --reload
Start and enable Nginx. As root, run:
systemctl enable --now nginx
The SUSE CaaS Platform cluster must be up and running for this to produce any useful results. This step can only be performed after Section 3.5, “Bootstrapping the Cluster” is completed successfully.
To verify that the load balancer works, you can run a simple command to repeatedly retrieve cluster information from the master nodes. Each request should be forwarded to a different master node.
From your workstation, run:
while true; do skuba cluster status; sleep 1; done;
There should be no interruption of the running skuba cluster status command.
On the load balancer virtual machine, check the logs to validate that each request is correctly distributed in a round robin way.
# tail -f /var/log/nginx/k8s-masters-lb-access.log
10.0.0.47 [17/May/2019:13:49:06 +0000] TCP 200 2553 1613 1.136 "10.0.0.145:6443"
10.0.0.47 [17/May/2019:13:49:08 +0000] TCP 200 2553 1613 0.981 "10.0.0.148:6443"
10.0.0.47 [17/May/2019:13:49:10 +0000] TCP 200 2553 1613 0.891 "10.0.0.7:6443"
10.0.0.47 [17/May/2019:13:49:12 +0000] TCP 200 2553 1613 0.895 "10.0.0.145:6443"
10.0.0.47 [17/May/2019:13:49:15 +0000] TCP 200 2553 1613 1.157 "10.0.0.148:6443"
10.0.0.47 [17/May/2019:13:49:17 +0000] TCP 200 2553 1613 0.897 "10.0.0.7:6443"
HAProxy is available as a supported package with a SUSE Linux Enterprise High Availability Extension 15 subscription.
Alternatively, you can install HAProxy from SUSE Package Hub but you will not receive product support for this component.
HAProxy is a very powerful load balancer application which is suitable for production environments.
Unlike the open source version of nginx mentioned in the example above, HAProxy supports active health checking, which is a vital function for reliable cluster health monitoring. The version used at the time of writing is 1.8.7.
The configuration of an HA cluster is out of the scope of this document.
The default mechanism is round-robin so each request will be distributed to a different server.
The health checks are executed every two seconds. If a connection fails, the check is retried two times with a timeout of five seconds for each request.
If no connection succeeds within this interval (2x5s), the node will be marked as DOWN and no traffic will be sent until the checks succeed again.
Register SLES and enable the "Server Applications" module:
SUSEConnect -r CAASP_REGISTRATION_CODE
SUSEConnect --product sle-module-server-applications/15.2/x86_64
Enable the source for the haproxy package:
If you are using the SUSE Linux Enterprise High Availability Extension:
SUSEConnect --product sle-ha/15.2/x86_64 -r ADDITIONAL_REGCODE
If you want the free (unsupported) package:
SUSEConnect --product PackageHub/15.2/x86_64
Configure /dev/log for HAProxy chroot (optional)
This step is only required when HAProxy is configured to run in a jail directory (chroot). This is highly recommended since it increases the security of HAProxy.
Since HAProxy is chrooted, it’s necessary to make the log socket available inside the jail directory so HAProxy can send logs to the socket.
mkdir -p /var/lib/haproxy/dev/ && touch /var/lib/haproxy/dev/log
This systemd service will take care of mounting the socket in the jail directory.
cat > /etc/systemd/system/bindmount-dev-log-haproxy-chroot.service <<EOF
[Unit]
Description=Mount /dev/log in HAProxy chroot
After=systemd-journald-dev-log.socket
Before=haproxy.service
[Service]
Type=oneshot
ExecStart=/bin/mount --bind /dev/log /var/lib/haproxy/dev/log
[Install]
WantedBy=multi-user.target
EOF
Enabling the service will make the changes persistent after a reboot.
systemctl enable --now bindmount-dev-log-haproxy-chroot.service
Install HAProxy:
zypper in haproxy
Write the configuration in /etc/haproxy/haproxy.cfg:
Replace the individual <MASTER_XX_IP_ADDRESS> entries with the IP addresses of your actual master nodes (one entry each) in the server lines.
You can leave the name argument in the server lines (master00 and so on) as is; it only serves as a label that will show up in the HAProxy logs.
global
  log /dev/log local0 info 1
  chroot /var/lib/haproxy 2
  user haproxy
  group haproxy
  daemon

defaults
  mode tcp
  log global
  option tcplog
  option redispatch
  option tcpka
  retries 2
  http-check expect status 200 3
  default-server check check-ssl verify none
  timeout connect 5s
  timeout client 5s
  timeout server 5s
  timeout tunnel 86400s 4

listen stats 5
  bind *:9000
  mode http
  stats hide-version
  stats uri /stats

listen apiserver 6
  bind *:6443
  option httpchk GET /healthz
  server master00 <MASTER_00_IP_ADDRESS>:6443
  server master01 <MASTER_01_IP_ADDRESS>:6443
  server master02 <MASTER_02_IP_ADDRESS>:6443

listen dex 7
  bind *:32000
  option httpchk GET /healthz
  server master00 <MASTER_00_IP_ADDRESS>:32000
  server master01 <MASTER_01_IP_ADDRESS>:32000
  server master02 <MASTER_02_IP_ADDRESS>:32000

listen gangway 8
  bind *:32001
  option httpchk GET /
  server master00 <MASTER_00_IP_ADDRESS>:32001
  server master01 <MASTER_01_IP_ADDRESS>:32001
  server master02 <MASTER_02_IP_ADDRESS>:32001
1. Forward the logs to systemd journald; the log level is set to info here.
2. Define that HAProxy runs in a chroot jail directory.
3. The performed health checks will expect an HTTP 200 status code response.
4. This timeout is set to 86400 seconds (one day) for long-lived (tunnel) connections.
5. URL to expose HAProxy statistics (port 9000, path /stats).
6. Kubernetes apiserver listening on port 6443.
7. Dex listening on port 32000.
8. Gangway listening on port 32001.
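Before starting the service, you can optionally validate the configuration file (HAProxy must already be installed; -c only checks the syntax). As root, run:
haproxy -c -f /etc/haproxy/haproxy.cfg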
Configure firewalld to open up ports 6443, 32000 and 32001. As root, run:
firewall-cmd --zone=public --permanent --add-port=6443/tcp
firewall-cmd --zone=public --permanent --add-port=32000/tcp
firewall-cmd --zone=public --permanent --add-port=32001/tcp
firewall-cmd --reload
Start and enable HAProxy. As root, run:
systemctl enable --now haproxy
The SUSE CaaS Platform cluster must be up and running for this to produce any useful results. This step can only be performed after Section 3.5, “Bootstrapping the Cluster” is completed successfully.
To verify that the load balancer works, you can run a simple command to repeatedly retrieve cluster information from the master nodes. Each request should be forwarded to a different master node.
From your workstation, run:
while true; do skuba cluster status; sleep 1; done;
There should be no interruption of the running skuba cluster status command.
On the load balancer virtual machine, check the logs to validate that each request is correctly distributed in a round robin way.
# journalctl -flu haproxy
haproxy[2525]: 10.0.0.47:59664 [30/Sep/2019:13:33:20.578] apiserver apiserver/master00 1/0/578 9727 -- 18/18/17/3/0 0/0
haproxy[2525]: 10.0.0.47:59666 [30/Sep/2019:13:33:22.476] apiserver apiserver/master01 1/0/747 9727 -- 18/18/17/7/0 0/0
haproxy[2525]: 10.0.0.47:59668 [30/Sep/2019:13:33:24.522] apiserver apiserver/master02 1/0/575 9727 -- 18/18/17/7/0 0/0
haproxy[2525]: 10.0.0.47:59670 [30/Sep/2019:13:33:26.386] apiserver apiserver/master00 1/0/567 9727 -- 18/18/17/3/0 0/0
haproxy[2525]: 10.0.0.47:59678 [30/Sep/2019:13:33:28.279] apiserver apiserver/master01 1/0/575 9727 -- 18/18/17/7/0 0/0
haproxy[2525]: 10.0.0.47:59682 [30/Sep/2019:13:33:30.174] apiserver apiserver/master02 1/0/571 9727 -- 18/18/17/7/0 0/0