2 Hardware Requirements and Recommendations #
The hardware requirements of Ceph are heavily dependent on the IO workload. The following hardware requirements and recommendations should be considered as a starting point for detailed planning.
In general, the recommendations given in this section are on a per-process basis. If several processes are located on the same machine, the CPU, RAM, disk and network requirements need to be added up.
2.1 Object Storage Nodes #
2.1.1 Minimum Requirements #
At least 4 OSD nodes, with 8 OSD disks each, are required.
For OSDs that do not use BlueStore, 1 GB of RAM per terabyte of raw OSD capacity is minimally required for each OSD storage node. 1.5 GB of RAM per terabyte of raw OSD capacity is recommended. During recovery, 2 GB of RAM per terabyte of raw OSD capacity may be optimal.
For OSDs that use BlueStore, first calculate the RAM size recommended for OSDs that do not use BlueStore, then calculate 2 GB plus the size of the BlueStore cache for each OSD process, and choose the larger of the two results (see the sizing sketch at the end of this section). Note that the default BlueStore cache is 1 GB for HDD and 3 GB for SSD drives. In summary, pick the greater of:
[1 GB * OSD count * OSD size (in TB)]
or
[(2 GB + BlueStore cache) * OSD count]
1.5 GHz of a logical CPU core per OSD is minimally required for each OSD daemon process. 2 GHz per OSD daemon process is recommended. Note that Ceph runs one OSD daemon process per storage disk; do not count disks reserved solely for use as OSD journals, WAL journals, omap metadata, or any combination of these three cases.
10 Gb Ethernet (two network interfaces bonded to multiple switches).
OSD disks in JBOD configurations.
OSD disks should be exclusively used by SUSE Enterprise Storage 5.5.
Dedicated disk/SSD for the operating system, preferably in a RAID 1 configuration.
If this OSD host will host part of a cache pool used for cache tiering, allocate at least an additional 4 GB of RAM.
For disk performance reasons, we recommend using bare metal for OSD nodes and not virtual machines.
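The RAM and CPU rules above can be combined into a quick sizing estimate. The following Python sketch is illustrative only; the function names and the example node (8 HDD-backed BlueStore OSDs of 4 TB each) are assumptions chosen for the example, not recommendations.
# Illustrative sizing sketch based on the rules in this section.
def osd_node_ram_gb(osd_count, osd_size_tb, bluestore=True, bs_cache_gb=1.0):
    """Recommended RAM in GB for one OSD node.

    For BlueStore, pick the greater of:
      1 GB * OSD count * OSD size (in TB)
      (2 GB + BlueStore cache) * OSD count
    The default BlueStore cache is 1 GB for HDD and 3 GB for SSD drives.
    """
    capacity_rule = 1.0 * osd_count * osd_size_tb
    if not bluestore:
        return capacity_rule
    cache_rule = (2.0 + bs_cache_gb) * osd_count
    return max(capacity_rule, cache_rule)

def osd_node_cpu_ghz(osd_count, ghz_per_osd=2.0):
    """Total CPU (in GHz of logical cores) for the OSD daemons only.

    1.5 GHz per OSD is the minimum, 2 GHz per OSD is recommended.
    Disks used only for journals, WAL/DB, or omap metadata are not counted.
    """
    return ghz_per_osd * osd_count

# Example: 8 HDD-backed BlueStore OSDs of 4 TB each.
print(osd_node_ram_gb(8, 4))   # 32.0 GB (capacity rule wins over the 24 GB cache rule)
print(osd_node_cpu_ghz(8))     # 16.0 GHz recommended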
2.1.2 Minimum Disk Size #
There are two types of disk space needed to run an OSD: the space for the disk journal (for FileStore) or the WAL/DB device (for BlueStore), and the primary space for the stored data. The minimum (and default) value for the journal/WAL/DB is 6 GB. The minimum space for data is 5 GB, as partitions smaller than 5 GB are automatically assigned a weight of 0.
So although the minimum disk space for an OSD is 11 GB, we do not recommend a disk smaller than 20 GB, even for testing purposes.
2.1.3 Recommended Size for BlueStore's WAL and DB Device #
The following are several rules for WAL/DB device sizing. When DeepSea is used to deploy OSDs with BlueStore, it applies the recommended rules automatically and notifies the administrator.
10 GB for the DB device for each terabyte of OSD capacity (1/100th of the OSD size).
Between 500 MB and 2 GB for the WAL device. The WAL size depends on the data traffic and workload, not on the OSD size. If you know that an OSD can physically handle small writes and overwrites at very high throughput, a larger WAL is preferable. A 1 GB WAL device is a good compromise that suits most deployments.
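As a worked example of these rules (illustrative only; the 4 TB OSD size below is an assumption, not a recommendation):
# Illustrative WAL/DB sizing for a BlueStore OSD.
def bluestore_db_wal_gb(osd_size_tb, wal_gb=1.0):
    """DB: 10 GB per terabyte of OSD capacity (1/100th of the OSD).
    WAL: between 0.5 GB and 2 GB; 1 GB suits most deployments."""
    return 10.0 * osd_size_tb, wal_gb

# Example: a 4 TB OSD needs roughly a 40 GB DB device and a 1 GB WAL device.
print(bluestore_db_wal_gb(4))   # (40.0, 1.0)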
If you intend to put the WAL and DB device on the same disk, we recommend using a single partition for both devices rather than having a separate partition for each. This allows Ceph to use the DB device for the WAL operation as well. Management of the disk space is therefore more effective, because Ceph uses the DB partition for the WAL only if there is a need for it. Another advantage is that the probability of the WAL partition becoming full is very small, and when it is not entirely used, its space is not wasted but is used for DB operation.
To share the DB device with the WAL, do not specify the WAL device, and specify only the DB device:
bluestore_block_db_path = "/path/to/db/device"
bluestore_block_db_size = 10737418240
bluestore_block_wal_path = ""
bluestore_block_wal_size = 0
Alternatively, you can put the WAL on its own separate device. In that case, we recommend the fastest available device for the WAL operation.
2.1.5 Maximum Recommended Number of Disks #
You can have as many disks in one server as the server allows. There are a few things to consider when planning the number of disks per server:
Network bandwidth. The more disks you have in a server, the more data must be transferred via the network card(s) for the disk write operations (see the sketch after this list).
Memory. For optimum performance, reserve at least 2 GB of RAM per terabyte of disk space installed.
Fault tolerance. If the complete server fails, the more disks it has, the more OSDs the cluster temporarily loses. Moreover, to keep the replication rules fulfilled, all the data from the failed server needs to be copied to the other nodes in the cluster.
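To illustrate the network bandwidth consideration, the following sketch compares the aggregate write throughput of the OSD disks with the bandwidth of the bonded links. All figures (24 disks, 150 MB/s per disk, two bonded 10 Gb links) are assumptions for illustration, not measured values or recommendations.
# Illustrative comparison of aggregate disk throughput and network bandwidth.
disks_per_server = 24        # assumed number of OSD disks in the server
disk_write_mb_s = 150        # assumed sustained write throughput per disk
bonded_links = 2             # two 10 Gb Ethernet interfaces bonded
link_gbit_s = 10

disk_throughput_mb_s = disks_per_server * disk_write_mb_s
network_mb_s = bonded_links * link_gbit_s * 1000 / 8   # Gbit/s to MB/s

print(f"disks: {disk_throughput_mb_s} MB/s, network: {network_mb_s:.0f} MB/s")
if disk_throughput_mb_s > network_mb_s:
    print("With this many disks, the network would limit write throughput.")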
2.2 Monitor Nodes #
At least three Ceph Monitor nodes are required. The number of monitors should always be odd (1+2n).
4 GB of RAM.
Processor with four logical cores.
An SSD or other sufficiently fast storage type is highly recommended for monitors, specifically for the /var/lib/ceph path on each monitor node, as quorum may be unstable with high disk latencies. Two disks in a RAID 1 configuration are recommended for redundancy. It is recommended that separate disks, or at least separate disk partitions, are used for the monitor processes to protect the monitor's available disk space from things like log file creep. There must only be one monitor process per node.
Mixing OSD, monitor, or Object Gateway nodes is only supported if sufficient hardware resources are available. That means that the requirements for all services need to be added up.
Two network interfaces bonded to multiple switches.
2.3 Object Gateway Nodes #
Object Gateway nodes should have six to eight CPU cores and 32 GB of RAM (64 GB recommended). When other processes are co-located on the same machine, their requirements need to be added up.
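As a hypothetical illustration of adding up the requirements of co-located processes, the following sketch sums the RAM figures given in this chapter for a node that hosts both OSDs and an Object Gateway; the OSD count and size are example values only.
# Illustrative only: requirements of co-located services add up.
# Example node: 8 BlueStore OSDs of 4 TB each plus an Object Gateway.
osd_ram_gb = max(1.0 * 8 * 4, (2.0 + 1.0) * 8)   # OSD rule from Section 2.1.1
gateway_ram_gb = 32                               # Object Gateway figure above
print(osd_ram_gb + gateway_ram_gb)                # 64.0 GB of RAM in total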
2.4 Metadata Server Nodes #
Proper sizing of the Metadata Server nodes depends on the specific use case. Generally, the more open files the Metadata Server is to handle, the more CPU and RAM it needs. The following are the minimum requirements:
3 GB of RAM per Metadata Server daemon.
Bonded network interface.
2.5 GHz CPU with at least 2 cores.
2.5 Salt Master #
At least 4 GB of RAM and a quad-core CPU are required. This includes running openATTIC on the Salt master. For large clusters with hundreds of nodes, 6 GB of RAM is suggested.
2.6 iSCSI Nodes #
iSCSI nodes should have six to eight CPU cores and 16 GB of RAM.
2.7 Network Recommendations #
The network environment where you intend to run Ceph should ideally be a bonded set of at least two network interfaces that is logically split into a public part and a trusted internal part using VLANs. If possible, 802.3ad is the recommended bonding mode, as it provides maximum bandwidth and resiliency.
The public VLAN serves to provide the service to the customers, while the internal part provides for the authenticated Ceph network communication. The main reason for this is that although Ceph provides authentication and protection against attacks once secret keys are in place, the messages used to configure these keys may be transferred openly and are vulnerable.
Important: Administration Network not Supported
An additional administration network setup (enabling, for example, separation of SSH, Salt, or DNS networking) is neither tested nor supported.
Tip: Nodes Configured via DHCP
If your storage nodes are configured via DHCP, the default timeouts may not be sufficient for the network to be configured correctly before the various Ceph daemons start. If this happens, the Ceph MONs and OSDs will not start correctly (running systemctl status ceph\* will result in "unable to bind" errors). To avoid this issue, we recommend increasing the DHCP client timeout to at least 30 seconds on each node in your storage cluster. This can be done by changing the following settings on each node:
In /etc/sysconfig/network/dhcp, set
DHCLIENT_WAIT_AT_BOOT="30"
In /etc/sysconfig/network/config, set
WAIT_FOR_INTERFACES="60"
2.7.1 Adding a Private Network to a Running Cluster #
If you do not specify a cluster network during Ceph deployment, it assumes a single public network environment. While Ceph operates fine with a public network, its performance and security improve when you set up a second private cluster network. To support two networks, each Ceph node needs to have at least two network cards.
You need to apply the following changes to each Ceph node. It is relatively quick to do for a small cluster, but can be very time consuming if you have a cluster consisting of hundreds or thousands of nodes.
Stop Ceph related services on each cluster node.
Add a line to /etc/ceph/ceph.conf to define the cluster network, for example:
cluster network = 10.0.0.0/24
If you need to specifically assign static IP addresses or override cluster network settings, you can do so with the optional cluster addr.
Check that the private cluster network works as expected on the OS level.
Start Ceph related services on each cluster node.
root # systemctl start ceph.target
2.7.2 Monitor Nodes on Different Subnets #
If the monitor nodes are on multiple subnets, for example they are located
in different rooms and served by different switches, you need to adjust the
ceph.conf file accordingly. For example, if the nodes
have IP addresses 192.168.123.12, 1.2.3.4, and 242.12.33.12, add the
following lines to the global section:
[global]
[...]
mon host = 192.168.123.12, 1.2.3.4, 242.12.33.12
mon initial members = MON1, MON2, MON3
[...]
Additionally, if you need to specify a per-monitor public address or network, you need to add a [mon.X] section for each monitor:
[mon.MON1]
public network = 192.168.123.0/24
[mon.MON2]
public network = 1.2.3.0/24
[mon.MON3]
public network = 242.12.33.0/24
2.8 Naming Limitations #
Ceph does not generally support non-ASCII characters in configuration files, pool names, user names and so forth. When configuring a Ceph cluster we recommend using only simple alphanumeric characters (A-Z, a-z, 0-9) and minimal punctuation ('.', '-', '_') in all Ceph object/configuration names.
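As an illustrative check (not a Ceph tool), names can be validated against this recommendation with a simple pattern; the function name below is hypothetical:
import re

# Allowed: simple alphanumerics and minimal punctuation ('.', '-', '_').
NAME_PATTERN = re.compile(r'^[A-Za-z0-9._-]+$')

def is_recommended_name(name):
    """Return True if the name uses only the recommended characters."""
    return bool(NAME_PATTERN.match(name))

print(is_recommended_name("rbd.pool-1"))   # True
print(is_recommended_name("pool name"))    # False: space is not allowed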
2.10 Minimum Cluster Configuration #
Four Object Storage Nodes
10 Gb Ethernet (two networks bonded to multiple switches)
32 OSDs per storage cluster
OSD journal can reside on OSD disk
Dedicated OS disk for each Object Storage Node
1 GB of RAM per TB of raw OSD capacity for each Object Storage Node
1.5 GHz per OSD for each Object Storage Node
Ceph Monitors, gateways, and Metadata Servers can reside on Object Storage Nodes
Three Ceph Monitor nodes (requires SSD for dedicated OS drive)
Ceph Monitor, Object Gateway, and Metadata Server nodes require redundant deployment
iSCSI Gateways, Object Gateways, and Metadata Servers require an additional 4 GB of RAM and four cores
Separate management node with 4 GB RAM, four cores, 1 TB capacity
2.11 Recommended Production Cluster Configuration #
Seven Object Storage Nodes
No single node exceeds ~15% of total storage
10 Gb Ethernet (four physical networks bonded to multiple switches)
56+ OSDs per storage cluster
RAID 1 OS disks for each OSD storage node
SSDs for journals, with a 6:1 ratio of OSDs to SSD journal
1.5 GB of RAM per TB of raw OSD capacity for each Object Storage Node
2 GHz per OSD for each Object Storage Node
Dedicated physical infrastructure nodes
Three Ceph Monitor nodes: 4 GB RAM, 4 core processor, RAID 1 SSDs for disk
One SES management node: 4 GB RAM, 4 core processor, RAID 1 SSDs for disk
Redundant physical deployment of gateway or Metadata Server nodes:
Object Gateway nodes: 32 GB RAM, 8 core processor, RAID 1 SSDs for disk
iSCSI Gateway nodes: 16 GB RAM, 4 core processor, RAID 1 SSDs for disk
Metadata Server nodes (one active/one hot standby): 32 GB RAM, 8 core processor, RAID 1 SSDs for disk
2.12 SUSE Enterprise Storage 5.5 and Other SUSE Products #
This section contains important information about integrating SUSE Enterprise Storage 5.5 with other SUSE products.
2.12.1 SUSE Manager #
SUSE Manager and SUSE Enterprise Storage are not integrated; therefore, SUSE Manager cannot currently manage a SUSE Enterprise Storage 5.5 cluster.