This is a draft document that was built and uploaded automatically. It may document beta software and be incomplete or even incorrect. Use this document at your own risk.

Applies to SUSE Enterprise Storage 5.5 (SES 5 & SES 5.5)

11 Installation of CephFS

The Ceph file system (CephFS) is a POSIX-compliant file system that uses a Ceph storage cluster to store its data. CephFS uses the same cluster system as Ceph block devices, Ceph object storage with its S3 and Swift APIs, or native bindings (librados).

To use CephFS, you need to have a running Ceph storage cluster, and at least one running Ceph metadata server.

11.1 Supported CephFS Scenarios and Guidance

With SUSE Enterprise Storage, SUSE introduces official support for many scenarios in which the scale-out and distributed component CephFS is used. This section describes hard limits and provides guidance for the suggested use cases.

A supported CephFS deployment must meet these requirements:

  • A minimum of one Metadata Server. SUSE recommends deploying several nodes with the MDS role. Only one will be 'active' and the rest will be 'passive'. When mounting CephFS from a client, specify all the MON nodes in the mount command.

  • CephFS snapshots are disabled (default) and not supported in this version.

  • Clients are SUSE Linux Enterprise Server 12 SP2 or SP3 based, using the cephfs kernel module driver. The FUSE module is not supported.

  • CephFS quotas are not supported in SUSE Enterprise Storage, as support for quotas is implemented in the FUSE client only.

  • CephFS supports file layout changes as documented in http://docs.ceph.com/docs/jewel/cephfs/file-layouts/. However, while the file system is mounted by any client, new data pools may not be added to an existing CephFS file system (ceph mds add_data_pool). They may only be added while the file system is unmounted.
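
The first requirement above asks for all MON nodes to be listed in the mount command. The following is a minimal sketch of how such a command can be assembled for the kernel client. The monitor host names, the mount point, the admin user name, and the secret file path are assumptions made for illustration, and 6789 is assumed to be the monitor port in use. The script only prints the command so that it can be reviewed first.

```shell
# Hypothetical monitor host names and mount point; replace with your own.
MONS="mon1.example.com mon2.example.com mon3.example.com"
MOUNT_POINT="/mnt/cephfs"

# Join all monitors into the comma-separated list the kernel client expects.
MON_LIST=""
for mon in $MONS; do
    MON_LIST="${MON_LIST:+$MON_LIST,}${mon}:6789"
done

# Build the mount command and print it for review instead of running it.
# The user name and secret file path are assumptions.
MOUNT_CMD="mount -t ceph ${MON_LIST}:/ ${MOUNT_POINT} -o name=admin,secretfile=/etc/ceph/admin.secret"
echo "$MOUNT_CMD"
```

Listing every monitor lets the client reach the cluster even if one of the MON nodes is unavailable at mount time.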

11.2 Ceph Metadata Server

The Ceph metadata server (MDS) stores metadata for CephFS. Ceph block devices and Ceph object storage do not use MDS. MDSs make it possible for POSIX file system users to execute basic commands, such as ls or find, without placing an enormous burden on the Ceph storage cluster.

11.2.1 Adding a Metadata Server

You can deploy MDS either during the initial cluster deployment process as described in Section 4.3, “Cluster Deployment”, or add it to an already deployed cluster as described in Book “Administration Guide”, Chapter 1 “Salt Cluster Administration”, Section 1.1 “Adding New Cluster Nodes”.

After you deploy your MDS, allow the Ceph OSD/MDS service in the firewall settings of the server where the MDS is deployed: start yast, navigate to Security and Users / Firewall / Allowed Services, and select Ceph OSD/MDS in the Service to Allow drop-down menu. If the Ceph MDS node is not allowed full traffic, mounting a file system fails, even though other operations may work properly.

11.2.2 Configuring a Metadata Server

You can fine-tune the MDS behavior by inserting relevant options in the ceph.conf configuration file.

MDS Cache Size
mds cache memory limit

The soft memory limit (in bytes) that the MDS will enforce for its cache. Administrators should use this instead of the old mds cache size setting. Defaults to 1GB.

mds cache reservation

The cache reservation (memory or inodes) that the MDS cache maintains, expressed as a fraction of the configured cache limit. When the MDS begins to use its reservation, it recalls client state until its cache size shrinks enough to restore the reservation. Defaults to 0.05.
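
Because mds cache memory limit is given in bytes, it can help to compute the value rather than type it by hand. The following sketch assumes an example target of 4 GB (an illustrative value, not a recommendation) and prints a ceph.conf fragment for the [mds] section. How the fragment is merged into ceph.conf and distributed depends on your deployment.

```shell
# Assumed example target: a 4 GB cache limit (illustrative value only).
CACHE_GB=4
CACHE_BYTES=$((CACHE_GB * 1024 * 1024 * 1024))

# Print a ceph.conf fragment for review; it is not written to any file here.
cat <<EOF
[mds]
mds cache memory limit = ${CACHE_BYTES}
mds cache reservation = 0.05
EOF
```

The reservation is left at its default of 0.05 in this sketch.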

For a detailed list of MDS related configuration options, see http://docs.ceph.com/docs/master/cephfs/mds-config-ref/.

For a detailed list of MDS journaler configuration options, see http://docs.ceph.com/docs/master/cephfs/journaler/.

11.3 CephFS

When you have a healthy Ceph storage cluster with at least one Ceph metadata server, you can create and mount your Ceph file system. Ensure that your client has network connectivity and a proper authentication keyring.

11.3.1 Creating CephFS

A CephFS requires at least two RADOS pools: one for data and one for metadata. When configuring these pools, you might consider:

  • Using a higher replication level for the metadata pool, as any data loss in this pool can render the whole file system inaccessible.

  • Using lower-latency storage such as SSDs for the metadata pool, as this will improve the observed latency of file system operations on clients.

When you assign role-mds in policy.cfg, the required pools are created automatically. If you want to tune performance by hand, you can create the pools cephfs_data and cephfs_metadata manually before setting up the Metadata Server. DeepSea will not create these pools if they already exist.

For more information on managing pools, see Book “Administration Guide”, Chapter 8 “Managing Storage Pools”.

To create the two required pools—for example, 'cephfs_data' and 'cephfs_metadata'—with default settings for use with CephFS, run the following commands:

cephadm > ceph osd pool create cephfs_data pg_num
cephadm > ceph osd pool create cephfs_metadata pg_num

It is possible to use EC pools instead of replicated pools. We recommend using EC pools only for low performance requirements and infrequent random access, for example cold storage, backups, or archiving. CephFS on EC pools requires BlueStore to be enabled, and the pool must have the allow_ec_overwrites option set. This option can be set by running ceph osd pool set ec_pool allow_ec_overwrites true.

Erasure coding adds significant overhead to file system operations, especially small updates. This overhead is inherent to using erasure coding as a fault tolerance mechanism. This penalty is the trade-off for significantly reduced storage space overhead.
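
As a rough sketch of the EC setup described above, the following script assembles the two commands involved: creating an erasure coded pool and then enabling overwrites on it. The pool name cephfs_data_ec and the placement group count of 64 are hypothetical values chosen for illustration, and the default erasure code profile is assumed. The commands are printed rather than executed so that they can be checked before being run on an admin node.

```shell
# Hypothetical pool name and placement group count; adjust for your cluster.
EC_POOL="cephfs_data_ec"
PG_NUM=64

# Collect the commands in order: create the pool, then allow overwrites.
EC_CMDS="ceph osd pool create ${EC_POOL} ${PG_NUM} ${PG_NUM} erasure
ceph osd pool set ${EC_POOL} allow_ec_overwrites true"

# Print for review; run each line once it has been checked.
echo "$EC_CMDS"
```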

When the pools are created, you may enable the file system with the ceph fs new command:

cephadm > ceph fs new fs_name metadata_pool_name data_pool_name

For example:

cephadm > ceph fs new cephfs cephfs_metadata cephfs_data

You can check that the file system was created by listing all available CephFSs:

cephadm > ceph fs ls
 name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

When the file system has been created, your MDS will be able to enter an active state. For example, in a single MDS system:

cephadm > ceph mds stat
e5: 1/1/1 up

Tip: More Topics

You can find more information on specific tasks (for example mounting, unmounting, and advanced CephFS setup) in Book “Administration Guide”, Chapter 15 “Clustered File System”.

11.3.2 MDS Cluster Size

A CephFS instance can be served by multiple active MDS daemons. All active MDS daemons that are assigned to a CephFS instance will distribute the file system's directory tree between themselves, and thus spread the load of concurrent clients. In order to add an active MDS daemon to a CephFS instance, a spare standby is needed. Either start an additional daemon or use an existing standby instance.

The following command will display the current number of active and passive MDS daemons.

cephadm > ceph mds stat

The following command sets the number of active MDS daemons to two in a file system instance.

cephadm > ceph fs set fs_name max_mds 2

In order to shrink the MDS cluster prior to an update, two steps are necessary. First set max_mds so that only one instance remains:

cephadm > ceph fs set fs_name max_mds 1

and after that explicitly deactivate the other active MDS daemons:

cephadm > ceph mds deactivate fs_name:rank

where rank is the number of an active MDS daemon of a file system instance, ranging from 0 to max_mds-1. See http://docs.ceph.com/docs/luminous/cephfs/multimds/ for additional information.
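
The two shrinking steps can be sketched as a short script. The file system name cephfs and the starting count of three active MDS daemons are hypothetical values, and deactivating the highest rank first is an assumption of this sketch rather than a documented requirement. The script prints the commands instead of running them. In practice you would likely check ceph mds stat after each deactivation and wait for the rank to stop before continuing.

```shell
# Hypothetical file system name and current number of active MDS daemons.
FS_NAME="cephfs"
ACTIVE_MDS=3

# First reduce max_mds, then deactivate every rank except 0, highest first.
SHRINK_CMDS="ceph fs set ${FS_NAME} max_mds 1"
rank=$((ACTIVE_MDS - 1))
while [ "$rank" -ge 1 ]; do
    SHRINK_CMDS="${SHRINK_CMDS}
ceph mds deactivate ${FS_NAME}:${rank}"
    rank=$((rank - 1))
done

# Print the resulting command sequence for review.
echo "$SHRINK_CMDS"
```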

11.3.3 MDS Cluster and Updates

During Ceph updates, the feature flags on a file system instance may change (usually by adding new features). Incompatible daemons (such as older versions) cannot function with an incompatible feature set and will refuse to start. This means that updating and restarting one daemon can cause all other not yet updated daemons to stop and refuse to start. For this reason, we recommend shrinking the active MDS cluster to size one and stopping all standby daemons before updating Ceph. The manual steps for this update procedure are as follows:

  1. Update the Ceph related packages using zypper.

  2. Shrink the active MDS cluster as described above to 1 instance and stop all standby MDS daemons using their systemd units on all other nodes:

    root # systemctl stop ceph-mds\*.service ceph-mds.target

  3. Only then restart the single remaining MDS daemon, so that it starts using the updated binary.

    root # systemctl restart ceph-mds\*.service ceph-mds.target

  4. Restart all other MDS daemons and re-set the desired max_mds setting.

    root # systemctl start ceph-mds.target

If you use DeepSea, it will follow this procedure if the ceph package was updated during Stages 0 and 4. It is possible to perform this procedure while clients have the CephFS instance mounted and I/O is ongoing. Note, however, that there will be a very brief I/O pause while the active MDS restarts. Clients will recover automatically.

It is good practice to reduce the I/O load as much as possible before updating an MDS cluster. An idle MDS cluster will go through this update procedure more quickly. Conversely, on a heavily loaded cluster with multiple MDS daemons, it is essential to reduce the load in advance to prevent a single MDS daemon from being overwhelmed by ongoing I/O.