D Ceph Maintenance Updates Based on Upstream 'Nautilus' Point Releases
Several key packages in SUSE Enterprise Storage 6 are based on the Nautilus release series of Ceph. When the Ceph project (https://github.com/ceph/ceph) publishes new point releases in the Nautilus series, SUSE Enterprise Storage 6 is updated to ensure that the product benefits from the latest upstream bugfixes and feature backports.
This chapter summarizes the notable changes in each upstream point release that has been, or is planned to be, included in the product.
Nautilus 14.2.20 Point Release
  This release includes a security fix that ensures the
  global_id value (a numeric value that should be unique for
  every authenticated client or daemon in the cluster) is reclaimed after a
  network disconnect or ticket renewal in a secure fashion. Two new health
  alerts may appear during the upgrade indicating that there are clients or
  daemons that are not yet patched with the appropriate fix.
 
To temporarily mute the health alerts about insecure clients for the duration of the upgrade, you can run:
cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1h
cephadm@adm > ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1h
When all clients are updated, enable the new secure behavior, which prevents old insecure clients from joining the cluster:
cephadm@adm > ceph config set mon auth_allow_insecure_global_id_reclaim false
For more details, refer to https://docs.ceph.com/en/latest/security/CVE-2021-20288/.
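Before disabling insecure reclaim, it can help to confirm that no unpatched clients are still being reported. The following is a minimal sketch, not an official tool: it assumes you have saved the output of ceph health detail --format json, and that this JSON has a top-level "checks" object keyed by health check code. That shape, the function name, and the sample input are assumptions for illustration; verify them against your own cluster's output.

```python
import json

# Health check code named in the 14.2.20 notes above.
INSECURE_CLIENTS = "AUTH_INSECURE_GLOBAL_ID_RECLAIM"


def insecure_clients_reported(health_json: str) -> bool:
    """Return True if the health report still lists unpatched clients or daemons.

    Assumption: the report is JSON with a top-level "checks" object whose
    keys are health check codes. Verify this against real cluster output.
    """
    report = json.loads(health_json)
    return INSECURE_CLIENTS in report.get("checks", {})


# Hand-written sample input, not captured from a real cluster.
sample = json.dumps({
    "status": "HEALTH_WARN",
    "checks": {
        "AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED": {"severity": "HEALTH_WARN"},
    },
})

if insecure_clients_reported(sample):
    print("Unpatched clients are still connected; keep insecure reclaim allowed.")
else:
    print("No unpatched clients reported.")
```

If the check is absent, only the AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED alert should remain, and that alert is the one addressed by setting auth_allow_insecure_global_id_reclaim to false.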
Nautilus 14.2.18 Point Release
This release fixes a regression introduced in 14.2.17 in which the manager module tries to use a couple of Python modules that do not exist in some environments.
- This release fixes issues loading the dashboard and volumes manager modules in some environments. 
Nautilus 14.2.17 Point Release
This release includes the following fixes:
- $pid expansion in configuration paths such as admin_socket will now properly expand to the daemon PID for commands like ceph-mds or ceph-osd. Previously, only ceph-fuse and rbd-nbd expanded $pid with the actual daemon PID.
- RADOS: PG removal has been optimized.
- RADOS: Memory allocations are tracked in finer detail in BlueStore and displayed as a part of the dump_mempools command.
- CephFS: Clients which acquire capabilities too quickly are throttled to prevent instability. See the new config option mds_session_cap_acquisition_throttle to control this behavior.
Nautilus 14.2.16 Point Release
This release fixes a security flaw in CephFS.
- CVE-2020-27781: OpenStack Manila's use of the ceph_volume_client.py library allowed tenant access to any Ceph credentials' secret.
Nautilus 14.2.15 Point Release
This release fixes a ceph-volume regression introduced in v14.2.13 and includes a few other fixes.
- ceph-volume: Fixes lvm batch --auto, which broke backward compatibility when using only non-rotational devices (SSD and/or NVMe).
- BlueStore: Fixes a bug in collection_list_legacy which makes PGs inconsistent during scrub when running OSDs older than 14.2.12 alongside newer ones.
- MGR: The progress module can now be turned on or off using the commands ceph progress on and ceph progress off.
Nautilus 14.2.14 Point Release
This release fixes a security flaw affecting Messenger V2 for Octopus and Nautilus, among other fixes across components.
- CVE-2020-25660: Fix a regression in Messenger V2 replay attacks.
Nautilus 14.2.13 Point Release
This release fixes a regression introduced in v14.2.12 and includes a few ceph-volume and RGW fixes.
- Fixed a regression that caused breakage in clusters that referred to ceph-mon hosts using DNS names instead of IP addresses in the mon_host parameter in ceph.conf.
- ceph-volume: The lvm batch subcommand received a major rewrite.
Nautilus 14.2.12 Point Release
In addition to bug fixes, this major upstream release brought a number of notable changes:
- The ceph df command now lists the number of PGs in each pool.
- MONs now have a config option mon_osd_warn_num_repaired, 10 by default. If any OSD has repaired more than this many I/O errors in stored data, an OSD_TOO_MANY_REPAIRS health warning is generated. To allow clearing of the warning, a new command ceph tell osd.SERVICE_ID clear_shards_repaired COUNT has been added. By default, it sets the repair count to 0. If you want to be warned again when additional repairs are performed, you can provide a value to the command and specify the value of mon_osd_warn_num_repaired. This command will be replaced in future releases by the health mute/unmute feature.
- It is now possible to specify the initial MON to contact for Ceph tools and daemons using the mon_host_override config option or the --mon-host-override IP command-line switch. This should generally be used only for debugging, and it only affects initial communication with Ceph’s MON cluster.
- Fixed an issue with osdmaps not being trimmed in a healthy cluster.
Nautilus 14.2.11 Point Release
In addition to bug fixes, this major upstream release brought a number of notable changes:
- RGW: The radosgw-admin sub-commands dealing with orphans (radosgw-admin orphans find, radosgw-admin orphans finish, and radosgw-admin orphans list-jobs) have been deprecated. They have not been actively maintained and they store intermediate results on the cluster, which could fill a nearly-full cluster. They have been replaced by the rgw-orphan-list tool, which is currently considered experimental.
- Now, when the noscrub and/or nodeep-scrub flags are set globally or per pool, scheduled scrubs of the disabled type will be aborted. User-initiated scrubs are not interrupted.
- Fixed a ceph-osd crash in committed OSD maps when there is a failure to encode the first incremental map.
Nautilus 14.2.10 Point Release
This upstream release patched one security flaw:
- CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration’s ExposeHeader
In addition to the security fix, this major upstream release brought a number of notable changes:
- The pool parameter target_size_ratio, used by the PG autoscaler, has changed meaning. It is now normalized across pools, rather than specifying an absolute ratio. If you have set target size ratios on any pools, you may want to set these pools to autoscale warn mode to avoid data movement during the upgrade:
  ceph osd pool set POOL_NAME pg_autoscale_mode warn
- The behaviour of the -o argument to the RADOS tool has been reverted to its original behaviour of indicating an output file. This makes it more consistent with other tools. Specifying the object size is now accomplished by using an uppercase -O.
- The format of MDSs in ceph fs dump has changed.
- Ceph will issue a health warning if a RADOS pool’s size is set to 1 or, in other words, the pool is configured with no redundancy. This can be fixed by setting the pool size to the minimum recommended value with:
  cephadm@adm > ceph osd pool set pool-name size num-replicas
  The warning can be silenced with:
  cephadm@adm > ceph config set global mon_warn_on_pool_no_redundancy false
- RGW: Bucket listing performance on sharded bucket indexes has been notably improved by heuristically (and, in many cases, significantly) reducing the number of entries requested from each bucket index shard.
Nautilus 14.2.9 Point Release
This upstream release patched two security flaws:
- CVE-2020-1759: Fixed nonce reuse in msgr V2 secure mode 
- CVE-2020-1760: Fixed XSS due to RGW GetObject header-splitting 
In SES 6, these flaws were patched in Ceph version 14.2.5.389+gb0f23ac248.
Nautilus 14.2.8 Point Release
In addition to bug fixes, this major upstream release brought a number of notable changes:
- The default value of bluestore_min_alloc_size_ssd has been changed to 4K to improve performance across all workloads.
- The following OSD memory config options related to BlueStore cache autotuning can now be configured during runtime:
  osd_memory_base (default: 768 MB)
  osd_memory_cache_min (default: 128 MB)
  osd_memory_expected_fragmentation (default: 0.15)
  osd_memory_target (default: 4 GB)
  You can set the above options by running:
  cephadm@adm > ceph config set osd OPTION VALUE
- The Ceph Manager now accepts profile rbd and profile rbd-read-only user capabilities. You can use these capabilities to provide users access to MGR-based RBD functionality such as rbd perf image iostat and rbd perf image iotop.
- The configuration value osd_calc_pg_upmaps_max_stddev used for upmap balancing has been removed. Instead, use the Ceph Manager balancer configuration option upmap_max_deviation, which is now an integer number of PGs of deviation from the target PGs per OSD. You can set it with the following command:
  cephadm@adm > ceph config set mgr mgr/balancer/upmap_max_deviation 2
  The default upmap_max_deviation is 5. There are situations where CRUSH rules would not allow a pool to ever have completely balanced PGs, for example, if CRUSH requires 1 replica on each of 3 racks, but there are fewer OSDs in 1 of the racks. In those cases, the configuration value can be increased.
- CephFS: Forward scrub with multiple active Metadata Servers is now rejected. Scrub is currently only permitted on a file system with a single rank. Reduce the ranks to one via ceph fs set FS_NAME max_mds 1.
- Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. This can be fixed by adjusting the pool to an adjacent power of two:
  cephadm@adm > ceph osd pool set POOL_NAME pg_num NEW_PG_NUM
  Alternatively, you can silence the warning with:
  cephadm@adm > ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
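To pick a value for NEW_PG_NUM in the pg_num warning above, you need the powers of two on either side of the current value. The following is a small illustrative sketch; the function names are hypothetical helpers and not part of any Ceph tooling.

```python
from typing import Tuple


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def adjacent_powers_of_two(pg_num: int) -> Tuple[int, int]:
    """Return the (lower, upper) powers of two that bracket pg_num.

    If pg_num is already a power of two, both values equal pg_num.
    """
    if pg_num < 1:
        raise ValueError("pg_num must be at least 1")
    lower = 1 << (pg_num.bit_length() - 1)
    upper = lower if lower == pg_num else lower << 1
    return lower, upper


# Example: a pool with pg_num 100 would trigger the warning.
print(is_power_of_two(100))         # False
print(adjacent_powers_of_two(100))  # (64, 128)
```

Which of the two candidates is appropriate depends on the pool's data volume and OSD count; the sketch only computes the candidates.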
Nautilus 14.2.7 Point Release
This upstream release patched two security flaws:
- CVE-2020-1699: a path traversal flaw in Ceph Dashboard that could allow for potential information disclosure. 
- CVE-2020-1700: a flaw in the RGW beast front-end that could lead to denial of service from an unauthenticated client. 
In SES 6, these flaws were patched in Ceph version 14.2.5.382+g8881d33957b.
Nautilus 14.2.6 Point Release
This release fixed a Ceph Manager bug that caused MGRs to become unresponsive on larger clusters. SES users were never exposed to the bug.
Nautilus 14.2.5 Point Release
- Health warnings are now issued if daemons have recently crashed. Ceph has been collecting crash reports since the initial Nautilus release, but the health alerts are new. To view new crashes (or all crashes, if you have just upgraded), run:
  cephadm@adm > ceph crash ls-new
  To acknowledge a particular crash (or all crashes) and silence the health warning, run:
  cephadm@adm > ceph crash archive CRASH-ID
  cephadm@adm > ceph crash archive-all
- pg_num must be a power of two, otherwise HEALTH_WARN is reported. Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. You can fix this by adjusting the pool to a nearby power of two:
  cephadm@adm > ceph osd pool set POOL-NAME pg_num NEW-PG-NUM
  Alternatively, you can silence the warning with:
  cephadm@adm > ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
- Pool size needs to be greater than 1, otherwise HEALTH_WARN is reported. Ceph will issue a health warning if a RADOS pool’s size is set to 1, that is, if the pool is configured with no redundancy. Ceph will stop issuing the warning if the pool size is set to the minimum recommended value:
  cephadm@adm > ceph osd pool set POOL-NAME size NUM-REPLICAS
  You can silence the warning with:
  cephadm@adm > ceph config set global mon_warn_on_pool_no_redundancy false
- Health warning is reported if average OSD heartbeat ping time exceeds the threshold. A health warning is now generated if the average OSD heartbeat ping time exceeds a configurable threshold for any of the intervals computed. The OSD computes 1 minute, 5 minute and 15 minute intervals with average, minimum, and maximum values.
  A new configuration option, mon_warn_on_slow_ping_ratio, specifies a percentage of osd_heartbeat_grace to determine the threshold. A value of zero disables the warning.
  A new configuration option, mon_warn_on_slow_ping_time, specified in milliseconds, overrides the computed value and causes a warning when OSD heartbeat pings take longer than the specified amount.
  A new command, ceph daemon mgr.MGR-NUMBER dump_osd_network THRESHOLD, lists all connections whose average ping time over any of the 3 intervals is longer than the specified threshold or the value determined by the configuration options.
  A new command, ceph daemon osd.# dump_osd_network THRESHOLD, does the same as the previous one but only includes heartbeats initiated by the specified OSD.
- Changes in the telemetry MGR module. A new 'device' channel (enabled by default) will report anonymized hard disk and SSD health metrics to telemetry.ceph.com in order to build and improve device failure prediction algorithms.
  Telemetry reports information about CephFS file systems, including:
  - How many MDS daemons (in total and per file system).
  - Which features are (or have been) enabled.
  - How many data pools.
  - Approximate file system age (year and month of creation).
  - How many files, bytes, and snapshots.
  - How much metadata is being cached.
  Other miscellaneous information:
  - Which Ceph release the monitors are running.
  - Whether msgr v1 or v2 addresses are used for the monitors.
  - Whether IPv4 or IPv6 addresses are used for the monitors.
  - Whether RADOS cache tiering is enabled (and the mode).
  - Whether pools are replicated or erasure coded, and which erasure code profile plug-in and parameters are in use.
  - How many hosts are in the cluster, and how many hosts have each type of daemon.
  - Whether a separate OSD cluster network is being used.
  - How many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled.
  - How many RGW daemons, zones, and zonegroups are present and which RGW frontends are in use.
  - Aggregate stats about the CRUSH Map, such as which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use.
  If you had telemetry enabled before 14.2.5, you will need to re-opt-in with:
  cephadm@adm > ceph telemetry on
  If you are not comfortable sharing device metrics, you can disable that channel before re-opting-in:
  cephadm@adm > ceph config set mgr mgr/telemetry/channel_device false
  cephadm@adm > ceph telemetry on
  You can first view exactly what information will be reported with:
  cephadm@adm > ceph telemetry show # see everything
  cephadm@adm > ceph telemetry show device # just the device info
  cephadm@adm > ceph telemetry show basic # basic cluster info
- New OSD daemon command dump_recovery_reservations. It reveals the recovery locks held (in_progress) and waiting in priority queues. Usage:
  cephadm@adm > ceph daemon osd.ID dump_recovery_reservations
- New OSD daemon command dump_scrub_reservations. It reveals the scrub reservations that are held for local (primary) and remote (replica) PGs. Usage:
  cephadm@adm > ceph daemon osd.ID dump_scrub_reservations
- RGW now supports the S3 Object Lock set of APIs, allowing for a WORM model for storing objects. Six new APIs have been added: PUT/GET bucket object lock, PUT/GET object retention, and PUT/GET object legal hold.
- RGW now supports List Objects V2 as specified at https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html.
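Returning to the OSD heartbeat ping warning described in this list, the interaction between mon_warn_on_slow_ping_ratio and mon_warn_on_slow_ping_time can be sketched as a small calculation. This is an illustration of the rules as stated above, not Ceph's actual implementation. It assumes that osd_heartbeat_grace is expressed in seconds, that the ratio is given as a fraction (for example, 0.05 for 5 percent), and that a value of 0 for mon_warn_on_slow_ping_time means "no override". The function name and the example values are hypothetical.

```python
from typing import Optional


def slow_ping_threshold_ms(heartbeat_grace_s: float,
                           slow_ping_ratio: float,
                           slow_ping_time_ms: float = 0.0) -> Optional[float]:
    """Return the ping-time warning threshold in milliseconds.

    Returns None when the warning is disabled (ratio of zero, no override).
    Assumptions: grace is in seconds, ratio is a fraction, and a zero
    slow_ping_time_ms means the override is not set.
    """
    if slow_ping_time_ms > 0:
        # An explicit time in milliseconds overrides the computed value.
        return slow_ping_time_ms
    if slow_ping_ratio <= 0:
        # A ratio of zero disables the warning.
        return None
    return heartbeat_grace_s * slow_ping_ratio * 1000.0


# Example values chosen for illustration only.
print(slow_ping_threshold_ms(20, 0.05))        # roughly 1000 ms
print(slow_ping_threshold_ms(20, 0.05, 250))   # override wins: 250
print(slow_ping_threshold_ms(20, 0.0))         # None, warning disabled
```

The value returned here corresponds to the threshold that dump_osd_network uses when no explicit THRESHOLD argument is supplied, under the assumptions stated above.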
Nautilus 14.2.4 Point Release
This point release fixes a serious regression that found its way into the 14.2.3 point release. This regression did not affect SUSE Enterprise Storage customers because we did not ship a version based on 14.2.3.
Nautilus 14.2.3 Point Release
- Fixed a denial of service vulnerability where an unauthenticated client of Ceph Object Gateway could trigger a crash from an uncaught exception.
- Nautilus-based librbd clients can now open images on Jewel clusters.
- The Object Gateway num_rados_handles option has been removed. If you were using a value of num_rados_handles greater than 1, multiply your current objecter_inflight_ops and objecter_inflight_op_bytes parameters by the old num_rados_handles to get the same throttle behavior.
- The secure mode of the Messenger v2 protocol is no longer experimental with this release. This mode is now the preferred mode of connection for monitors.
- osd_deep_scrub_large_omap_object_key_threshold has been lowered to detect an object with a large number of omap keys more easily.
- The Ceph Dashboard now supports silencing Prometheus notifications.
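For the num_rados_handles removal noted above, the adjustment is a simple multiplication of the two objecter throttles. A minimal sketch follows; the function name is hypothetical and the input values are examples only, so substitute the values from your own configuration.

```python
from typing import Tuple


def scaled_objecter_throttles(old_num_rados_handles: int,
                              inflight_ops: int,
                              inflight_op_bytes: int) -> Tuple[int, int]:
    """Return (objecter_inflight_ops, objecter_inflight_op_bytes) scaled to
    preserve the throttle behavior after num_rados_handles was removed.

    Values are unchanged when the old handle count was 1 or less.
    """
    if old_num_rados_handles <= 1:
        return inflight_ops, inflight_op_bytes
    return (inflight_ops * old_num_rados_handles,
            inflight_op_bytes * old_num_rados_handles)


# Example inputs (illustrative): 4 handles, 1024 ops, 100 MiB of in-flight data.
ops, op_bytes = scaled_objecter_throttles(4, 1024, 100 * 1024 * 1024)
print(ops, op_bytes)  # 4096 419430400
```

The resulting numbers would then be applied as the new objecter_inflight_ops and objecter_inflight_op_bytes settings for the Object Gateway.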
Nautilus 14.2.2 Point Release
- The no{up,down,in,out} related commands have been revamped. There are now two ways to set the no{up,down,in,out} flags: the old command ceph osd [un]set FLAG, which sets cluster-wide flags; and the new command ceph osd [un]set-group FLAGS WHO, which sets flags in batch at the granularity of any CRUSH node or device class.
- radosgw-admin introduces two subcommands for managing expire-stale objects that might be left behind after a bucket reshard in earlier versions of Object Gateway. Expire-stale objects are expired objects that should have been automatically erased but still exist and need to be listed and removed manually. One subcommand lists such objects and the other deletes them.
- Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where deploying a single new Nautilus BlueStore OSD on an upgraded cluster (that is, one that was originally deployed pre-Nautilus) breaks the pool utilization statistics reported by ceph df. Until all OSDs have been reprovisioned or updated (via ceph-bluestore-tool repair), the pool statistics will show values that are lower than the true value. This is resolved in 14.2.2, such that the cluster only switches to using the more accurate per-pool stats after all OSDs are 14.2.2 or later, are BlueStore, and have been updated via the repair function if they were created prior to Nautilus.
- The default value for mon_crush_min_required_version has been changed from firefly to hammer, which means the cluster will issue a health warning if your CRUSH tunables are older than Hammer. There is generally a small (but non-zero) amount of data that will be re-balanced after making the switch to Hammer tunables.
  If possible, we recommend that you set the oldest allowed client to hammer or later. To display what the current oldest allowed client is, run:
  cephadm@adm > ceph osd dump | grep min_compat_client
  If the current value is older than hammer, run the following command to determine whether it is safe to make this change by verifying that there are no clients older than Hammer currently connected to the cluster:
  cephadm@adm > ceph features
  The newer straw2 CRUSH bucket type was introduced in Hammer. If you verify that all clients are Hammer or newer, it allows new features only supported for straw2 buckets to be used, including the crush-compat mode for the Balancer (Book “Administration Guide”, Chapter 21 “Ceph Manager Modules”, Section 21.1 “Balancer”).
Find detailed information about the patch at https://download.suse.com/Download?buildid=D38A7mekBz4~
Nautilus 14.2.1 Point Release
This was the first point release following the original Nautilus release (14.2.0). The original ('General Availability' or 'GA') version of SUSE Enterprise Storage 6 was based on this point release.