This is a draft document that was built and uploaded automatically. It may document beta software and be incomplete or even incorrect. Use this document at your own risk.

Jump to contentJump to page navigation: previous page [access key p]/next page [access key n]
Administration and Operations Guide / Accessing Cluster Data / Ceph Object Gateway
Applies to SUSE Enterprise Storage 7

21 Ceph Object Gateway

This chapter introduces details about administration tasks related to Object Gateway, such as checking status of the service, managing accounts, multisite gateways, or LDAP authentication.

21.1 Object Gateway restrictions and naming limitations

Following is a list of important Object Gateway limits:

21.1.1 Bucket limitations

When approaching Object Gateway via the S3 API, bucket names are limited to DNS-compliant names with a dash character '-' allowed. When approaching Object Gateway via the Swift API, you may use any combination of UTF-8 supported characters except for a slash character '/'. The maximum length of a bucket name is 255 characters. Bucket names must be unique.

Tip
Tip: Use DNS-compliant bucket names

Although you may use any UTF-8 based bucket name via the Swift API, it is recommended to name buckets with regard to the S3 naming limitations to avoid problems accessing the same bucket via the S3 API.

21.1.2 Stored object limitations

Maximum number of objects per user

No restriction by default (limited by ~ 2^63).

Maximum number of objects per bucket

No restriction by default (limited by ~ 2^63).

Maximum size of an object to upload/store

Single uploads are restricted to 5 GB. Use multipart for larger object sizes. The maximum number of multipart chunks is 10000.

21.1.3 HTTP header limitations

HTTP header and request limitation depend on the Web front-end used. The default Beast restricts the size of the HTTP header to 16 kB.

21.2 Deploying the Object Gateway

The Ceph Object Gateway deployment follows the same procedure as the deployment of other Ceph services—by means of cephadm. For more details, refer to Book “Deployment Guide”, Chapter 8 “Deploying the remaining core services using cephadm”, Section 8.2 “Service and placement specification”, specifically to Book “Deployment Guide”, Chapter 8 “Deploying the remaining core services using cephadm”, Section 8.3.4 “Deploying Object Gateways”.

21.3 Operating the Object Gateway service

You can operate the Object Gateways same as other Ceph services by first identifying the service name with the ceph orch ps command, and running the following command for operating services, for example:

ceph orch daemon restart OGW_SERVICE_NAME

Refer to Chapter 14, Operation of Ceph services for complete information about operating Ceph services.

21.4 Configuration options

Refer to Section 28.5, “Ceph Object Gateway” for a list of Object Gateway configuration options.

21.5 Managing Object Gateway access

You can communicate with Object Gateway using either S3- or Swift-compatible interface. S3 interface is compatible with a large subset of the Amazon S3 RESTful API. Swift interface is compatible with a large subset of the OpenStack Swift API.

Both interfaces require you to create a specific user, and install the relevant client software to communicate with the gateway using the user's secret key.

21.5.1 Accessing Object Gateway

21.5.1.1 S3 interface access

To access the S3 interface, you need a REST client. S3cmd is a command line S3 client. You can find it in the OpenSUSE Build Service. The repository contains versions for both SUSE Linux Enterprise and openSUSE based distributions.

If you want to test your access to the S3 interface, you can also write a small a Python script. The script will connect to Object Gateway, create a new bucket, and list all buckets. The values for aws_access_key_id and aws_secret_access_key are taken from the values of access_key and secret_key returned by the radosgw_admin command from Section 21.5.2.1, “Adding S3 and Swift users”.

  1. Install the python-boto package:

    # zypper in python-boto
  2. Create a new Python script called s3test.py with the following content:

    import boto
    import boto.s3.connection
    access_key = '11BS02LGFB6AL6H1ADMW'
    secret_key = 'vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY'
    conn = boto.connect_s3(
    aws_access_key_id = access_key,
    aws_secret_access_key = secret_key,
    host = 'HOSTNAME',
    is_secure=False,
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.create_bucket('my-new-bucket')
    for bucket in conn.get_all_buckets():
      print "NAME\tCREATED".format(
      name = bucket.name,
      created = bucket.creation_date,
      )

    Replace HOSTNAME with the host name of the host where you configured the Object Gateway service, for example gateway_host.

  3. Run the script:

    python s3test.py

    The script outputs something like the following:

    my-new-bucket 2015-07-22T15:37:42.000Z

21.5.1.2 Swift interface access

To access Object Gateway via Swift interface, you need the swift command line client. Its manual page man 1 swift tells you more about its command line options.

The package is included in the 'Public Cloud' module for SUSE Linux Enterprise 12 from SP3 and SUSE Linux Enterprise 15. Before installing the package, you need to activate the module and refresh the software repository:

# SUSEConnect -p sle-module-public-cloud/12/SYSTEM-ARCH
sudo zypper refresh

Or

# SUSEConnect -p sle-module-public-cloud/15/SYSTEM-ARCH
# zypper refresh

To install the swift command, run the following:

# zypper in python-swiftclient

The swift access uses the following syntax:

> swift -A http://IP_ADDRESS/auth/1.0 \
-U example_user:swift -K 'SWIFT_SECRET_KEY' list

Replace IP_ADDRESS with the IP address of the gateway server, and SWIFT_SECRET_KEY with its value from the output of the radosgw-admin key create command executed for the swift user in Section 21.5.2.1, “Adding S3 and Swift users”.

For example:

> swift -A http://gateway.example.com/auth/1.0 -U example_user:swift \
-K 'r5wWIxjOCeEO7DixD1FjTLmNYIViaC6JVhi3013h' list

The output is:

my-new-bucket

21.5.2 Manage S3 and Swift accounts

21.5.2.1 Adding S3 and Swift users

You need to create a user, access key and secret to enable end users to interact with the gateway. There are two types of users: a user and subuser. While users are used when interacting with the S3 interface, subusers are users of the Swift interface. Each subuser is associated to a user.

To create a Swift user, follow the steps:

  1. To create a Swift user—which is a subuser in our terminology—you need to create the associated user first.

    cephuser@adm > radosgw-admin user create --uid=USERNAME \
     --display-name="DISPLAY-NAME" --email=EMAIL

    For example:

    cephuser@adm > radosgw-admin user create \
       --uid=example_user \
       --display-name="Example User" \
       --email=penguin@example.com
  2. To create a subuser (Swift interface) for the user, you must specify the user ID (--uid=USERNAME), a subuser ID, and the access level for the subuser.

    cephuser@adm > radosgw-admin subuser create --uid=UID \
     --subuser=UID \
     --access=[ read | write | readwrite | full ]

    For example:

    cephuser@adm > radosgw-admin subuser create --uid=example_user \
     --subuser=example_user:swift --access=full
  3. Generate a secret key for the user.

    cephuser@adm > radosgw-admin key create \
       --gen-secret \
       --subuser=example_user:swift \
       --key-type=swift
  4. Both commands will output JSON-formatted data showing the user state. Notice the following lines, and remember the secret_key value:

    "swift_keys": [
       { "user": "example_user:swift",
         "secret_key": "r5wWIxjOCeEO7DixD1FjTLmNYIViaC6JVhi3013h"}],

When accessing Object Gateway through the S3 interface you need to create an S3 user by running:

cephuser@adm > radosgw-admin user create --uid=USERNAME \
 --display-name="DISPLAY-NAME" --email=EMAIL

For example:

cephuser@adm > radosgw-admin user create \
   --uid=example_user \
   --display-name="Example User" \
   --email=penguin@example.com

The command also creates the user's access and secret key. Check its output for access_key and secret_key keywords and their values:

[...]
 "keys": [
       { "user": "example_user",
         "access_key": "11BS02LGFB6AL6H1ADMW",
         "secret_key": "vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY"}],
 [...]

21.5.2.2 Removing S3 and Swift users

The procedure for deleting users is similar for S3 and Swift users. But in case of Swift users you may need to delete the user including its subusers.

To remove a S3 or Swift user (including all its subusers), specify user rm and the user ID in the following command:

cephuser@adm > radosgw-admin user rm --uid=example_user

To remove a subuser, specify subuser rm and the subuser ID.

cephuser@adm > radosgw-admin subuser rm --uid=example_user:swift

You can make use of the following options:

--purge-data

Purges all data associated to the user ID.

--purge-keys

Purges all keys associated to the user ID.

Tip
Tip: Removing a subuser

When you remove a subuser, you are removing access to the Swift interface. The user will remain in the system.

21.5.2.3 Changing S3 and Swift user access and secret keys

The access_key and secret_key parameters identify the Object Gateway user when accessing the gateway. Changing the existing user keys is the same as creating new ones, as the old keys get overwritten.

For S3 users, run the following:

cephuser@adm > radosgw-admin key create --uid=EXAMPLE_USER --key-type=s3 --gen-access-key --gen-secret

For Swift users, run the following:

cephuser@adm > radosgw-admin key create --subuser=EXAMPLE_USER:swift --key-type=swift --gen-secret
--key-type=TYPE

Specifies the type of key. Either swift or s3.

--gen-access-key

Generates a random access key (for S3 user by default).

--gen-secret

Generates a random secret key.

--secret=KEY

Specifies a secret key, for example manually generated.

21.5.2.4 Enabling user quota management

The Ceph Object Gateway enables you to set quotas on users and buckets owned by users. Quotas include the maximum number of objects in a bucket and the maximum storage size in megabytes.

Before you enable a user quota, you first need to set its parameters:

cephuser@adm > radosgw-admin quota set --quota-scope=user --uid=EXAMPLE_USER \
 --max-objects=1024 --max-size=1024
--max-objects

Specifies the maximum number of objects. A negative value disables the check.

--max-size

Specifies the maximum number of bytes. A negative value disables the check.

--quota-scope

Sets the scope for the quota. The options are bucket and user. Bucket quotas apply to buckets a user owns. User quotas apply to a user.

Once you set a user quota, you may enable it:

cephuser@adm > radosgw-admin quota enable --quota-scope=user --uid=EXAMPLE_USER

To disable a quota:

cephuser@adm > radosgw-admin quota disable --quota-scope=user --uid=EXAMPLE_USER

To list quota settings:

cephuser@adm > radosgw-admin user info --uid=EXAMPLE_USER

To update quota statistics:

cephuser@adm > radosgw-admin user stats --uid=EXAMPLE_USER --sync-stats

21.6 HTTP front-ends

The Ceph Object Gateway supports two embedded HTTP front-ends: Beast and Civetweb.

The Beast front-end uses the Boost.Beast library for HTTP parsing and the Boost.Asio library for asynchronous network I/O.

The Civetweb front-end uses the Civetweb HTTP library, which is a fork of Mongoose.

You can configure them with the rgw_frontends option. Refer to Section 28.5, “Ceph Object Gateway” for a list of configuration options.

21.7 Enable HTTPS/SSL for Object Gateways

To enable the Object Gateway to communicate securely using SSL, you need to either have a CA-issued certificate or create a self-signed one.

21.7.1 Creating a self-signed certificate

Tip
Tip

Skip this section if you already have a valid certificate signed by CA.

The following procedure describes how to generate a self-signed SSL certificate on the Salt Master.

  1. If you need your Object Gateway to be known by additional subject identities, add them to the subjectAltName option in the [v3_req] section of the /etc/ssl/openssl.cnf file:

    [...]
    [ v3_req ]
    subjectAltName = DNS:server1.example.com DNS:server2.example.com
    [...]
    Tip
    Tip: IP addresses in subjectAltName

    To use IP addresses instead of domain names in the subjectAltName option, replace the example line with the following:

    subjectAltName = IP:10.0.0.10 IP:10.0.0.11
  2. Create the key and the certificate using openssl. Enter all data you need to include in your certificate. We recommend entering the FQDN as the common name. Before signing the certificate, verify that 'X509v3 Subject Alternative Name:' is included in requested extensions, and that the resulting certificate has "X509v3 Subject Alternative Name:" set.

    root@master # openssl req -x509 -nodes -days 1095 \
     -newkey rsa:4096 -keyout rgw.key
     -out rgw.pem
  3. Append the key to the certificate file:

    root@master # cat rgw.key >> rgw.pem

21.7.2 Configuring Object Gateway with SSL

To configure Object Gateway to use SSL certificates, use the rgw_frontends option. For example:

cephuser@adm > ceph config set WHO rgw_frontends \
 beast ssl_port=443 ssl_certificate=config://CERT ssl_key=config://KEY

If you do not specify the CERT and KEY configuration keys, then the Object Gateway service will look for the SSL certificate and key under the following configuration keys:

rgw/cert/RGW_REALM/RGW_ZONE.key
rgw/cert/RGW_REALM/RGW_ZONE.crt

If you want to override the default SSL key and certificate location, import them to the configuration database by using the following command:

ceph config-key set CUSTOM_CONFIG_KEY -i PATH_TO_CERT_FILE

Then use your custom configuration keys using the config:// directive.

21.8 Synchronization modules

Object Gateway is deployed as a multi-site service while you can mirror data and metadata between the zones. Synchronization modules are built atop of the multisite framework that allows for forwarding data and metadata to a different external tier. A synchronization module allows for a set of actions to be performed whenever a change in data occurs (for example, metadata operations such as bucket or user creation). As the Object Gateway multisite changes are eventually consistent at remote sites, changes are propagated asynchronously. This covers use cases such as backing up the object storage to an external cloud cluster, a custom backup solution using tape drives, or indexing metadata in ElasticSearch.

21.8.1 Configuring synchronization modules

All synchronization modules are configured in a similar way. You need to create a new zone (refer to Section 21.13, “Multisite Object Gateways” for more details) and set its --tier_type option, for example --tier-type=cloud for the cloud synchronization module:

cephuser@adm > radosgw-admin zone create --rgw-zonegroup=ZONE-GROUP-NAME \
 --rgw-zone=ZONE-NAME \
 --endpoints=http://endpoint1.example.com,http://endpoint2.example.com, [...] \
 --tier-type=cloud

You can configure the specific tier by using the following command:

cephuser@adm > radosgw-admin zone modify --rgw-zonegroup=ZONE-GROUP-NAME \
 --rgw-zone=ZONE-NAME \
 --tier-config=KEY1=VALUE1,KEY2=VALUE2

The KEY in the configuration specifies the configuration variable that you want to update, and the VALUE specifies its new value. Nested values can be accessed using period. For example:

cephuser@adm > radosgw-admin zone modify --rgw-zonegroup=ZONE-GROUP-NAME \
 --rgw-zone=ZONE-NAME \
 --tier-config=connection.access_key=KEY,connection.secret=SECRET

You can access array entries by appending square brackets '[]' with the referenced entry. You can add a new array entry by using square brackets '[]'. Index value of -1 references the last entry in the array. It is not possible to create a new entry and reference it again in the same command. For example, a command to create a new profile for buckets starting with PREFIX follows:

cephuser@adm > radosgw-admin zone modify --rgw-zonegroup=ZONE-GROUP-NAME \
 --rgw-zone=ZONE-NAME \
 --tier-config=profiles[].source_bucket=PREFIX'*'
cephuser@adm > radosgw-admin zone modify --rgw-zonegroup=ZONE-GROUP-NAME \
 --rgw-zone=ZONE-NAME \
 --tier-config=profiles[-1].connection_id=CONNECTION_ID,profiles[-1].acls_id=ACLS_ID
Tip
Tip: Adding and removing configuration entries

You can add a new tier configuration entry by using the --tier-config-add=KEY=VALUE parameter.

You can remove an existing entry by using --tier-config-rm=KEY.

21.8.2 Synchronizing zones

A synchronization module configuration is local to a zone. The synchronization module determines whether the zone exports data or can only consume data that was modified in another zone. As of Luminous the supported synchronization plug-ins are ElasticSearch, rgw, which is the default synchronization plug-in that synchronizes data between the zones and log which is a trivial synchronization plug-in that logs the metadata operation that happens in the remote zones. The following sections are written with the example of a zone using ElasticSearch synchronization module. The process would be similar for configuring any other synchronization plug-in.

Note
Note: Default synchronization plug-in

rgw is the default synchronization plug-in and there is no need to explicitly configure this.

21.8.2.1 Requirements and assumptions

Let us assume a simple multisite configuration as described in Section 21.13, “Multisite Object Gateways” consists of 2 zones: us-east and us-west. Now we add a third zone us-east-es which is a zone that only processes metadata from the other sites. This zone can be in the same or a different Ceph cluster than us-east. This zone would only consume metadata from other zones and Object Gateways in this zone will not serve any end user requests directly.

21.8.2.2 Configuring zones

  1. Create the third zone similar to the ones described in Section 21.13, “Multisite Object Gateways”, for example

    cephuser@adm > radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-es \
    --access-key=SYSTEM-KEY --secret=SECRET --endpoints=http://rgw-es:80
  2. A synchronization module can be configured for this zone via the following:

    cephuser@adm > radosgw-admin zone modify --rgw-zone=ZONE-NAME --tier-type=TIER-TYPE \
    --tier-config={set of key=value pairs}
  3. For example in the ElasticSearch synchronization module

    cephuser@adm > radosgw-admin zone modify --rgw-zone=ZONE-NAME --tier-type=elasticsearch \
    --tier-config=endpoint=http://localhost:9200,num_shards=10,num_replicas=1

    For the various supported tier-config options refer to Section 21.8.3, “ElasticSearch synchronization module”.

  4. Finally update the period

    cephuser@adm > radosgw-admin period update --commit
  5. Now start the Object Gateway in the zone

    cephuser@adm > ceph orch start rgw.REALM-NAME.ZONE-NAME

21.8.3 ElasticSearch synchronization module

This synchronization module writes the metadata from other zones to ElasticSearch. As of Luminous this is JSON of data fields we currently store in ElasticSearch.

{
  "_index" : "rgw-gold-ee5863d6",
  "_type" : "object",
  "_id" : "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
  "_score" : 1.0,
  "_source" : {
    "bucket" : "testbucket123",
    "name" : "object1",
    "instance" : "null",
    "versioned_epoch" : 0,
    "owner" : {
      "id" : "user1",
      "display_name" : "user1"
    },
    "permissions" : [
      "user1"
    ],
    "meta" : {
      "size" : 712354,
      "mtime" : "2017-05-04T12:54:16.462Z",
      "etag" : "7ac66c0f148de9519b8bd264312c4d64"
    }
  }
}

21.8.3.1 ElasticSearch tier type configuration parameters

endpoint

Specifies the ElasticSearch server endpoint to access.

num_shards

(integer) The number of shards that ElasticSearch will be configured with on data synchronization initialization. Note that this cannot be changed after initialization. Any change here requires rebuild of the ElasticSearch index and reinitialization of the data synchronization process.

num_replicas

(integer) The number of replicas that ElasticSearch will be configured with on data synchronization initialization.

explicit_custom_meta

(true | false) Specifies whether all user custom metadata will be indexed, or whether user will need to configure (at the bucket level) what customer metadata entries should be indexed. This is false by default

index_buckets_list

(comma separated list of strings) If empty, all buckets will be indexed. Otherwise, only buckets specified here will be indexed. It is possible to provide bucket prefixes (for example 'foo*'), or bucket suffixes (for example '*bar').

approved_owners_list

(comma separated list of strings) If empty, buckets of all owners will be indexed (subject to other restrictions), otherwise, only buckets owned by specified owners will be indexed. Suffixes and prefixes can also be provided.

override_index_path

(string) if not empty, this string will be used as the ElasticSearch index path. Otherwise the index path will be determined and generated on synchronization initialization.

username

Specifies a user name for ElasticSearch if authentication is required.

password

Specifies a password for ElasticSearch if authentication is required.

21.8.3.2 Metadata queries

Since the ElasticSearch cluster now stores object metadata, it is important that the ElasticSearch endpoint is not exposed to the public and only accessible to the cluster administrators. For exposing metadata queries to the end user itself this poses a problem since we'd want the user to only query their metadata and not of any other users, this would require the ElasticSearch cluster to authenticate users in a way similar to RGW does which poses a problem.

As of Luminous RGW in the metadata master zone can now service end user requests. This allows for not exposing the ElasticSearch endpoint in public and also solves the authentication and authorization problem since RGW itself can authenticate the end user requests. For this purpose RGW introduces a new query in the bucket APIs that can service ElasticSearch requests. All these requests must be sent to the metadata master zone.

Get an ElasticSearch Query
GET /BUCKET?query=QUERY-EXPR

request params:

  • max-keys: max number of entries to return

  • marker: pagination marker

expression := [(]<arg> <op> <value> [)][<and|or> ...]

op is one of the following: <, <=, ==, >=, >

For example:

GET /?query=name==foo

Will return all the indexed keys that user has read permission to, and are named 'foo'. The output will be a list of keys in XML that is similar to the S3 list buckets response.

Configure custom metadata fields

Define which custom metadata entries should be indexed (under the specified bucket), and what are the types of these keys. If explicit custom metadata indexing is configured, this is needed so that rgw will index the specified custom metadata values. Otherwise it is needed in cases where the indexed metadata keys are of a type other than string.

POST /BUCKET?mdsearch
x-amz-meta-search: <key [; type]> [, ...]

Multiple metadata fields must be comma separated, a type can be forced for a field with a `;`. The currently allowed types are string(default), integer and date, for example, if you want to index a custom object metadata x-amz-meta-year as int, x-amz-meta-date as type date and x-amz-meta-title as string, you would do

POST /mybooks?mdsearch
x-amz-meta-search: x-amz-meta-year;int, x-amz-meta-release-date;date, x-amz-meta-title;string
Delete custom metadata configuration

Delete custom metadata bucket configuration.

DELETE /BUCKET?mdsearch
Get custom metadata configuration

Retrieve custom metadata bucket configuration.

GET /BUCKET?mdsearch

21.8.4 Cloud synchronization module

This section introduces a module that synchronizes the zone data to a remote cloud service. The synchronization is only unidirectional—the date is not synchronized back from the remote zone. The main goal of this module is to enable synchronizing data to multiple cloud service providers. Currently it supports cloud providers that are compatible with AWS (S3).

To synchronize data to a remote cloud service, you need to configure user credentials. Because many cloud services introduce limits on the number of buckets that each user can create, you can configure the mapping of source objects and buckets, different targets to different buckets and bucket prefixes. Note that source access lists (ACLs) will not be preserved. It is possible to map permissions of specific source users to specific destination users.

Because of API limitations, there is no way to preserve original object modification time and HTTP entity tag (ETag). The cloud synchronization module stores these as metadata attributes on the destination objects.

21.8.4.1 Configuring the cloud synchronization module

Following are examples of a trivial and non-trivial configuration for the cloud synchronization module. Note that the trivial configuration can collide with the non-trivial one.

Example 21.1: Trivial configuration
{
  "connection": {
    "access_key": ACCESS,
    "secret": SECRET,
    "endpoint": ENDPOINT,
    "host_style": path | virtual,
  },
  "acls": [ { "type": id | email | uri,
    "source_id": SOURCE_ID,
    "dest_id": DEST_ID } ... ],
  "target_path": TARGET_PATH,
}
Example 21.2: Non-trivial configuration
{
  "default": {
    "connection": {
      "access_key": ACCESS,
      "secret": SECRET,
      "endpoint": ENDPOINT,
      "host_style" path | virtual,
    },
    "acls": [
    {
      "type": id | email | uri,   #  optional, default is id
      "source_id": ID,
      "dest_id": ID
    } ... ]
    "target_path": PATH # optional
  },
  "connections": [
  {
    "connection_id": ID,
    "access_key": ACCESS,
    "secret": SECRET,
    "endpoint": ENDPOINT,
    "host_style": path | virtual,  # optional
  } ... ],
  "acl_profiles": [
  {
    "acls_id": ID, # acl mappings
    "acls": [ {
      "type": id | email | uri,
      "source_id": ID,
      "dest_id": ID
    } ... ]
  }
  ],
  "profiles": [
  {
   "source_bucket": SOURCE,
   "connection_id": CONNECTION_ID,
   "acls_id": MAPPINGS_ID,
   "target_path": DEST,          # optional
  } ... ],
}

Explanation of used configuration terms follows:

connection

Represents a connection to the remote cloud service. Contains 'connection_id', 'access_key', 'secret', 'endpoint', and 'host_style'.

access_key

The remote cloud access key that will be used for the specific connection.

secret

The secret key for the remote cloud service.

endpoint

URL of remote cloud service endpoint.

host_style

Type of host style ('path' or 'virtual') to be used when accessing remote cloud endpoint. Default is 'path'.

acls

Array of access list mappings.

acl_mapping

Each 'acl_mapping' structure contains 'type', 'source_id', and 'dest_id'. These will define the ACL mutation for each object. An ACL mutation allows converting source user ID to a destination ID.

type

ACL type: 'id' defines user ID, 'email' defines user by email, and 'uri' defines user by uri (group).

source_id

ID of user in the source zone.

dest_id

ID of user in the destination.

target_path

A string that defines how the target path is created. The target path specifies a prefix to which the source object name is appended. The target path configurable can include any of the following variables:

SID

A unique string that represents the synchronization instance ID.

ZONEGROUP

Zonegroup name.

ZONEGROUP_ID

Zonegroup ID.

ZONE

Zone name.

ZONE_ID

Zone ID.

BUCKET

Source bucket name.

OWNER

Source bucket owner ID.

For example: target_path = rgwx-ZONE-SID/OWNER/BUCKET

acl_profiles

An array of access list profiles.

acl_profile

Each profile contains 'acls_id' that represents the profile, and an 'acls' array that holds a list of 'acl_mappings'.

profiles

A list of profiles. Each profile contains the following:

source_bucket

Either a bucket name, or a bucket prefix (if ends with *) that defines the source bucket(s) for this profile.

target_path

See above for the explanation.

connection_id

ID of the connection that will be used for this profile.

acls_id

ID of ACL's profile that will be used for this profile.

21.8.4.2 S3 specific configurables

The cloud synchronization module will only work with back-ends that are compatible with AWS S3. There are a few configurables that can be used to tweak its behavior when accessing S3 cloud services:

{
  "multipart_sync_threshold": OBJECT_SIZE,
  "multipart_min_part_size": PART_SIZE
}
multipart_sync_threshold

Objects whose size is equal to or larger than this value will be synchronized with the cloud service using multipart upload.

multipart_min_part_size

Minimum parts size to use when synchronizing objects using multipart upload.

21.8.5 Archive synchronization module

The archive sync module uses the versioning feature of S3 objects in Object Gateway. You can configure an archive zone that captures the different versions of S3 objects as they occur over time in other zones. The history of versions that the archive zone keeps can only be eliminated via gateways associated with the archive zone.

With such an architecture, several non-versioned zones can mirror their data and metadata via their zone gateways providing high availability to the end users, while the archive zone captures all the data updates to consolidate them as versions of S3 objects.

By including the archive zone in a multi-zone configuration, you gain the flexibility of an S3 object history in one zone while saving the space that the replicas of the versioned S3 objects would consume in the remaining zones.

21.8.5.1 Configuring the archive synchronization module

Tip
Tip: More information

Refer to Section 21.13, “Multisite Object Gateways” for details on configuring multisite gateways.

Refer to Section 21.8, “Synchronization modules” for details on configuring synchronization modules.

To use the archive sync module, you need to create a new zone whose tier type is set to archive:

cephuser@adm > radosgw-admin zone create --rgw-zonegroup=ZONE_GROUP_NAME \
 --rgw-zone=OGW_ZONE_NAME \
 --endpoints=http://OGW_ENDPOINT1_URL[,http://OGW_ENDPOINT2_URL,...]
 --tier-type=archive

21.9 LDAP authentication

Apart from the default local user authentication, Object Gateway can use LDAP server services to authenticate users as well.

21.9.1 Authentication mechanism

The Object Gateway extracts the user's LDAP credentials from a token. A search filter is constructed from the user name. The Object Gateway uses the configured service account to search the directory for a matching entry. If an entry is found, the Object Gateway attempts to bind to the found distinguished name with the password from the token. If the credentials are valid, the bind will succeed, and the Object Gateway grants access.

You can limit the allowed users by setting the base for the search to a specific organizational unit or by specifying a custom search filter, for example requiring specific group membership, custom object classes, or attributes.

21.9.2 Requirements

  • LDAP or Active Directory: A running LDAP instance accessible by the Object Gateway.

  • Service account: LDAP credentials to be used by the Object Gateway with search permissions.

  • User account: At least one user account in the LDAP directory.

Important
Important: Do not overlap LDAP and local users

You should not use the same user names for local users and for users being authenticated by using LDAP. The Object Gateway cannot distinguish them and it treats them as the same user.

Tip
Tip: Sanity checks

Use the ldapsearch utility to verify the service account or the LDAP connection. For example:

> ldapsearch -x -D "uid=ceph,ou=system,dc=example,dc=com" -W \
-H ldaps://example.com -b "ou=users,dc=example,dc=com" 'uid=*' dn

Make sure to use the same LDAP parameters as in the Ceph configuration file to eliminate possible problems.

21.9.3 Configuring Object Gateway to use LDAP authentication

The following parameters are related to the LDAP authentication:

rgw_s3_auth_use_ldap

Set this option to true to enable S3 authentication with LDAP.

rgw_ldap_uri

Specifies the LDAP server to use. Make sure to use the ldaps://FQDN:PORT parameter to avoid transmitting the plain text credentials openly.

rgw_ldap_binddn

The Distinguished Name (DN) of the service account used by the Object Gateway.

rgw_ldap_secret

The password for the service account.

rgw_ldap_searchdn

Specifies the base in the directory information tree for searching users. This might be your users organizational unit or some more specific Organizational Unit (OU).

rgw_ldap_dnattr

The attribute being used in the constructed search filter to match a user name. Depending on your Directory Information Tree (DIT) this would probably be uid or cn.

rgw_search_filter

If not specified, the Object Gateway automatically constructs the search filter with the rgw_ldap_dnattr setting. Use this parameter to narrow the list of allowed users in very flexible ways. Consult Section 21.9.4, “Using a custom search filter to limit user access” for details.

21.9.4 Using a custom search filter to limit user access

There are two ways you can use the rgw_search_filter parameter.

21.9.4.1 Partial filter to further limit the constructed search filter

An example of a partial filter:

"objectclass=inetorgperson"

The Object Gateway will generate the search filter as usual with the user name from the token and the value of rgw_ldap_dnattr. The constructed filter is then combined with the partial filter from the rgw_search_filter attribute. Depending on the user name and the settings the final search filter may become:

"(&(uid=hari)(objectclass=inetorgperson))"

In that case, user 'hari' will only be granted access if he is found in the LDAP directory, has an object class of 'inetorgperson', and did specify a valid password.

21.9.4.2 Complete filter

A complete filter must contain a USERNAME token which will be substituted with the user name during the authentication attempt. The rgw_ldap_dnattr parameter is not used anymore in this case. For example, to limit valid users to a specific group, use the following filter:

"(&(uid=USERNAME)(memberOf=cn=ceph-users,ou=groups,dc=mycompany,dc=com))"
Note
Note: memberOf attribute

Using the memberOf attribute in LDAP searches requires server side support from you specific LDAP server implementation.

21.9.5 Generating an access token for LDAP authentication

The radosgw-token utility generates the access token based on the LDAP user name and password. It outputs a base-64 encoded string which is the actual access token. Use your favorite S3 client (refer to Section 21.5.1, “Accessing Object Gateway”) and specify the token as the access key and use an empty secret key.

> export RGW_ACCESS_KEY_ID="USERNAME"
> export RGW_SECRET_ACCESS_KEY="PASSWORD"
cephuser@adm > radosgw-token --encode --ttype=ldap
Important
Important: Clear text credentials

The access token is a base-64 encoded JSON structure and contains the LDAP credentials as a clear text.

Note
Note: Active Directory

For Active Directory, use the --ttype=ad parameter.

21.10 Bucket index sharding

The Object Gateway stores bucket index data in an index pool, which defaults to .rgw.buckets.index. If you put too many (hundreds of thousands) objects into a single bucket and the quota for maximum number of objects per bucket (rgw bucket default quota max objects) is not set, the performance of the index pool may degrade. Bucket index sharding prevents such performance decreases and allows a high number of objects per bucket.

21.10.1 Bucket index resharding

If a bucket has grown large and its initial configuration is not sufficient anymore, the bucket's index pool needs to be resharded. You can either use automatic online bucket index resharding (refer to Section 21.10.1.1, “Dynamic resharding”), or reshard the bucket index offline manually (refer to Section 21.10.1.2, “Resharding manually”).

21.10.1.1 Dynamic resharding

From SUSE Enterprise Storage 5, we support online bucket resharding. This detects if the number of objects per bucket reaches a certain threshold, and automatically increases the number of shards used by the bucket index. This process reduces the number of entries in each bucket index shard.

The detection process runs:

  • When new objects are added to the bucket.

  • In a background process that periodically scans all the buckets. This is needed in order to deal with existing buckets that are not being updated.

A bucket that requires resharding is added to the reshard_log queue and will be scheduled to be resharded later. The reshard threads run in the background and execute the scheduled resharding, one at a time.

Configuring dynamic resharding
rgw_dynamic_resharding

Enables or disables dynamic bucket index resharding. Possible values are 'true' or 'false'. Defaults to 'true'.

rgw_reshard_num_logs

Number of shards for the resharding log. Defaults to 16.

rgw_reshard_bucket_lock_duration

Duration of lock on the bucket object during resharding. Defaults to 120 seconds.

rgw_max_objs_per_shard

Maximum number of objects per bucket index shard. Defaults to 100000 objects.

rgw_reshard_thread_interval

Maximum time between rounds of reshard thread processing. Defaults to 600 seconds.

Commands to administer the resharding process
Add a bucket to the resharding queue:
cephuser@adm > radosgw-admin reshard add \
 --bucket BUCKET_NAME \
 --num-shards NEW_NUMBER_OF_SHARDS
List resharding queue:
cephuser@adm > radosgw-admin reshard list
Process/schedule a bucket resharding:
cephuser@adm > radosgw-admin reshard process
Display the bucket resharding status:
cephuser@adm > radosgw-admin reshard status --bucket BUCKET_NAME
Cancel pending bucket resharding:
cephuser@adm > radosgw-admin reshard cancel --bucket BUCKET_NAME

21.10.1.2 Resharding manually

Dynamic resharding as mentioned in Section 21.10.1.1, “Dynamic resharding” is supported only for simple Object Gateway configurations. For multisite configurations, use manual resharding as described in this section.

To reshard the bucket index manually offline, use the following command:

cephuser@adm > radosgw-admin bucket reshard

The bucket reshard command performs the following:

  • Creates a new set of bucket index objects for the specified object.

  • Spreads all entries of these index objects.

  • Creates a new bucket instance.

  • Links the new bucket instance with the bucket so that all new index operations go through the new bucket indexes.

  • Prints the old and the new bucket ID to the standard output.

Tip
Tip

When choosing a number of shards, note the following: aim for no more than 100000 entries per shard. Bucket index shards that are prime numbers tend to work better in evenly distributing bucket index entries across the shards. For example, 503 bucket index shards is better than 500 since the former is prime.

Warning
Warning

Multi-site configurations do not support resharding a bucket index.

For multi-site configurations, resharding a bucket index requires resynchronizing all data from the master zone to all slave zones. Depending on the bucket size, this can take a considerable amount of time and resources.

Procedure 21.1: Resharding the bucket index
  1. Make sure that all operations to the bucket are stopped.

  2. Back up the original bucket index:

    cephuser@adm > radosgw-admin bi list \
     --bucket=BUCKET_NAME \
     > BUCKET_NAME.list.backup
  3. Reshard the bucket index:

     cephuser@adm > radosgw-admin bucket reshard \
     --bucket=BUCKET_NAME \
     --num-shards=NEW_SHARDS_NUMBER
    Tip
    Tip: Old bucket ID

    As part of its output, this command also prints the new and the old bucket ID.

21.10.2 Bucket index sharding for new buckets

There are two options that affect bucket index sharding:

  • Use the rgw_override_bucket_index_max_shards option for simple configurations.

  • Use the bucket_index_max_shards option for multisite configurations.

Setting the options to 0 disables bucket index sharding. A value greater than 0 enables bucket index sharding and sets the maximum number of shards.

The following formula helps you calculate the recommended number of shards:

number_of_objects_expected_in_a_bucket / 100000

Be aware that the maximum number of shards is 7877.

21.10.2.1 Multisite configurations

Multisite configurations can have a different index pool to manage failover. To configure a consistent shard count for zones in one zone group, set the bucket_index_max_shards option in the zone group's configuration:

  1. Export the zonegroup configuration to the zonegroup.json file:

    cephuser@adm > radosgw-admin zonegroup get > zonegroup.json
  2. Edit the zonegroup.json file and set the bucket_index_max_shards option for each named zone.

  3. Reset the zonegroup:

    cephuser@adm > radosgw-admin zonegroup set < zonegroup.json
  4. Update the period. See Section 21.13.3.6, “Update the period”.

21.11 OpenStack Keystone integration

OpenStack Keystone is an identity service for the OpenStack product. You can integrate the Object Gateway with Keystone to set up a gateway that accepts a Keystone authentication token. A user authorized by Keystone to access the gateway will be verified on the Ceph Object Gateway side and automatically created if needed. The Object Gateway queries Keystone periodically for a list of revoked tokens.

21.11.1 Configuring OpenStack

Before configuring the Ceph Object Gateway, you need to configure the OpenStack Keystone to enable the Swift service and point it to the Ceph Object Gateway:

  1. Set the Swift service. To use OpenStack to validate Swift users, first create the Swift service:

    > openstack service create \
     --name=swift \
     --description="Swift Service" \
     object-store
  2. Set the endpoints. After you create the Swift service, point to the Ceph Object Gateway. Replace REGION_NAME with the name of the gateway’s zonegroup name or region name.

    > openstack endpoint create --region REGION_NAME \
     --publicurl   "http://radosgw.example.com:8080/swift/v1" \
     --adminurl    "http://radosgw.example.com:8080/swift/v1" \
     --internalurl "http://radosgw.example.com:8080/swift/v1" \
     swift
  3. Verify the settings. After you create the Swift service and set the endpoints, show the endpoints to verify that all the settings are correct.

    > openstack endpoint show object-store

21.11.2 Configuring the Ceph Object Gateway

21.11.2.1 Configure SSL certificates

The Ceph Object Gateway queries Keystone periodically for a list of revoked tokens. These requests are encoded and signed. Keystone may be also configured to provide self-signed tokens, which are also encoded and signed. You need to configure the gateway so that it can decode and verify these signed messages. Therefore, the OpenSSL certificates that Keystone uses to create the requests need to be converted to the 'nss db' format:

# mkdir /var/ceph/nss
# openssl x509 -in /etc/keystone/ssl/certs/ca.pem \
 -pubkey | certutil -d /var/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
rootopenssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem \
 -pubkey | certutil -A -d /var/ceph/nss -n signing_cert -t "P,P,P"

To allow Ceph Object Gateway to interact with OpenStack Keystone, OpenStack Keystone can use a self-signed SSL certificate. Either install Keystone’s SSL certificate on the node running the Ceph Object Gateway, or alternatively set the value of the option rgw keystone verify ssl to 'false'. Setting rgw keystone verify ssl to 'false' means that the gateway will not attempt to verify the certificate.

21.11.2.2 Configure the Object Gateway's options

You can configure Keystone integration using the following options:

rgw keystone api version

Version of the Keystone API. Valid options are 2 or 3. Defaults to 2.

rgw keystone url

The URL and port number of the administrative RESTful API on the Keystone server. Follows the pattern SERVER_URL:PORT_NUMBER.

rgw keystone admin token

The token or shared secret that is configured internally in Keystone for administrative requests.

rgw keystone accepted roles

The roles required to serve requests. Defaults to 'Member, admin'.

rgw keystone accepted admin roles

The list of roles allowing a user to gain administrative privileges.

rgw keystone token cache size

The maximum number of entries in the Keystone token cache.

rgw keystone revocation interval

The number of seconds before checking revoked tokens. Defaults to 15 * 60.

rgw keystone implicit tenants

Create new users in their own tenants of the same name. Defaults to 'false'.

rgw s3 auth use keystone

If set to 'true', the Ceph Object Gateway will authenticate users using Keystone. Defaults to 'false'.

nss db path

The path to the NSS database.

It is also possible to configure the Keystone service tenant, user, and password for Keystone (for version 2.0 of the OpenStack Identity API), similar to the way OpenStack services tend to be configured. This way you can avoid setting the shared secret rgw keystone admin token in the configuration file, which should be disabled in production environments. The service tenant credentials should have admin privileges. For more details refer to the official OpenStack Keystone documentation. The related configuration options follow:

rgw keystone admin user

The Keystone administrator user name.

rgw keystone admin password

The keystone administrator user password.

rgw keystone admin tenant

The Keystone version 2.0 administrator user tenant.

A Ceph Object Gateway user is mapped to a Keystone tenant. A Keystone user has different roles assigned to it, possibly on more than one tenant. When the Ceph Object Gateway gets the ticket, it looks at the tenant and the user roles that are assigned to that ticket, and accepts or rejects the request according to the setting of the rgw keystone accepted roles option.

Tip
Tip: Mapping to OpenStack tenants

Although Swift tenants are mapped to the Object Gateway user by default, they can be also mapped to OpenStack tenants via the rgw keystone implicit tenants option. This will make containers use the tenant namespace instead of the S3 like global namespace that the Object Gateway defaults to. We recommend deciding on the mapping method at the planning stage to avoid confusion. The reason for this is that toggling the option later affects only newer requests which get mapped under a tenant, while older buckets created before still continue to be in a global namespace.

For version 3 of the OpenStack Identity API, you should replace the rgw keystone admin tenant option with:

rgw keystone admin domain

The Keystone administrator user domain.

rgw keystone admin project

The Keystone administrator user project.

21.12 Pool placement and storage classes

21.12.1 Displaying placement targets

Placement targets control which pools are associated with a particular bucket. A bucket’s placement target is selected on creation, and cannot be modified. You can display its placement_rule by running the following command:

cephuser@adm > radosgw-admin bucket stats

The zonegroup configuration contains a list of placement targets with an initial target named 'default-placement'. The zone configuration then maps each zonegroup placement target name onto its local storage. This zone placement information includes the 'index_pool' name for the bucket index, the 'data_extra_pool' name for metadata about incomplete multipart uploads, and a 'data_pool' name for each storage class.

21.12.2 Storage classes

Storage classes help customizing the placement of object data. S3 Bucket Lifecycle rules can automate the transition of objects between storage classes.

Storage classes are defined in terms of placement targets. Each zonegroup placement target lists its available storage classes with an initial class named 'STANDARD'. The zone configuration is responsible for providing a 'data_pool' pool name for each of the zonegroup’s storage classes.

21.12.3 Configuring zonegroups and zones

Use the radosgw-admin command on the zonegroups and zones to configure their placement. You can query the zonegroup placement configuration using the following command:

cephuser@adm > radosgw-admin zonegroup get
{
    "id": "ab01123f-e0df-4f29-9d71-b44888d67cd5",
    "name": "default",
    "api_name": "default",
    ...
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    ...
}

To query the zone placement configuration, run:

cephuser@adm > radosgw-admin zone get
{
    "id": "557cdcee-3aae-4e9e-85c7-2f86f5eddb1f",
    "name": "default",
    "domain_root": "default.rgw.meta:root",
    ...
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "default.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0
            }
        }
    ],
    ...
}
Note
Note: No previous multisite configuration

If you have not done any previous multisite configuration, a 'default' zone and zonegroup are created for you, and changes to the zone/zonegroup will not take effect until you restart the Ceph Object Gateways. If you have created a realm for multisite, the zone/zonegroup changes will take effect after you commit the changes with the radosgw-admin period update --commit command.

21.12.3.1 Adding a placement target

To create a new placement target named 'temporary', start by adding it to the zonegroup:

cephuser@adm > radosgw-admin zonegroup placement add \
      --rgw-zonegroup default \
      --placement-id temporary

Then provide the zone placement info for that target:

cephuser@adm > radosgw-admin zone placement add \
      --rgw-zone default \
      --placement-id temporary \
      --data-pool default.rgw.temporary.data \
      --index-pool default.rgw.temporary.index \
      --data-extra-pool default.rgw.temporary.non-ec

21.12.3.2 Adding a storage class

To add a new storage class named 'COLD' to the 'default-placement' target, start by adding it to the zonegroup:

cephuser@adm > radosgw-admin zonegroup placement add \
      --rgw-zonegroup default \
      --placement-id default-placement \
      --storage-class COLD

Then provide the zone placement info for that storage class:

cephuser@adm > radosgw-admin zone placement add \
      --rgw-zone default \
      --placement-id default-placement \
      --storage-class COLD \
      --data-pool default.rgw.cold.data \
      --compression lz4

21.12.4 Placement customization

21.12.4.1 Editing default zonegroup placement

By default, new buckets will use the zonegroup’s default_placement target. You can change this zonegroup setting with:

cephuser@adm > radosgw-admin zonegroup placement default \
      --rgw-zonegroup default \
      --placement-id new-placement

21.12.4.2 Editing default user placement

A Ceph Object Gateway user can override the zonegroup’s default placement target by setting a non-empty default_placement field in the user info. Similarly, the default_storage_class can override the STANDARD storage class applied to objects by default.

cephuser@adm > radosgw-admin user info --uid testid
{
    ...
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    ...
}

If a zonegroup’s placement target contains any tags, users will be unable to create buckets with that placement target unless their user info contains at least one matching tag in its 'placement_tags' field. This can be useful to restrict access to certain types of storage.

The radosgw-admin command cannot modify these fields directly, therefore you need to edit the JSON format manually:

cephuser@adm > radosgw-admin metadata get user:USER-ID > user.json
> vi user.json     # edit the file as required
cephuser@adm > radosgw-admin metadata put user:USER-ID < user.json

21.12.4.3 Editing the S3 default bucket placement

When creating a bucket with the S3 protocol, a placement target can be provided as part of the LocationConstraint to override the default placement targets from the user and zonegroup.

Normally, the LocationConstraint needs to match the zonegroup’s api_name:

<LocationConstraint>default</LocationConstraint>

You can add a custom placement target to the api_name following a colon:

<LocationConstraint>default:new-placement</LocationConstraint>

21.12.4.4 Editing the Swift bucket placement

When creating a bucket with the Swift protocol, you can provide a placement target in the HTTP header's X-Storage-Policy:

X-Storage-Policy: NEW-PLACEMENT

21.12.5 Using storage classes

All placement targets have a STANDARD storage class which is applied to new objects by default. You can override this default with its default_storage_class.

To create an object in a non-default storage class, provide that storage class name in an HTTP header with the request. The S3 protocol uses the X-Amz-Storage-Class header, while the Swift protocol uses the X-Object-Storage-Class header.

You can use S3 Object Lifecycle Management to move object data between storage classes using Transition actions.

21.13 Multisite Object Gateways

Ceph supports several multi-site configuration options for the Ceph Object Gateway:

Multi-zone

A configuration consisting of one zonegroup and multiple zones, each zone with one or more ceph-radosgw instances. Each zone is backed by its own Ceph Storage Cluster. Multiple zones in a zone group provide disaster recovery for the zonegroup should one of the zones experience a significant failure. Each zone is active and may receive write operations. In addition to disaster recovery, multiple active zones may also serve as a foundation for content delivery networks.

Multi-zone-group

Ceph Object Gateway supports multiple zonegroups, each zonegroup with one or more zones. Objects stored to zones in one zonegroup within the same realm as another zonegroup share a global object namespace, ensuring unique object IDs across zonegroups and zones.

Note
Note

It is important to note that zonegroups only sync metadata amongst themselves. Data and metadata are replicated between the zones within the zonegroup. No data or metadata is shared across a realm.

Multiple realms

Ceph Object Gateway supports the notion of realms; a globally unique namespace. Multiple realms are supported which may encompass single or multiple zonegroups.

You can configure each Object Gateway to work in an active-active zone configuration, allowing for writes to non-master zones. The multi-site configuration is stored within a container called a realm. The realm stores zonegroups, zones, and a time period with multiple epochs for tracking changes to the configuration. The rgw daemons handle the synchronization, eliminating the need for a separate synchronization agent. This approach to synchronization allows the Ceph Object Gateway to operate with an active-active configuration instead of active-passive.

21.13.1 Requirements and assumptions

A multi-site configuration requires at least two Ceph storage clusters, and at least two Ceph Object Gateway instances, one for each Ceph storage cluster. The following configuration assumes at least two Ceph storage clusters are in geographically separate locations. However, the configuration can work on the same site. For example, named rgw1 and rgw2.

A multi-site configuration requires a master zonegroup and a master zone. A master zone is the source of truth with regard to all metadata operations in a multisite cluster. Additionally, each zonegroup requires a master zone. zonegroups may have one or more secondary or non-master zones. In this guide, the rgw1 host serves as the master zone of the master zonegroup and the rgw2 host serves as the secondary zone of the master zonegroup.

21.13.2 Limitations

Multi-site configurations do not support resharding a bucket index.

As a workaround, the bucket can be purged from the slave zones, resharded on the master zone, and then resynchronized. Depending on the contents of the bucket, this can be a time- and resource-intensive operation.

21.13.3 Configuring a master zone

All gateways in a multi-site configuration retrieve their configuration from a ceph-radosgw daemon on a host within the master zonegroup and master zone. To configure your gateways in a multi-site configuration, select a ceph-radosgw instance to configure the master zonegroup and master zone.

21.13.3.1 Creating a realm

A realm represents a globally unique namespace consisting of one or more zonegroups containing one or more zones. Zones contain buckets, which in turn contain objects. A realm enables the Ceph Object Gateway to support multiple namespaces and their configuration on the same hardware. A realm contains the notion of periods. Each period represents the state of the zonegroup and zone configuration in time. Each time you make a change to a zonegroup or zone, update the period and commit it. By default, the Ceph Object Gateway does not create a realm for backward compatibility. As a best practice, we recommend creating realms for new clusters.

Create a new realm called gold for the multi-site configuration by opening a command line interface on a host identified to serve in the master zonegroup and zone. Then, execute the following:

cephuser@adm > radosgw-admin realm create --rgw-realm=gold --default

If the cluster has a single realm, specify the --default flag. If --default is specified, radosgw-admin uses this realm by default. If --default is not specified, adding zone-groups and zones requires specifying either the --rgw-realm flag or the --realm-id flag to identify the realm when adding zonegroups and zones.

After creating the realm, radosgw-admin returns the realm configuration:

{
  "id": "4a367026-bd8f-40ee-b486-8212482ddcd7",
  "name": "gold",
  "current_period": "09559832-67a4-4101-8b3f-10dfcd6b2707",
  "epoch": 1
}
Note
Note

Ceph generates a unique ID for the realm, which allows the renaming of a realm if the need arises.

21.13.3.2 Creating a master zonegroup

A realm must have at least one zonegroup to serve as the master zonegroup for the realm. Create a new master zonegroup for the multi-site configuration by opening a command line interface on a host identified to serve in the master zonegroup and zone. Create a master zonegroup called us by executing the following:

cephuser@adm > radosgw-admin zonegroup create --rgw-zonegroup=us \
--endpoints=http://rgw1:80 --master --default

If the realm only has a single zonegroup, specify the --default flag. If --default is specified, radosgw-admin uses this zonegroup by default when adding new zones. If --default is not specified, adding zones requires either the --rgw-zonegroup flag or the --zonegroup-id flag to identify the zonegroup when adding or modifying zones.

After creating the master zonegroup, radosgw-admin returns the zonegroup configuration. For example:

{
 "id": "d4018b8d-8c0d-4072-8919-608726fa369e",
 "name": "us",
 "api_name": "us",
 "is_master": "true",
 "endpoints": [
     "http:\/\/rgw1:80"
 ],
 "hostnames": [],
 "hostnames_s3website": [],
 "master_zone": "",
 "zones": [],
 "placement_targets": [],
 "default_placement": "",
 "realm_id": "4a367026-bd8f-40ee-b486-8212482ddcd7"
}

21.13.3.3 Creating a master zone

Important
Important

Zones need to be created on a Ceph Object Gateway node that will be within the zone.

Create a new master zone for the multi-site configuration by opening a command line interface on a host identified to serve in the master zonegroup and zone. Execute the following:

cephuser@adm > radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-1 \
--endpoints=http://rgw1:80 --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
Note
Note

The --access-key and --secret options are not specified in the above example. These settings are added to the zone when the user is created in the next section.

After creating the master zone, radosgw-admin returns the zone configuration. For example:

  {
      "id": "56dfabbb-2f4e-4223-925e-de3c72de3866",
      "name": "us-east-1",
      "domain_root": "us-east-1.rgw.meta:root",
      "control_pool": "us-east-1.rgw.control",
      "gc_pool": "us-east-1.rgw.log:gc",
      "lc_pool": "us-east-1.rgw.log:lc",
      "log_pool": "us-east-1.rgw.log",
      "intent_log_pool": "us-east-1.rgw.log:intent",
      "usage_log_pool": "us-east-1.rgw.log:usage",
      "reshard_pool": "us-east-1.rgw.log:reshard",
      "user_keys_pool": "us-east-1.rgw.meta:users.keys",
      "user_email_pool": "us-east-1.rgw.meta:users.email",
      "user_swift_pool": "us-east-1.rgw.meta:users.swift",
      "user_uid_pool": "us-east-1.rgw.meta:users.uid",
      "otp_pool": "us-east-1.rgw.otp",
      "system_key": {
          "access_key": "1555b35654ad1656d804",
          "secret_key": "h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q=="
      },
      "placement_pools": [
          {
              "key": "us-east-1-placement",
              "val": {
                  "index_pool": "us-east-1.rgw.buckets.index",
                  "storage_classes": {
                      "STANDARD": {
                          "data_pool": "us-east-1.rgw.buckets.data"
                      }
                  },
                  "data_extra_pool": "us-east-1.rgw.buckets.non-ec",
                  "index_type": 0
              }
          }
      ],
      "metadata_heap": "",
      "realm_id": ""
  }

21.13.3.4 Deleting the default zone and group

Important
Important

The following steps assume a multi-site configuration using newly installed systems that are not storing data yet. Do not delete the default zone and its pools if you are already using it to store data, or the data will be deleted and unrecoverable.

The default installation of Object Gateway creates the default zonegroup called default. Delete the default zone if it exists. Make sure to remove it from the default zonegroup first.

cephuser@adm > radosgw-admin zonegroup delete --rgw-zonegroup=default

Delete the default pools in your Ceph storage cluster if they exist:

Important
Important

The following step assumes a multi-site configuration using newly installed systems that are not currently storing data. Do not delete the default zonegroup if you are already using it to store data.

cephuser@adm > ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
Warning
Warning

If you delete the default zonegroup, you are also deleting the system user. If your admin user keys are not propagated, the Object Gateway management functionality of the Ceph Dashboard will fail. Follow on to the next section to re-create your system user if you go ahead with this step.

21.13.3.5 Creating system users

The ceph-radosgw daemons must authenticate before pulling realm and period information. In the master zone, create a system user to simplify authentication between daemons:

cephuser@adm > radosgw-admin user create --uid=zone.user \
--display-name="Zone User" --access-key=SYSTEM_ACCESS_KEY \
--secret=SYSTEM_SECRET_KEY --system

Make a note of the access_key and secret_key as the secondary zones require them to authenticate with the master zone.

Add the system user to the master zone:

cephuser@adm > radosgw-admin zone modify --rgw-zone=us-east-1 \
--access-key=ACCESS-KEY --secret=SECRET

Update the period to make the changes take effect:

cephuser@adm > radosgw-admin period update --commit

21.13.3.6 Update the period

After updating the master zone configuration, update the period:

cephuser@adm > radosgw-admin period update --commit

After updating the period, radosgw-admin returns the period configuration. For example:

{
  "id": "09559832-67a4-4101-8b3f-10dfcd6b2707", "epoch": 1, "predecessor_uuid": "", "sync_status": [], "period_map":
  {
    "id": "09559832-67a4-4101-8b3f-10dfcd6b2707", "zonegroups": [], "short_zone_ids": []
  }, "master_zonegroup": "", "master_zone": "", "period_config":
  {
     "bucket_quota": {
     "enabled": false, "max_size_kb": -1, "max_objects": -1
     }, "user_quota": {
       "enabled": false, "max_size_kb": -1, "max_objects": -1
     }
  }, "realm_id": "4a367026-bd8f-40ee-b486-8212482ddcd7", "realm_name": "gold", "realm_epoch": 1
}
Note
Note

Updating the period changes the epoch and ensures that other zones receive the updated configuration.

21.13.3.7 Start the gateway

On the Object Gateway host, start and enable the Ceph Object Gateway service. To identify the unique FSID of the cluster, run ceph fsid. To identify the Object Gateway daemon name, run ceph orch ps --hostname HOSTNAME.

cephuser@ogw > systemctl start ceph-FSID@DAEMON_NAME
cephuser@ogw > systemctl enable ceph-FSID@DAEMON_NAME

21.13.4 Configure secondary zones

Zones within a zonegroup replicate all data to ensure that each zone has the same data. When creating the secondary zone, execute all of the following operations on a host identified to serve the secondary zone.

Note
Note

To add a third zone, follow the same procedures as for adding the secondary zone. Use different zone name.

Important
Important

You must execute metadata operations, such as user creation, on a host within the master zone. The master zone and the secondary zone can receive bucket operations, but the secondary zone redirects bucket operations to the master zone. If the master zone is down, bucket operations will fail.

21.13.4.1 Pulling the realm

Using the URL path, access key, and secret of the master zone in the master zonegroup, pull the realm configuration to the host. To pull a non-default realm, specify the realm using the --rgw-realm or --realm-id configuration options.

cephuser@adm > radosgw-admin realm pull --url=url-to-master-zone-gateway --access-key=access-key --secret=secret
Note
Note

Pulling the realm also retrieves the remote's current period configuration, and makes it the current period on this host as well.

If this realm is the default realm or the only realm, make the realm the default realm.

cephuser@adm > radosgw-admin realm default --rgw-realm=REALM-NAME

21.13.4.2 Creating a secondary zone

Create a secondary zone for the multi-site configuration by opening a command line interface on a host identified to serve the secondary zone. Specify the zonegroup ID, the new zone name and an endpoint for the zone. Do not use the --master flag. All zones run in an active-active configuration by default. If the secondary zone should not accept write operations, specify the --read-only flag to create an active-passive configuration between the master zone and the secondary zone. Additionally, provide the access_key and secret_key of the generated system user stored in the master zone of the master zonegroup. Execute the following:

cephuser@adm > radosgw-admin zone create --rgw-zonegroup=ZONE-GROUP-NAME\
 --rgw-zone=ZONE-NAME --endpoints=URL \
 --access-key=SYSTEM-KEY --secret=SECRET\
 --endpoints=http://FQDN:80 \
 [--read-only]

For example:

cephuser@adm > radosgw-admin zone create --rgw-zonegroup=us --endpoints=http://rgw2:80 \
--rgw-zone=us-east-2 --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
{
  "id": "950c1a43-6836-41a2-a161-64777e07e8b8",
  "name": "us-east-2",
  "domain_root": "us-east-2.rgw.data.root",
  "control_pool": "us-east-2.rgw.control",
  "gc_pool": "us-east-2.rgw.gc",
  "log_pool": "us-east-2.rgw.log",
  "intent_log_pool": "us-east-2.rgw.intent-log",
  "usage_log_pool": "us-east-2.rgw.usage",
  "user_keys_pool": "us-east-2.rgw.users.keys",
  "user_email_pool": "us-east-2.rgw.users.email",
  "user_swift_pool": "us-east-2.rgw.users.swift",
  "user_uid_pool": "us-east-2.rgw.users.uid",
  "system_key": {
      "access_key": "1555b35654ad1656d804",
      "secret_key": "h7GhxuBLTrlhVUyxSPUKUV8r\/2EI4ngqJxD7iBdBYLhwluN30JaT3Q=="
  },
  "placement_pools": [
      {
          "key": "default-placement",
          "val": {
              "index_pool": "us-east-2.rgw.buckets.index",
              "data_pool": "us-east-2.rgw.buckets.data",
              "data_extra_pool": "us-east-2.rgw.buckets.non-ec",
              "index_type": 0
          }
      }
  ],
  "metadata_heap": "us-east-2.rgw.meta",
  "realm_id": "815d74c2-80d6-4e63-8cfc-232037f7ff5c"
}
Important
Important

The following steps assume a multi-site configuration using newly-installed systems that are not yet storing data. Do not delete the default zone and its pools if you are already using it to store data, or the data will be lost and unrecoverable.

Delete the default zone if needed:

cephuser@adm > radosgw-admin zone delete --rgw-zone=default

Delete the default pools in your Ceph storage cluster if needed:

cephuser@adm > ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
cephuser@adm > ceph osd pool rm default.rgw.users.uid default.rgw.users.uid --yes-i-really-really-mean-it

21.13.4.3 Updating the Ceph configuration file

Update the Ceph configuration file on the secondary zone hosts by adding the rgw_zone configuration option and the name of the secondary zone to the instance entry.

To do so, execute the following command:

cephuser@adm > ceph config set SERVICE_NAME rgw_zone us-west

21.13.4.4 Updating the period

After updating the master zone configuration, update the period:

cephuser@adm > radosgw-admin period update --commit
{
  "id": "b5e4d3ec-2a62-4746-b479-4b2bc14b27d1",
  "epoch": 2,
  "predecessor_uuid": "09559832-67a4-4101-8b3f-10dfcd6b2707",
  "sync_status": [ "[...]"
  ],
  "period_map": {
      "id": "b5e4d3ec-2a62-4746-b479-4b2bc14b27d1",
      "zonegroups": [
          {
              "id": "d4018b8d-8c0d-4072-8919-608726fa369e",
              "name": "us",
              "api_name": "us",
              "is_master": "true",
              "endpoints": [
                  "http:\/\/rgw1:80"
              ],
              "hostnames": [],
              "hostnames_s3website": [],
              "master_zone": "83859a9a-9901-4f00-aa6d-285c777e10f0",
              "zones": [
                  {
                      "id": "83859a9a-9901-4f00-aa6d-285c777e10f0",
                      "name": "us-east-1",
                      "endpoints": [
                          "http:\/\/rgw1:80"
                      ],
                      "log_meta": "true",
                      "log_data": "false",
                      "bucket_index_max_shards": 0,
                      "read_only": "false"
                  },
                  {
                      "id": "950c1a43-6836-41a2-a161-64777e07e8b8",
                      "name": "us-east-2",
                      "endpoints": [
                          "http:\/\/rgw2:80"
                      ],
                      "log_meta": "false",
                      "log_data": "true",
                      "bucket_index_max_shards": 0,
                      "read_only": "false"
                  }

              ],
              "placement_targets": [
                  {
                      "name": "default-placement",
                      "tags": []
                  }
              ],
              "default_placement": "default-placement",
              "realm_id": "4a367026-bd8f-40ee-b486-8212482ddcd7"
          }
      ],
      "short_zone_ids": [
          {
              "key": "83859a9a-9901-4f00-aa6d-285c777e10f0",
              "val": 630926044
          },
          {
              "key": "950c1a43-6836-41a2-a161-64777e07e8b8",
              "val": 4276257543
          }

      ]
  },
  "master_zonegroup": "d4018b8d-8c0d-4072-8919-608726fa369e",
  "master_zone": "83859a9a-9901-4f00-aa6d-285c777e10f0",
  "period_config": {
      "bucket_quota": {
          "enabled": false,
          "max_size_kb": -1,
          "max_objects": -1
      },
      "user_quota": {
          "enabled": false,
          "max_size_kb": -1,
          "max_objects": -1
      }
  },
  "realm_id": "4a367026-bd8f-40ee-b486-8212482ddcd7",
  "realm_name": "gold",
  "realm_epoch": 2
}
Note
Note

Updating the period changes the epoch and ensures that other zones receive the updated configuration.

21.13.4.5 Starting the Object Gateway

On the Object Gateway host, start and enable the Ceph Object Gateway service:

cephuser@adm > ceph orch start rgw.us-east-2

21.13.4.6 Checking the synchronization status

When the secondary zone is up and running, check the synchronization status. Synchronization copies users and buckets created in the master zone to the secondary zone.

cephuser@adm > radosgw-admin sync status

The output provides the status of synchronization operations. For example:

realm f3239bc5-e1a8-4206-a81d-e1576480804d (gold)
    zonegroup c50dbb7e-d9ce-47cc-a8bb-97d9b399d388 (us)
         zone 4c453b70-4a16-4ce8-8185-1893b05d346e (us-west)
metadata sync syncing
              full sync: 0/64 shards
              metadata is caught up with master
              incremental sync: 64/64 shards
    data sync source: 1ee9da3e-114d-4ae3-a8a4-056e8a17f532 (us-east)
                      syncing
                      full sync: 0/128 shards
                      incremental sync: 128/128 shards
                      data is caught up with source
Note
Note

Secondary zones accept bucket operations; however, secondary zones redirect bucket operations to the master zone and then synchronize with the master zone to receive the result of the bucket operations. If the master zone is down, bucket operations executed on the secondary zone will fail, but object operations should succeed.

21.13.4.7 Verification of an Object

By default, objects are not verified again after the synchronization of an object was successful. To enable verification, set the rgw_sync_obj_etag_verify option to true. After enabling, the optional objects will be synchronized. An additional MD5 checksum will verify that it is computed on the source and the destination. This is to ensure the integrity of the objects fetched from a remote server over HTTP including multisite sync. This option can decrease the performance of RGWs as more computation is needed.

21.13.5 General Object Gateway maintenance

21.13.5.1 Checking the synchronization status

Information about the replication status of a zone can be queried with:

cephuser@adm > radosgw-admin sync status
        realm b3bc1c37-9c44-4b89-a03b-04c269bea5da (gold)
    zonegroup f54f9b22-b4b6-4a0e-9211-fa6ac1693f49 (us)
         zone adce11c9-b8ed-4a90-8bc5-3fc029ff0816 (us-west)
        metadata sync syncing
              full sync: 0/64 shards
              incremental sync: 64/64 shards
              metadata is behind on 1 shards
              oldest incremental change not applied: 2017-03-22 10:20:00.0.881361s
data sync source: 341c2d81-4574-4d08-ab0f-5a2a7b168028 (us-east)
                  syncing
                  full sync: 0/128 shards
                  incremental sync: 128/128 shards
                  data is caught up with source
          source: 3b5d1a3f-3f27-4e4a-8f34-6072d4bb1275 (us-3)
                  syncing
                  full sync: 0/128 shards
                  incremental sync: 128/128 shards
                  data is caught up with source

The output can differ depending on the sync status. The shards are described as two different types during sync:

Behind shards

Behind shards are shards that need a full data sync and shards needing an incremental data sync because they are not up-to-date.

Recovery shards

Recovery shards are shards that encountered an error during sync and marked for retry. The error mostly occurs on minor issues like acquiring a lock on a bucket. This will typically resolve itself.

21.13.5.2 Check the logs

For multi-site only, you can check out the metadata log (mdlog), the bucket index log (bilog) and the data log (datalog). You can list them and also trim them. This is not needed in most cases as rgw_sync_log_trim_interval option is set to 20 minutes as default. If it is not manually set to 0, you will not have to trim it at any time as it could cause side effects otherwise.

21.13.5.3 Changing the metadata master zone

Important
Important

Be careful when changing which zone is the metadata master. If a zone has not finished syncing metadata from the current master zone, it is unable to serve any remaining entries when promoted to master and those changes will be lost. For this reason, we recommend waiting for a zone's radosgw-admin synchronization status to catch up on metadata sync before promoting it to master. Similarly, if changes to metadata are being processed by the current master zone while another zone is being promoted to master, those changes are likely to be lost. To avoid this, we recommend shutting down any Object Gateway instances on the previous master zone. After promoting another zone, its new period can be fetched with radosgw-admin period pull and the gateway(s) can be restarted.

To promote a zone (for example, zone us-west in zonegroup us) to metadata master, run the following commands on that zone:

cephuser@ogw > radosgw-admin zone modify --rgw-zone=us-west --master
cephuser@ogw > radosgw-admin zonegroup modify --rgw-zonegroup=us --master
cephuser@ogw > radosgw-admin period update --commit

This generates a new period, and the Object Gateway instance(s) in zone us-west sends this period to other zones.

21.13.5.4 Resharding a bucket index

Important
Important

Resharding a bucket index in a multi-site setup requires a full resynchronization of the bucket content. Depending on the size and number of objects in the bucket, this is a time- and resource-intensive operation.

Procedure 21.2: Resharding the bucket index
  1. Make sure that all operations to the bucket are stopped.

  2. Back up the original bucket index:

    cephuser@adm > radosgw-admin bi list \
     --bucket=BUCKET_NAME \
     > BUCKET_NAME.list.backup
  3. Disable bucket synchronization for the affected bucket:

    cephuser@adm > radosgw-admin bucket sync disable --bucket=BUCKET_NAME
  4. Wait for the synchronization to finish on all zones. Check on master and slave zones with the following command:

    cephuser@adm > radosgw-admin sync status
  5. Stop the Object Gateway instances. First on all slave zones, then on the master zone, too.

    cephuser@ogw > systemctl stop ceph-radosgw@rgw.NODE.service
  6. Reshard the bucket index on the master zone:

    cephuser@adm > radosgw-admin bucket reshard \
      --bucket=BUCKET_NAME \
      --num-shards=NEW_SHARDS_NUMBER
    Tip
    Tip: Old bucket ID

    As part of its output, this command also prints the new and the old bucket ID.

  7. Purge the bucket on all slave zones:

    cephuser@adm > radosgw-admin bucket rm \
      --purge-objects \
      --bucket=BUCKET_NAME \
      --yes-i-really-mean-it
  8. Restart the Object Gateway on the master zone first, then on the slave zones as well.

    cephuser@ogw > systemctl restart ceph-radosgw.target
  9. On the master zone, re-enable bucket synchronization.

    cephuser@adm > radosgw-admin bucket sync enable --bucket=BUCKET_NAME

21.13.6 Performing failover and disaster recovery

If the master zone should fail, failover to the secondary zone for disaster recovery.

  1. Make the secondary zone the master and default zone. For example:

    cephuser@adm > radosgw-admin zone modify --rgw-zone=ZONE-NAME --master --default

    By default, Ceph Object Gateway runs in an active-active configuration. If the cluster was configured to run in an active-passive configuration, the secondary zone is a read-only zone. Remove the --read-only status to allow the zone to receive write operations. For example:

    cephuser@adm > radosgw-admin zone modify --rgw-zone=ZONE-NAME --master --default \
                                                       --read-only=false
  2. Update the period to make the changes take effect:

    cephuser@adm > radosgw-admin period update --commit
  3. Restart the Ceph Object Gateway:

    cephuser@adm > ceph orch restart rgw

If the former master zone recovers, revert the operation.

  1. From the recovered zone, pull the latest realm configuration from the current master zone.

    cephuser@adm > radosgw-admin realm pull --url=URL-TO-MASTER-ZONE-GATEWAY \
                               --access-key=ACCESS-KEY --secret=SECRET
  2. Make the recovered zone the master and default zone:

    cephuser@adm > radosgw-admin zone modify --rgw-zone=ZONE-NAME --master --default
  3. Update the period to make the changes take effect:

    cephuser@adm > radosgw-admin period update --commit
  4. Restart the Ceph Object Gateway in the recovered zone:

    cephuser@adm > ceph orch restart rgw@rgw
  5. If the secondary zone needs to be a read-only configuration, update the secondary zone:

    cephuser@adm > radosgw-admin zone modify --rgw-zone=ZONE-NAME --read-only
  6. Update the period to make the changes take effect:

    cephuser@adm > radosgw-admin period update --commit
  7. Restart the Ceph Object Gateway in the secondary zone:

    cephuser@adm > ceph orch restart@rgw