
Defining Alarms

You have to define alarms to monitor your cloud resources. An alarm definition specifies the metrics to be collected and the threshold at which an alarm is to be triggered for a cloud resource. If the specified threshold is reached or exceeded, the alarm is triggered and notifications can be sent to inform users. By default, an alarm definition is evaluated every minute.

To handle a large variety of monitoring requirements, you can create either simple alarm definitions that refer to a single metric, or compound alarm definitions that combine multiple metrics and allow you to track and process more complex events.

Example for a simple alarm definition that checks whether the system-level load of the CPU exceeds a threshold of 90 percent:

cpu.system_perc{hostname=monasca} > 90

Example for a simple alarm definition that checks the average system-level load of the CPU over four consecutive periods of 120 seconds each (480 seconds in total). The alarm is triggered only if the average is greater than 95 percent in each of these periods:

avg(cpu.system_perc{hostname=monasca}, 120) > 95 times 4
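The evaluation of this expression can be illustrated with a minimal Python sketch. It assumes that `times 4` requires the 120-second average to exceed the threshold in each of four consecutive periods. The function name `triggers` and the sample values are hypothetical and are not part of the monitoring software.

```python
from statistics import mean


def triggers(samples_per_period, threshold, times):
    """Return True if the average of each of the last `times` periods
    exceeds `threshold`. Each inner list holds the samples of one period."""
    if len(samples_per_period) < times:
        return False  # not enough periods observed yet
    recent = samples_per_period[-times:]
    return all(len(p) > 0 and mean(p) > threshold for p in recent)


# Four consecutive 120-second periods of cpu.system_perc samples (made-up values)
all_high = [[96, 97], [98, 96], [99, 97], [96, 96]]
one_dip = [[96, 97], [94, 93], [99, 97], [96, 96]]

print(triggers(all_high, threshold=95, times=4))
print(triggers(one_dip, threshold=95, times=4))
```

In this sketch, a single period whose average stays at or below 95 percent is enough to prevent the alarm from being triggered.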

Example for a compound alarm definition that evaluates two metrics. The alarm is triggered if either the average system-level load of the CPU exceeds a threshold of 90 percent, or the maximum disk space used by the specified service exceeds a threshold of 90 percent:

avg(cpu.system_perc{hostname=monasca}) > 90 OR
max(disk.space_used_perc{service=monitoring}) > 90
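The logic of this compound expression can be sketched in Python as follows. This is only an illustration of how the two sub-expressions are combined with OR. The function name `compound_alarm` and the sample values are assumptions made for this example.

```python
from statistics import mean


def compound_alarm(cpu_samples, disk_samples, threshold=90):
    """Mirror avg(cpu.system_perc) > 90 OR max(disk.space_used_perc) > 90."""
    cpu_exceeded = mean(cpu_samples) > threshold    # avg(...) > 90
    disk_exceeded = max(disk_samples) > threshold   # max(...) > 90
    return cpu_exceeded or disk_exceeded


print(compound_alarm([50, 60], [70, 95]))  # disk maximum exceeds the threshold
print(compound_alarm([50, 60], [70, 80]))  # neither sub-expression is true
```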

To create, edit, and delete alarms, use Monitoring > Alarm Definitions.

The elements that define an alarm are grouped into Details, Expression, and Notifications. They are described in the following sections.

Details

For an alarm definition, you specify the following details:

  • Name. Mandatory identifier of the alarm. The name must be unique within the project for which you define the alarm.

  • Description. Optional. A short description of the purpose of the alarm.

  • Severity. The following severities for an alarm are supported: Low (default), Medium, High, or Critical.

    The severity affects the status information on the Overview page. If an alarm that is defined as Critical is triggered, the corresponding resource is displayed in a red box. If an alarm that is defined as Low, Medium, or High is triggered, the corresponding resource is displayed in a yellow box.

    The severity level is subjective. Choose a level that is appropriate for prioritizing the alarms in your environment.
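The relationship between severity and the box color on the Overview page can be summarized in a short Python sketch. The function name `overview_box_color` is hypothetical. The sketch only restates the rules described above and does not reflect how the user interface is implemented.

```python
SEVERITIES = ("Low", "Medium", "High", "Critical")


def overview_box_color(severity="Low"):
    """Return the box color shown for a triggered alarm of the given severity."""
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    # Only Critical alarms are shown in red; all other severities are yellow.
    return "red" if severity == "Critical" else "yellow"


for level in SEVERITIES:
    print(level, overview_box_color(level))
```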

Figure 4.2. Creating an Alarm Definition


Expression

The expression defines how a metric is evaluated. The expression syntax is based on a simple expressive grammar. For details, refer to the Monasca API documentation.

To define an alarm expression, proceed as follows:

  1. Select the metric to be evaluated.

  2. Select a statistical function for the metric: min to monitor the minimum values, max to monitor the maximum values, sum to monitor the sum of the values, count to monitor the number of values, or avg to monitor the arithmetic average.

  3. Enter one or multiple dimensions in the Add a dimension field to further qualify the metric.

    Dimensions filter the data to be monitored. They narrow down the evaluation to specific entities. Each dimension consists of a key/value pair that allows for a flexible and concise description of the data to be monitored, for example, region, availability zone, service tier, or resource ID.

    The dimensions available for the selected metric are displayed in the Matching Metrics section. Type the name of the key you want to associate with the metric in the Add a dimension field. A selection list is displayed from which you can add the required key/value pair.

  4. Enter the threshold value at which an alarm is to be triggered, and combine it with a relational operator <, >, <=, or >=.

    The unit of the threshold value depends on the metric for which you define the threshold. For example, the unit is percent for cpu.idle_perc and MB for disk.total_used_space_mb.

  5. Switch on the Deterministic option if you evaluate a metric for which data is received only sporadically. The option should be switched on, for example, for all log metrics. With this option, the alarm status is OK and is displayed as a green box on the Overview page even if no metrics data has been received yet.

    Do not switch on the option if you evaluate a metric for which data is received regularly. This ensures that you instantly notice, for example, that a host machine is offline and that there is no metrics data for the agent to collect. In this case, the alarm status on the Overview page changes from OK to UNDETERMINED and is displayed as a gray box.

  6. Enter one or multiple dimensions in the Match by field if you want these dimensions to be taken into account for triggering alarms.

    Example: If you enter hostname as dimension, individual alarms will be created for each host machine on which metrics data is collected. The expression you have defined is not evaluated as a whole but individually for each host machine in your environment.

    If Match by is set to a dimension, the number of alarms depends on the number of dimension values on which metrics data is received. An empty Match by field results in exactly one alarm.

    To enter a dimension, type its name in the Match by field. The dimensions you enter cannot be changed after the alarm definition has been saved.

  7. Build a compound alarm definition to combine multiple metrics in one expression. Using the logical operators AND or OR, any number of sub-expressions can be combined.

    Use the Add button to create a second sub-expression, and choose either AND or OR as Operator to connect it to the one you have already defined. Define the second sub-expression as described in Step 1 to Step 6 above.

    The following options are provided for creating and organizing compound alarm definitions:

    • Create additional sub-expressions using the Add button.

    • Finish editing a sub-expression using the Submit button.

    • Delete a sub-expression using the Remove button.

    • Change the position of a sub-expression using the Up or Down button.
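The way the steps above map to an expression string can be sketched in Python. This is a minimal illustration that assumes the textual form shown in the examples at the beginning of this section. The helper names `sub_expression` and `compound` are hypothetical, and the sketch does not cover the Deterministic option or the evaluation period.

```python
FUNCTIONS = {"min", "max", "sum", "count", "avg"}
OPERATORS = {"<", ">", "<=", ">="}


def sub_expression(function, metric, dimensions, operator, threshold):
    """Build one sub-expression from the selections made in Steps 1 to 4."""
    if function not in FUNCTIONS:
        raise ValueError(f"unsupported function: {function}")
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    dims = ",".join(f"{key}={value}" for key, value in dimensions.items())
    # Append the dimension filter in braces only if dimensions were given.
    selector = f"{metric}{{{dims}}}" if dims else metric
    return f"{function}({selector}) {operator} {threshold}"


def compound(sub_expressions, logical_operator):
    """Combine sub-expressions with AND or OR, as in Step 7."""
    if logical_operator not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {logical_operator}")
    return f" {logical_operator} ".join(sub_expressions)


expression = compound(
    [
        sub_expression("avg", "cpu.system_perc", {"hostname": "monasca"}, ">", 90),
        sub_expression("max", "disk.space_used_perc", {"service": "monitoring"}, ">", 90),
    ],
    "OR",
)
print(expression)
```

A string built this way corresponds to what you can review or edit directly in the Edit Alarm Definition option.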

Note

You can also edit the expression syntax directly. For this purpose, save your alarm definition and update it using the Edit Alarm Definition option.

By default, an alarm definition is evaluated every minute. When updating the alarm definition, you can change this interval. For syntax details, refer to the section on alarm definition expressions in the Monasca API documentation.
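The combined effect of the Match by field (Step 6) and the Deterministic option (Step 5) can be sketched in Python. The sketch assumes one alarm per value of the Match by dimension and uses the arithmetic average as the statistical function. The function name `alarm_states`, the parameter `known_values`, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def alarm_states(measurements, match_by, known_values, threshold, deterministic=False):
    """Return one alarm state per value of the `match_by` dimension.

    measurements: list of (dimensions, value) tuples.
    known_values: dimension values for which an alarm already exists.
    """
    grouped = defaultdict(list)
    for dimensions, value in measurements:
        if match_by in dimensions:
            grouped[dimensions[match_by]].append(value)

    states = {}
    for key in known_values:
        values = grouped.get(key, [])
        if not values:
            # No data: a deterministic alarm stays OK, otherwise it is UNDETERMINED.
            states[key] = "OK" if deterministic else "UNDETERMINED"
        elif mean(values) > threshold:
            states[key] = "ALARM"
        else:
            states[key] = "OK"
    return states


measurements = [
    ({"hostname": "host-a"}, 95),
    ({"hostname": "host-a"}, 97),
    ({"hostname": "host-b"}, 10),
]
hosts = ["host-a", "host-b", "host-c"]  # host-c currently sends no data

print(alarm_states(measurements, "hostname", hosts, threshold=90))
print(alarm_states(measurements, "hostname", hosts, threshold=90, deterministic=True))
```

In the first call, the host without data is reported as UNDETERMINED. In the second call, the same host is reported as OK because the alarm is treated as deterministic.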

Notifications

You can enable notifications for an alarm definition. As soon as an alarm is triggered, the enabled notifications will be sent.

The Notifications tab allows you to select notifications from the ones that are predefined in your environment. For each selected notification, you specify the status transitions for which it is sent: Alarm, OK, Undetermined, or any combination of these.
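The selection of notifications per status transition can be sketched in Python. The notification names `ops-email` and `pager` and the function name `notifications_for` are hypothetical. The sketch only illustrates the selection logic and does not send anything.

```python
def notifications_for(new_state, enabled):
    """Return the names of all notifications enabled for a transition to `new_state`."""
    return [name for name, states in enabled.items() if new_state in states]


# Each notification is enabled for a set of status transitions.
enabled = {
    "ops-email": {"ALARM", "OK"},
    "pager": {"ALARM"},
}

print(notifications_for("ALARM", enabled))
print(notifications_for("OK", enabled))
print(notifications_for("UNDETERMINED", enabled))
```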